netdev - Re: [PATCH] net: bcmgenet: Reset RBUF on first open

Message-ID: <33ba4e9cde1ccd1c9f561873782478a913eab670.camel@redhat.com>
Date: Tue, 27 Feb 2024 13:53:03 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Doug Berger <opendmb@...il.com>, Florian Fainelli
	 <florian.fainelli@...adcom.com>, Maarten Vanraes <maarten@...il.be>
Cc: netdev@...r.kernel.org, Broadcom internal kernel review list
 <bcm-kernel-feedback-list@...adcom.com>, Phil Elwell <phil@...pberrypi.com>
Subject: Re: [PATCH] net: bcmgenet: Reset RBUF on first open

On Mon, 2024-02-26 at 15:13 -0800, Doug Berger wrote:
> On 2/26/2024 9:34 AM, Florian Fainelli wrote:
> > On 2/23/24 15:53, Maarten Vanraes wrote:
> > > From: Phil Elwell <phil@...pberrypi.com>
> > > 
> > > If the RBUF logic is not reset when the kernel starts then there
> > > may be some data left over from any network boot loader. If the
> > > 64-byte packet headers are enabled then this can be fatal.
> > > 
> > > Extend bcmgenet_dma_disable to perform the reset, but not when
> > > called from bcmgenet_resume in order to preserve a wake packet.
> > > 
> > > N.B. This different handling of resume is just based on a hunch -
> > > why else wouldn't one reset the RBUF as well as the TBUF? If this
> > > isn't the case then it's easy to change the patch to make the RBUF
> > > reset unconditional.
> > 
> > The real question is why is not the boot loader putting the GENET core 
> > into a quasi power-on-reset state, since this is what Linux expects, and 
> > also it seems the most conservative and prudent approach. Assuming the 
> > RDMA and Unimac RX are disabled, otherwise we would happily continue 
> > to accept packets in DRAM, then the question is why is not the RBUF 
> > flushed too, or is it flushed, but this is insufficient, if so, have we 
> > determined why?
> > 
> > > 
> > > See: https://github.com/raspberrypi/linux/issues/3850
> > > 
> > > Signed-off-by: Phil Elwell <phil@...pberrypi.com>
> > > Signed-off-by: Maarten Vanraes <maarten@...il.be>
> > > ---
> > >   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 ++++++++++++----
> > >   1 file changed, 12 insertions(+), 4 deletions(-)
> > > 
> > > This patch fixes a problem on RPI 4B where in ~2/3 cases (if you're using
> > > nfsroot), you fail to boot; or at least the boot takes longer than
> > > 30 minutes.
> > 
> > This makes me wonder whether this also fixes the issues that Maxime 
> > reported a long time ago, which I can reproduce too, but have not been 
> > able to track down the source of:
> > 
> > https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/
> > 
> > > 
> > > Doing a simple ping revealed that when the ping starts working again
> > > (during the boot process), you have ping timings of ~1000ms, 2000ms or
> > > even 3000ms; while in normal cases it would be around 0.2ms.
> > 
> > I would prefer that we find a way to better qualify whether a RBUF reset 
> > is needed or not, but I suppose there is not any other way, since there 
> > is an "RBUF enabled" bit that we can key off.
> > 
> > Doug, what do you think?
> I agree that the Linux driver expects the GENET core to be in a "quasi 
> power-on-reset state" and it seems likely that in both Maxime's case and 
> the one identified here that is not the case. It would appear that the 
> Raspberry Pi bootloader and/or "firmware" are likely not disabling the 
> GENET receiver after loading the kernel image and before invoking the 
> kernel. They may be disabling the DMA, but that is insufficient since 
> any received data would likely overflow the RBUF leaving it in a "bad" 
> state which this patch apparently improves.
> 
> So it seems likely these issues are caused by improper 
> bootloader/firmware behavior.
> 
> That said, I suppose it would be nice if the driver were more robust. 
> However, we both know how finicky the receive path of the GENET core can 
> be about its initialization. Therefore, I am unwilling to "bless" this 
> change for upstream without more due diligence on our side.

Could you please report back in a reasonable timeframe? The issue
addressed here looks relevant, and the patch is quite self-contained.

We can keep the patch in PW meanwhile.

Thanks,

Paolo
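
For readers following the thread, the behaviour described in the quoted patch (always flush TX, reset the RBUF on open, skip the RBUF reset on resume so a wake packet survives) can be sketched roughly as below. This is only a toy model under stated assumptions, not the actual driver change: `struct fake_genet`, `FAKE_RBUF_RESET`, `fake_dma_disable`, `fake_open` and `fake_resume` are all hypothetical names, and the bit position and register layout are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for hardware state; NOT the real GENET register layout. */
struct fake_genet {
	uint32_t rbuf_ctrl;      /* pretend RBUF control register */
	unsigned int rbuf_stale; /* bytes left over, e.g. from a network boot loader */
	unsigned int tx_flushes; /* how many times the TX side was flushed */
	unsigned int rx_resets;  /* how many times the RBUF was reset */
};

/* Assumed reset bit, chosen for illustration only. */
#define FAKE_RBUF_RESET (1u << 0)

/* Models the extended disable path: TX is always flushed, RX only on request. */
static void fake_dma_disable(struct fake_genet *g, bool flush_rx)
{
	g->tx_flushes++;

	if (flush_rx) {
		g->rbuf_ctrl |= FAKE_RBUF_RESET;  /* assert reset */
		g->rbuf_stale = 0;                /* stale data is discarded */
		g->rbuf_ctrl &= ~FAKE_RBUF_RESET; /* release reset */
		g->rx_resets++;
	}
}

/* Open path: reset the RBUF so boot loader leftovers cannot confuse RX. */
static void fake_open(struct fake_genet *g)
{
	fake_dma_disable(g, true);
}

/* Resume path: leave the RBUF alone so a pending wake packet is preserved. */
static void fake_resume(struct fake_genet *g)
{
	fake_dma_disable(g, false);
}
```

In this model, calling `fake_resume` leaves `rbuf_stale` untouched while `fake_open` clears it, which is the distinction the patch description calls a hunch. If that hunch is wrong, making the reset unconditional amounts to dropping the `flush_rx` parameter.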