lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6b88b4c5cf0ba7de2a639633ffbd5ceb@rmail.be>
Date: Tue, 19 Mar 2024 22:11:46 +0100
From: Maarten <maarten@...il.be>
To: Florian Fainelli <florian.fainelli@...adcom.com>,
 netdev@...r.kernel.org, Broadcom internal kernel review list
 <bcm-kernel-feedback-list@...adcom.com>
Cc: Doug Berger <opendmb@...il.com>, Phil Elwell <phil@...pberrypi.com>
Subject: Re: [PATCH] net: bcmgenet: Reset RBUF on first open

Florian Fainelli schreef op 2024-03-19 17:56:
> On 3/16/24 04:53, Maarten wrote:
>> Doug Berger schreef op 2024-02-27 00:13:
>>> On 2/26/2024 9:34 AM, Florian Fainelli wrote:
>>>> On 2/23/24 15:53, Maarten Vanraes wrote:
>>>>> From: Phil Elwell <phil@...pberrypi.com>
>>>>> 
>>>>> If the RBUF logic is not reset when the kernel starts then there
>>>>> may be some data left over from any network boot loader. If the
>>>>> 64-byte packet headers are enabled then this can be fatal.
>>>>> 
>>>>> Extend bcmgenet_dma_disable to do perform the reset, but not when
>>>>> called from bcmgenet_resume in order to preserve a wake packet.
>>>>> 
>>>>> N.B. This different handling of resume is just based on a hunch -
>>>>> why else wouldn't one reset the RBUF as well as the TBUF? If this
>>>>> isn't the case then it's easy to change the patch to make the RBUF
>>>>> reset unconditional.
>>>> 
>>>> The real question is why is not the boot loader putting the GENET 
>>>> core into a quasi power-on-reset state, since this is what Linux 
>>>> expects, and also it seems the most conservative and prudent 
>>>> approach. Assuming the RDMA and Unimac RX are disabled, otherwise we 
>>>> would happily continuing to accept packets in DRAM, then the 
>>>> question is why is not the RBUF flushed too, or is it flushed, but 
>>>> this is insufficient, if so, have we determined why?
>>>> 
>>>>> 
>>>>> See: https://github.com/raspberrypi/linux/issues/3850
>>>>> 
>>>>> Signed-off-by: Phil Elwell <phil@...pberrypi.com>
>>>>> Signed-off-by: Maarten Vanraes <maarten@...il.be>
>>>>> ---
>>>>>   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 
>>>>> ++++++++++++----
>>>>>   1 file changed, 12 insertions(+), 4 deletions(-)
>>>>> 
>>>>> This patch fixes a problem on RPI 4B where in ~2/3 cases (if you're 
>>>>> using
>>>>> nfsroot), you fail to boot; or at least the boot takes longer than
>>>>> 30 minutes.
>>>> 
>>>> This makes me wonder whether this also fixes the issues that Maxime 
>>>> reported a long time ago, which I can reproduce too, but have not 
>>>> been able to track down the source of:
>>>> 
>>>> https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/
>>>> 
>>>>> 
>>>>> Doing a simple ping revealed that when the ping starts working 
>>>>> again
>>>>> (during the boot process), you have ping timings of ~1000ms, 2000ms 
>>>>> or
>>>>> even 3000ms; while in normal cases it would be around 0.2ms.
>>>> 
>>>> I would prefer that we find a way to better qualify whether a RBUF 
>>>> reset is needed or not, but I suppose there is not any other way, 
>>>> since there is an "RBUF enabled" bit that we can key off.
>>>> 
>>>> Doug, what do you think?
>>> I agree that the Linux driver expects the GENET core to be in a 
>>> "quasi
>>> power-on-reset state" and it seems likely that in both Maxime's case
>>> and the one identified here that is not the case. It would appear 
>>> that
>>> the Raspberry Pi bootloader and/or "firmware" are likely not 
>>> disabling
>>> the GENET receiver after loading the kernel image and before invoking
>>> the kernel. They may be disabling the DMA, but that is insufficient
>>> since any received data would likely overflow the RBUF leaving it in 
>>> a
>>> "bad" state which this patch apparently improves.
>>> 
>>> So it seems likely these issues are caused by improper
>>> bootloader/firmware behavior.
>>> 
>>> That said, I suppose it would be nice if the driver were more robust.
>>> However, we both know how finicky the receive path of the GENET core
>>> can be about its initialization. Therefore, I am unwilling to "bless"
>>> this change for upstream without more due diligence on our side.
>> 
>> Hey, did you guys have any chance to check this stuff out? any 
>> thoughts on it?
> 
> We are both busy with higher priority work and I cannot see us being
> able to dedicate any time to this issue until April.
> 
> While we are sympathetic to your issue and you having upstreamed a fix
> for it, it is entirely self inflicted by having the VPU boot loader
> firmware not properly quiesce the GENET controller, at least based
> upon the description, therefore the natural fix should be... in the
> firmware.

I totally agree that the natural fix should be in the firmware.

> From my perspective: NAK.

Fair enough, though I do think that there are often workarounds for 
faulty firmware, and making the driver more robust is not a bad thing 
either.

In any case, I try to raise this issue with the raspberry pi people in 
the hopes of fixing this in the proper place.

Thanks for the response,

Maarten

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ