lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <565e8bbe-ebca-4c41-add3-bb7cd41c3c17@broadcom.com>
Date: Tue, 19 Mar 2024 14:50:11 -0700
From: Florian Fainelli <florian.fainelli@...adcom.com>
To: Maarten <maarten@...il.be>,
 Florian Fainelli <florian.fainelli@...adcom.com>, netdev@...r.kernel.org,
 Broadcom internal kernel review list <bcm-kernel-feedback-list@...adcom.com>
Cc: Doug Berger <opendmb@...il.com>, Phil Elwell <phil@...pberrypi.com>
Subject: Re: [PATCH] net: bcmgenet: Reset RBUF on first open

On 3/19/24 14:11, Maarten wrote:
> Florian Fainelli schreef op 2024-03-19 17:56:
>> On 3/16/24 04:53, Maarten wrote:
>>> Doug Berger schreef op 2024-02-27 00:13:
>>>> On 2/26/2024 9:34 AM, Florian Fainelli wrote:
>>>>> On 2/23/24 15:53, Maarten Vanraes wrote:
>>>>>> From: Phil Elwell <phil@...pberrypi.com>
>>>>>>
>>>>>> If the RBUF logic is not reset when the kernel starts then there
>>>>>> may be some data left over from any network boot loader. If the
>>>>>> 64-byte packet headers are enabled then this can be fatal.
>>>>>>
>>>>>> Extend bcmgenet_dma_disable to do perform the reset, but not when
>>>>>> called from bcmgenet_resume in order to preserve a wake packet.
>>>>>>
>>>>>> N.B. This different handling of resume is just based on a hunch -
>>>>>> why else wouldn't one reset the RBUF as well as the TBUF? If this
>>>>>> isn't the case then it's easy to change the patch to make the RBUF
>>>>>> reset unconditional.
>>>>>
>>>>> The real question is why is not the boot loader putting the GENET 
>>>>> core into a quasi power-on-reset state, since this is what Linux 
>>>>> expects, and also it seems the most conservative and prudent 
>>>>> approach. Assuming the RDMA and Unimac RX are disabled, otherwise 
>>>>> we would happily continuing to accept packets in DRAM, then the 
>>>>> question is why is not the RBUF flushed too, or is it flushed, but 
>>>>> this is insufficient, if so, have we determined why?
>>>>>
>>>>>>
>>>>>> See: https://github.com/raspberrypi/linux/issues/3850
>>>>>>
>>>>>> Signed-off-by: Phil Elwell <phil@...pberrypi.com>
>>>>>> Signed-off-by: Maarten Vanraes <maarten@...il.be>
>>>>>> ---
>>>>>>   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 
>>>>>> ++++++++++++----
>>>>>>   1 file changed, 12 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> This patch fixes a problem on RPI 4B where in ~2/3 cases (if 
>>>>>> you're using
>>>>>> nfsroot), you fail to boot; or at least the boot takes longer than
>>>>>> 30 minutes.
>>>>>
>>>>> This makes me wonder whether this also fixes the issues that Maxime 
>>>>> reported a long time ago, which I can reproduce too, but have not 
>>>>> been able to track down the source of:
>>>>>
>>>>> https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/
>>>>>
>>>>>>
>>>>>> Doing a simple ping revealed that when the ping starts working again
>>>>>> (during the boot process), you have ping timings of ~1000ms, 
>>>>>> 2000ms or
>>>>>> even 3000ms; while in normal cases it would be around 0.2ms.
>>>>>
>>>>> I would prefer that we find a way to better qualify whether a RBUF 
>>>>> reset is needed or not, but I suppose there is not any other way, 
>>>>> since there is an "RBUF enabled" bit that we can key off.
>>>>>
>>>>> Doug, what do you think?
>>>> I agree that the Linux driver expects the GENET core to be in a "quasi
>>>> power-on-reset state" and it seems likely that in both Maxime's case
>>>> and the one identified here that is not the case. It would appear that
>>>> the Raspberry Pi bootloader and/or "firmware" are likely not disabling
>>>> the GENET receiver after loading the kernel image and before invoking
>>>> the kernel. They may be disabling the DMA, but that is insufficient
>>>> since any received data would likely overflow the RBUF leaving it in a
>>>> "bad" state which this patch apparently improves.
>>>>
>>>> So it seems likely these issues are caused by improper
>>>> bootloader/firmware behavior.
>>>>
>>>> That said, I suppose it would be nice if the driver were more robust.
>>>> However, we both know how finicky the receive path of the GENET core
>>>> can be about its initialization. Therefore, I am unwilling to "bless"
>>>> this change for upstream without more due diligence on our side.
>>>
>>> Hey, did you guys have any chance to check this stuff out? any 
>>> thoughts on it?
>>
>> We are both busy with higher priority work and I cannot see us being
>> able to dedicate any time to this issue until April.
>>
>> While we are sympathetic to your issue and you having upstreamed a fix
>> for it, it is entirely self inflicted by having the VPU boot loader
>> firmware not properly quiesce the GENET controller, at least based
>> upon the description, therefore the natural fix should be... in the
>> firmware.
> 
> I totally agree that the natural fix should be in the firmware.
> 
>> From my perspective: NAK.
> 
> Fair enough, though I do think that there are often workarounds for 
> faulty firmware, and making the driver more robust is not a bad thing 
> either.

That is fair enough, the kernel does indeed have a number of workarounds 
for bad firmware and hardware obviously. I am seriously concerned that 
doing this RBUF reset might be beneficial here on 2711, but it is not 
clear to me that this is not going to break the 15+ other SoCs that use 
BCMGENET (BCM7xxx). The 2711 use case is a specific chip with a 
completely different clocking and busing compared to the other chips 
where this block has been deployed, and there is also a very narrow use 
with RGMII only, whereas we have MII, reverse MII, GMII and RGMII being 
actively used.

> 
> In any case, I try to raise this issue with the raspberry pi people in 
> the hopes of fixing this in the proper place.
> 

Yes, let's try that route, and then maybe come April we have some spare 
cycles for running some tests.
-- 
Florian


Download attachment "smime.p7s" of type "application/pkcs7-signature" (4221 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ