netdev - Re: [PATCH 0/1] Alternative, restart tx after tx used bit read

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0ceb7ed6-2535-aa40-52a4-c6a8e99dafdc@microchip.com>
Date:   Mon, 24 Apr 2023 08:54:02 +0000
From:   <Claudiu.Beznea@...rochip.com>
To:     <ingo.rohloff@...terbach.com>, <robert.hancock@...ian.com>
CC:     <Nicolas.Ferre@...rochip.com>, <davem@...emloft.net>,
        <kuba@...nel.org>, <netdev@...r.kernel.org>,
        <tomas.melin@...sala.com>
Subject: Re: [PATCH 0/1] Alternative, restart tx after tx used bit read

On 08.04.2023 00:33, Ingo Rohloff wrote:
> [You don't often get email from ingo.rohloff@...terbach.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> I am sorry; this is a long E-Mail.
> 
> I am referring to this problem:
> 
> Robert Hancock wrote:
>> On Wed, 2022-03-23 at 08:43 -0700, Jakub Kicinski wrote:
>>> On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
>>>>> From: Claudiu Beznea <claudiu.beznea@...rochip.com>
>>>>>
>>>>> On some platforms (currently detected only on SAMA5D4) TX might stuck
>>>>> even the pachets are still present in DMA memories and TX start was
>>>>> issued for them.
>>>>> ...
>>>> On Xilinx Zynq the above change can cause infinite interrupt loop
>>>> leading to CPU stall.  Seems timing/load needs to be appropriate for
>>>> this to happen, and currently with 1G ethernet this can be triggered
>>>> normally within minutes when running stress tests on the network
>>>> interface.
>>>> ...
>>> Which kernel version are you using?  Robert has been working on macb +
>>> Zynq recently, adding him to CC.
>> ...
>> I haven't looked at the TX ring descriptor and register setup on this core
>> in that much detail, but the fact the controller gets into this "TX used
>> bit read" state in the first place seems unusual.  I'm wondering if
>> something is being done in the wrong order or if we are missing a memory
>> barrier etc?
> 
> I am developing on a ZynqMP (Ultrascale+) SoC from AMD/Xilinx.
> I have seen the same issue before commit 4298388574dae6168 ("net: macb:
> restart tx after tx used bit read")
> 
> The scenario which sometimes triggers it for me:
> 
> I have an application running on the PC.
> The application sends a short command (via TCP) to the ZynqMP.
> The ZynqMP answers with a long stream of bytes via TCP
> (around 230KiB).
> The PC knows the amount of data and waits to receive the data completely.
> The PC gets stuck, because the last TCP segment of the transfer gets
> stuck in the ZynqMP and is not transmitted.
> You can re-trigger the TX Ring by pinging the ZynqMP:
> The Ping answer will re-trigger the TX ring, which in turn will also
> then send the stuck IP/TCP packet.

From the moment I've been working on this I remember ping was also
restarting the TX for me (which is expected as it enqueues more descriptors
and issue TSTART). At the time I worked on this the hardware prefetch was
not implemented in software (and I think the hardware I had the issue with
didn't support it).

> 
> Unfortunately triggering this problem seems to be hard; at least I am
> not able to reproduce it easily.

It was the same on my side.

> 
> So: If anyone has a more reliable way to trigger the problem,
> please tell me.
> This is to check if my proposed alternative works under all circumstances.
> 
> I have an alternate implementation, which does not require to turn on
> the "TX USED BIT READ" (TUBR) interrupt.
> The reason why I think this alternative might be better is, because I
> believe the TUBR interrupt happens at the wrong time; so I am not sure
> that the current implementation works reliably.
> 
> Analysis:
> Commit 404cd086f29e867f ("net: macb: Allocate valid memory for TX and RX BD
> prefetch") mentions:
> 
>     GEM version in ZynqMP and most versions greater than r1p07 supports
>     TX and RX BD prefetch. The number of BDs that can be prefetched is a
>     HW configurable parameter. For ZynqMP, this parameter is 4.
> 
> I think what happens is this:
> Example Scenario (SW == linux kernel, HW == cadence ethernet IP).
> 1) SW has written TX descriptors 0..7
> 2) HW is currently transmitting TX descriptor 6.
>    HW has already prefetched TX descriptors 6,7,8,9.
> 3) SW writes TX descriptor 8 (clearing TX_USED)
> 4) SW writes the TSTART bit.
>    HW ignores this, because it is still transmitting.
> 5) HW transmits TX descriptor 7.
> 6) HW reaches descriptor 8; because this descriptor
>    has already been prefetched, HW sees a non-active
>    descriptor (TX_USED set) and stops transmitting.

I think you can check TBQP register to see if hardware position in TX queue
is ahead of software to validate against hardware this prefetch scenario.

> 
>>From debugging the code it seems that the TUBR interrupt happens, when
> a descriptor is prefetched, which has a TX_USED bit set, which is before

From what I know, yes, TUBR is raised when HW TX logic reaches a descriptor
with TX_USED set.

> it is processed by the rest of the hardware:
> When looking at the end of a transfer it seems I get a TUBR interrupt,
> followed by some more TX COMPLETE interrupts.
> 
> Additionally that means at the time the TUBR interrupt happens, it
> is too early to write the TSTART bit again, because the hardware is
> still actively transmitting.
> 
> The alternative I implemented is to check in macb_tx_complete() if
> 
> 1) The TX Queue is non-empty (there are pending TX descriptors)
> 2) The hardware indicates that it is not transmitting any more
> 
> If this situation is detected, the TSTART bit will be written to
> restart the TX ring.
> 
> I know for sure, that I hit the code path, which restarts the
> transmission in macb_tx_complete(); that's why I believe the
> "Example Scenario" I described above is correct.
> 
> I am still not sure if what I implemented is enough:
> macb_tx_complete() should at least see all completed TX descriptors.
> I still believe there is a (very short) time window in which there
> might be a race:
> 1) HW completes TX descriptor 7 and sets the TX_USED bit
>    in TX descriptor 7.
>    TX descriptor 8 was prefetched with a set TX_USED bit.
> 2) SW sees that TX descriptor 7 is completed
>    (TX_USED bit now is set).
> 3) SW sees that there still is a pending TX descriptor 8.
> 4) SW checks if the TGO bit is still set, which it is.
>    So the SW does nothing at this point.
> 5) HW processes the prefetched,set TX_USED bit in
>    TX descriptor 8 and stops transmission (clearing the TGO bit).
> 
> I am not sure if it is guaranteed that 5) cannot happen after 4).  If 5)
> happens after 4) as described above, then the controller still gets stuck.
> The only idea I can come up with, is to re-check the TGO bit
> a second time a little bit later, but I am not sure how to
> implement this.
> 
> Is there anyone who has access to hardware documentation, which
> sheds some light onto the way the descriptor prefetching works?

Some part of it could be found here:
https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/ProductDocuments/DataSheets/SAMA7G5-Series-Data-Sheet-DS60001765A.pdf#_OPENTOPIC_TOC_PROCESSING_d17023e778582

Thank you,
Claudiu

> 
> so long
>   Ingo
> 
> 
> Ingo Rohloff (1):
>   net: macb: A different way to restart a stuck TX descriptor ring.
> 
>  drivers/net/ethernet/cadence/macb.h      |  1 -
>  drivers/net/ethernet/cadence/macb_main.c | 67 +++++++++---------------
>  2 files changed, 24 insertions(+), 44 deletions(-)
> 
> --
> 2.17.1
>