netdev - [PATCH v2 0/1] net: macb: Avoid erroneously stopped TX ring.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20230421130035.14346-1-ingo.rohloff@lauterbach.com>
Date:   Fri, 21 Apr 2023 15:00:34 +0200
From:   Ingo Rohloff <ingo.rohloff@...terbach.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Roman Gushchin <roman.gushchin@...ux.dev>,
        Lars-Peter Clausen <lars@...afoo.de>,
        robert.hancock@...ian.com, Nicolas.Ferre@...rochip.com,
        claudiu.beznea@...rochip.com, davem@...emloft.net,
        netdev@...r.kernel.org, tomas.melin@...sala.com,
        Ingo Rohloff <ingo.rohloff@...terbach.com>
Subject: [PATCH v2 0/1] net: macb: Avoid erroneously stopped TX ring.

On Tue, 11 Apr 2023 19:07:15 -0700
Jakub Kicinski <kuba@...nel.org> wrote:

> On Fri,  7 Apr 2023 23:33:48 +0200 Ingo Rohloff wrote:
> > Analysis:
> > Commit 404cd086f29e867f ("net: macb: Allocate valid memory for TX and RX BD
> > prefetch") mentions:
> >
> >     GEM version in ZynqMP and most versions greater than r1p07 supports
> >     TX and RX BD prefetch. The number of BDs that can be prefetched is a
> >     HW configurable parameter. For ZynqMP, this parameter is 4.
> >
> > I think what happens is this:
> > Example Scenario (SW == linux kernel, HW == cadence ethernet IP).
> > 1) SW has written TX descriptors 0..7
> > 2) HW is currently transmitting TX descriptor 6.
> >    HW has already prefetched TX descriptors 6,7,8,9.
> > 3) SW writes TX descriptor 8 (clearing TX_USED)
> > 4) SW writes the TSTART bit.
> >    HW ignores this, because it is still transmitting.
> > 5) HW transmits TX descriptor 7.
> > 6) HW reaches descriptor 8; because this descriptor
> >    has already been prefetched, HW sees a non-active
> >    descriptor (TX_USED set) and stops transmitting.
>
> This sounds broken, any idea if this is how the IP is supposed to work
> or it may be an integration issue in Zynq?  The other side of this
> question is how expensive the workaround is - a spin lock and two extra
> register reads on completion seems like a lot.
>
> Roman, Lars, have you seen Tx stalls on your macb setups?

I think I finally was able to hunt down the exact conditions.
It seems to be simpler than I thought.
This also results in a completely modified patch and work-around.

As far as I can tell there is a race condition between HW and SW.
(TX descriptor 4 is just used as an example here):

1) HW has just read TX descriptor 4 (with set TX_USED bit).
   "Just read" means: The read data of the read transfer is
   in flight to the HW module.
   SW clears TX_USED bit of TX descriptor 4.

2) SW writes TSTART bit in NCR register.
   HW is still processing TX descriptor 4 (TGO bit in TSR still set)
   and thus ignores the TSTART trigger.

3) HW stops TX processing (TGO bit in TSR is cleared), because it
   sees the set TX_USED bit of the (previous) TX descriptor 4.

We now have got an active pending TX descriptor 4, but HW will not
restart transmission, because it ignored the corresponding
TSTART trigger.

I encountered the stuck TX descriptor state, when before:
1) TBQP register indicated HW reached the TX descriptor,
   where the TX_USED bit is just cleared by SW.
2) HW had a set TGO bit in the TSR register,
   which indicates HW is still processing the TX ring.

The patch I propose tries to ensure that the TSTART trigger
is not ignored.
The idea is if we have this condition

* The TGO bit is set in TSR
* The TBQP register points to the TX descriptor,
  where the TX_USED bit was just cleared.

then I assume we are in an unknown state:

* HW could either see the cleared TX_USED bit and continue, or
* HW could ignore the TSTART trigger and stop.

The patch tries to detect this condition in the new function
macb_fix_tstart_race().
If this unknown state is detected, the HW is rechecked until either
* The TBQP register points to a different descriptor (meaning
  the hardware is still actively processing the TX ring)
* The hardware indicates it has stopped processing via a
  cleared TGO bit in the TSR register.
  In this case the TSTART bit is written again.

Note: Cadence might be able to clear up under which circumstances
this race condition might happen.
On the Xilinx ZynqMP Ultrascale+ device I have here,
I for sure reach the line

    /* Controller stopped... write TSTART again.

in the patch; which indicates the code of the patch at least covers
a condition which happens in reality on this particular SoC.

regards
  Ingo

Ingo Rohloff (1):
  net: macb: Avoid erroneously stopped TX ring.

 drivers/net/ethernet/cadence/macb.h      |  1 -
 drivers/net/ethernet/cadence/macb_main.c | 94 +++++++++++++-----------
 2 files changed, 50 insertions(+), 45 deletions(-)

--
2.17.1