[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241009113500.kgd75g72wlknb46n@skbuf>
Date: Wed, 9 Oct 2024 14:35:00 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: Wei Fang <wei.fang@....com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, claudiu.manoil@....com, ast@...nel.org,
daniel@...earbox.net, hawk@...nel.org, john.fastabend@...il.com,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
bpf@...r.kernel.org, stable@...r.kernel.org, imx@...ts.linux.dev,
rkannoth@...vell.com, maciej.fijalkowski@...el.com,
sbhatta@...vell.com
Subject: Re: [PATCH v3 net 2/3] net: enetc: fix the issues of XDP_REDIRECT
feature
Commit title still mentions only XDP_REDIRECT, whereas implementation
also touches XDP_TX (and only makes a very minor mention of it).
Wouldn't it be better to have "net: enetc: block concurrent XDP
transmissions during ring reconfiguration" for a commit title?
On Wed, Oct 09, 2024 at 05:03:26PM +0800, Wei Fang wrote:
> When testing the XDP_REDIRECT function on the LS1028A platform, we
> found a very reproducible issue that the Tx frames can no longer be
> sent out even if XDP_REDIRECT is turned off. Specifically, if there
> is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on,
> the console may display some warnings like "timeout for tx ring #6
> clear", and all redirected frames will be dropped, the detaild log
detailed
> is as follows.
>
> root@...028ardb:~# ./xdp-bench redirect eno0 eno2
> Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc)
> [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear
> [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
> [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
> eno0->eno2 1420505 rx/s 1420590 err,drop/s 0 xmit/s
> xmit eno0->eno2 0 xmit/s 1420590 drop/s 0 drv_err/s 15.71 bulk-avg
> eno0->eno2 1420484 rx/s 1420485 err,drop/s 0 xmit/s
> xmit eno0->eno2 0 xmit/s 1420485 drop/s 0 drv_err/s 15.71 bulk-avg
>
> By analyzing the XDP_REDIRECT implementation of enetc driver, we
> found two problems. First, enetc driver will reconfigure Tx and
> Rx BD rings when a bpf program is installed or uninstalled, but
> there is no mechanisms to block the redirected frames when enetc
> driver reconfigures BD rings. So introduce ENETC_TX_DOWN flag to
(.. driver reconfigures BD rings.) Similarly, XDP_TX verdicts on
received frames can also lead to frames being enqueued in the TX rings.
Because XDP ignores the state set by the netif_tx_wake_queue() API, we
also have to introduce the ENETC_TX_DOWN flag to suppress transmission
of XDP frames.
> prevent the redirected frames to be attached to Tx BD rings. This
> is not only used to block XDP_REDIRECT frames, but also to block
> XDP_TX frames.
>
> Second, Tx BD rings are disabled first in enetc_stop() and then
> wait for empty. This operation is not safe while the Tx BD ring
the driver waits for them to become empty.
> is actively transmitting frames, and will cause the ring to not
> be empty and hardware exception. As described in the block guide
> of LS1028A NETC, software should only disable an active ring after
> all pending ring entries have been consumed (i.e. when PI = CI).
> Disabling a transmit ring that is actively processing BDs risks
> a HW-SW race hazard whereby a hardware resource becomes assigned
> to work on one or more ring entries only to have those entries be
> removed due to the ring becoming disabled. So the correct behavior
> is that the software stops putting frames on the Tx BD rings (this
> is what ENETC_TX_DOWN does), then waits for the Tx BD rings to be
> empty, and finally disables the Tx BD rings.
It feels like this separate part (refactoring of enetc_start() and
enetc_stop() operation ordering) should be its own patch? It is
logically different than the introduction and checking of the
ENETC_TX_DOWN condition.
Powered by blists - more mailing lists