Message-ID: <20241009113500.kgd75g72wlknb46n@skbuf>
Date: Wed, 9 Oct 2024 14:35:00 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: Wei Fang <wei.fang@....com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
	pabeni@...hat.com, claudiu.manoil@....com, ast@...nel.org,
	daniel@...earbox.net, hawk@...nel.org, john.fastabend@...il.com,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
	bpf@...r.kernel.org, stable@...r.kernel.org, imx@...ts.linux.dev,
	rkannoth@...vell.com, maciej.fijalkowski@...el.com,
	sbhatta@...vell.com
Subject: Re: [PATCH v3 net 2/3] net: enetc: fix the issues of XDP_REDIRECT
 feature

The commit title still mentions only XDP_REDIRECT, whereas the
implementation also touches XDP_TX (which the commit message mentions
only in passing).

Wouldn't it be better to have "net: enetc: block concurrent XDP
transmissions during ring reconfiguration" for a commit title?

On Wed, Oct 09, 2024 at 05:03:26PM +0800, Wei Fang wrote:
> When testing the XDP_REDIRECT function on the LS1028A platform, we
> found a very reproducible issue that the Tx frames can no longer be
> sent out even if XDP_REDIRECT is turned off. Specifically, if there
> is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on,
> the console may display some warnings like "timeout for tx ring #6
> clear", and all redirected frames will be dropped, the detaild log

detailed

> is as follows.
> 
> root@...028ardb:~# ./xdp-bench redirect eno0 eno2
> Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc)
> [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear
> [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
> [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
> eno0->eno2     1420505 rx/s       1420590 err,drop/s      0 xmit/s
>   xmit eno0->eno2    0 xmit/s     1420590 drop/s     0 drv_err/s     15.71 bulk-avg
> eno0->eno2     1420484 rx/s       1420485 err,drop/s      0 xmit/s
>   xmit eno0->eno2    0 xmit/s     1420485 drop/s     0 drv_err/s     15.71 bulk-avg
> 
> By analyzing the XDP_REDIRECT implementation of enetc driver, we
> found two problems. First, enetc driver will reconfigure Tx and
> Rx BD rings when a bpf program is installed or uninstalled, but
> there is no mechanisms to block the redirected frames when enetc
> driver reconfigures BD rings. So introduce ENETC_TX_DOWN flag to

(.. driver reconfigures BD rings.) Similarly, XDP_TX verdicts on
received frames can also lead to frames being enqueued in the TX rings.
Because XDP ignores the state set by the netif_tx_wake_queue() API, we
also have to introduce the ENETC_TX_DOWN flag to suppress transmission
of XDP frames.
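
To illustrate the idea in the paragraph above, here is a minimal
user-space sketch, not the driver code. All names in it (port_model,
tx_ring_model, xdp_xmit_model, RING_SIZE) are hypothetical, and a plain
bool stands in for the ENETC_TX_DOWN flag; a real driver would presumably
use an atomic bit operation on a flags word instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the enetc driver's real data structures. */
#define RING_SIZE 8

struct tx_ring_model {
	int frames[RING_SIZE];
	size_t count;		/* number of frames currently queued */
};

struct port_model {
	bool tx_down;		/* stands in for the ENETC_TX_DOWN flag */
	struct tx_ring_model ring;
};

/*
 * Models an XDP transmit path (XDP_TX or XDP_REDIRECT). The XDP path
 * does not look at the netdev queue state, so the driver-private flag
 * is the only thing that stops frames from reaching the ring.
 * Returns the number of frames actually enqueued.
 */
static size_t xdp_xmit_model(struct port_model *p, const int *frames, size_t n)
{
	size_t sent = 0;

	if (p->tx_down)
		return 0;	/* ring is being reconfigured: enqueue nothing */

	while (sent < n && p->ring.count < RING_SIZE)
		p->ring.frames[p->ring.count++] = frames[sent++];

	return sent;
}
```

With tx_down clear, frames are enqueued as usual; once it is set, every
call returns 0 and the ring contents are left untouched.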

> prevent the redirected frames to be attached to Tx BD rings. This
> is not only used to block XDP_REDIRECT frames, but also to block
> XDP_TX frames.
> 
> Second, Tx BD rings are disabled first in enetc_stop() and then
> wait for empty. This operation is not safe while the Tx BD ring

the driver waits for them to become empty.

> is actively transmitting frames, and will cause the ring to not
> be empty and hardware exception. As described in the block guide
> of LS1028A NETC, software should only disable an active ring after
> all pending ring entries have been consumed (i.e. when PI = CI).
> Disabling a transmit ring that is actively processing BDs risks
> a HW-SW race hazard whereby a hardware resource becomes assigned
> to work on one or more ring entries only to have those entries be
> removed due to the ring becoming disabled. So the correct behavior
> is that the software stops putting frames on the Tx BD rings (this
> is what ENETC_TX_DOWN does), then waits for the Tx BD rings to be
> empty, and finally disables the Tx BD rings.

It feels like this separate part (reordering the operations in
enetc_start() and enetc_stop()) should be its own patch? It is
logically distinct from the introduction and checking of the
ENETC_TX_DOWN condition.
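
For reference, the stop ordering described in the quoted text (stop
producing, wait for PI = CI, then disable the ring) can be sketched in
user space as follows. This is only a model under stated assumptions:
bd_ring, netc_port, enqueue_frame(), hw_consume_one() and port_stop()
are hypothetical names, and the "hardware" is simulated by a function
that advances the consumer index.

```c
#include <stdbool.h>

/* Hypothetical model of a Tx BD ring; not the real enetc structures. */
struct bd_ring {
	unsigned int pi;	/* producer index, advanced by software */
	unsigned int ci;	/* consumer index, advanced by "hardware" */
	bool enabled;
};

struct netc_port {
	bool tx_down;		/* stands in for the ENETC_TX_DOWN flag */
	struct bd_ring ring;
};

/* Software side: refuse new frames once tx_down is set. */
static bool enqueue_frame(struct netc_port *p)
{
	if (p->tx_down || !p->ring.enabled)
		return false;
	p->ring.pi++;
	return true;
}

/* Simulated hardware: consume one pending entry of an enabled ring. */
static void hw_consume_one(struct bd_ring *r)
{
	if (r->enabled && r->ci != r->pi)
		r->ci++;
}

/* Returns true if the ring was empty (PI == CI) when it was disabled. */
static bool port_stop(struct netc_port *p, unsigned int max_polls)
{
	struct bd_ring *r = &p->ring;
	bool drained;

	p->tx_down = true;			/* 1. stop producing */

	for (unsigned int i = 0; i < max_polls && r->pi != r->ci; i++)
		hw_consume_one(r);		/* 2. wait for the ring to drain */

	drained = (r->pi == r->ci);
	r->enabled = false;			/* 3. only now disable the ring */
	return drained;
}
```

Disabling the ring before step 2 would leave pending entries behind,
which is the situation the "timeout for tx ring clear" warnings point to.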
