lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <PAXPR04MB8510D29BA035DDC3902374C3887F2@PAXPR04MB8510.eurprd04.prod.outlook.com>
Date: Wed, 9 Oct 2024 12:56:38 +0000
From: Wei Fang <wei.fang@....com>
To: Vladimir Oltean <vladimir.oltean@....com>
CC: "davem@...emloft.net" <davem@...emloft.net>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "kuba@...nel.org" <kuba@...nel.org>,
	"pabeni@...hat.com" <pabeni@...hat.com>, Claudiu Manoil
	<claudiu.manoil@....com>, "ast@...nel.org" <ast@...nel.org>,
	"daniel@...earbox.net" <daniel@...earbox.net>, "hawk@...nel.org"
	<hawk@...nel.org>, "john.fastabend@...il.com" <john.fastabend@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "bpf@...r.kernel.org"
	<bpf@...r.kernel.org>, "stable@...r.kernel.org" <stable@...r.kernel.org>,
	"imx@...ts.linux.dev" <imx@...ts.linux.dev>, "rkannoth@...vell.com"
	<rkannoth@...vell.com>, "maciej.fijalkowski@...el.com"
	<maciej.fijalkowski@...el.com>, "sbhatta@...vell.com" <sbhatta@...vell.com>
Subject: RE: [PATCH v3 net 3/3] net: enetc: disable IRQ after Rx and Tx BD
 rings are disabled


> -----Original Message-----
> From: Vladimir Oltean <vladimir.oltean@....com>
> Sent: 2024年10月9日 20:10
> To: Wei Fang <wei.fang@....com>
> Cc: davem@...emloft.net; edumazet@...gle.com; kuba@...nel.org;
> pabeni@...hat.com; Claudiu Manoil <claudiu.manoil@....com>;
> ast@...nel.org; daniel@...earbox.net; hawk@...nel.org;
> john.fastabend@...il.com; linux-kernel@...r.kernel.org;
> netdev@...r.kernel.org; bpf@...r.kernel.org; stable@...r.kernel.org;
> imx@...ts.linux.dev; rkannoth@...vell.com; maciej.fijalkowski@...el.com;
> sbhatta@...vell.com
> Subject: Re: [PATCH v3 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings
> are disabled
> 
> On Wed, Oct 09, 2024 at 05:03:27PM +0800, Wei Fang wrote:
> > When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
> > on LS1028A, it was found that if the command was re-run multiple times,
> > Rx could not receive the frames, and the result of xdo-bench showed
> 
> xdp-bench
> 
> > that the rx rate was 0.
> >
> > root@...028ardb:~# ./xdp-bench tx eno0
> > Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
> > Summary                      2046 rx/s                  0
> err,drop/s
> > Summary                         0 rx/s                  0
> err,drop/s
> > Summary                         0 rx/s                  0
> err,drop/s
> > Summary                         0 rx/s                  0
> err,drop/s
> >
> > By observing the Rx PIR and CIR registers, we found that CIR is always
> > equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring
> > is full and can no longer accommodate other Rx frames. Therefore, we
> > can conclude that the problem is caused by the Rx BD ring not being
> > cleaned up.
> >
> > Further analysis of the code revealed that the Rx BD ring will only
> > be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
> > Therefore, some debug logs were added to the driver and the current
> > values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
> > BD ring was full. The logs are as follows.
> >
> > [  178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
> > [  178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
> > [  178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
> >
> > From the results, we can see that the max value of xdp_tx_in_flight
> > has reached 2140. However, the size of the Rx BD ring is only 2048.
> > This is incredible, so we checked the code again and found that
> > xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled
> > and it was not reset when the bfp program was installed again.
> 
> Please make it clear that this is more general and it happens whenever
> enetc_stop() is called.
> 
> > The root cause is that the IRQ is disabled too early in enetc_stop(),
> > resulting in enetc_recycle_xdp_tx_buff() not being called, therefore,
> > xdp_tx_in_flight is not cleared.
> 
> I feel that the problem is not so much the IRQ, as the NAPI (softirq),
> really. Under heavy traffic we don't even get that many hardirqs (if any),
> but NAPI just reschedules itself because of the budget which constantly
> gets exceeded. Please make this also clear in the commit title,
> something like "net: enetc: disable NAPI only after TX rings are empty".
> 
> I would restate the problem as: "The root cause is that we disable NAPI
> too aggressively, without having waited for the pending XDP_TX frames to
> be transmitted, and their buffers recycled, so that the xdp_tx_in_flight
> counter can naturally drop to zero. Later, enetc_free_tx_ring() does
> free those stale, untransmitted XDP_TX packets, but it is not coded up
> to also reset the xdp_tx_in_flight counter, hence the manifestation of
> the bug."
> 
> And then we should have a paragraph that describes the solution as well.
> "One option would be to cover this extra condition in enetc_free_tx_ring(),
> but now that the ENETC_TX_DOWN exists, we have created a window at the
> beginning of enetc_stop() where NAPI can still be scheduled, but any
> concurrent enqueue will be blocked. Therefore, we can call enetc_wait_bdrs()
> and enetc_disable_tx_bdrs() with NAPI still scheduled, and it is
> guaranteed that this will not wait indefinitely, but instead give us an
> indication that the pending TX frames have orderly dropped to zero.
> Only then should we call napi_disable().
> 
> This way, enetc_free_tx_ring() becomes entirely redundant and can be
> dropped as part of subsequent cleanup.
> 
> The change also refactors enetc_start() so that it looks like the mirror
> opposite procedure of enetc_stop()."
> 
> I think describing the problem and solution in these terms gives the
> reviewers more versed in NAPI a better chance of understanding what is
> going on and what we are trying to achieve.

Thanks for helping rephrase the commit message, I will applying it to
the next version.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ