netdev - RE: [PATCH v2 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings are disabled

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
 <PAXPR04MB85101B7AC1C1F46E8DD59958887E2@PAXPR04MB8510.eurprd04.prod.outlook.com>
Date: Tue, 8 Oct 2024 03:30:49 +0000
From: Wei Fang <wei.fang@....com>
To: Vladimir Oltean <vladimir.oltean@....com>
CC: "davem@...emloft.net" <davem@...emloft.net>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "kuba@...nel.org" <kuba@...nel.org>,
	"pabeni@...hat.com" <pabeni@...hat.com>, Claudiu Manoil
	<claudiu.manoil@....com>, "ast@...nel.org" <ast@...nel.org>,
	"daniel@...earbox.net" <daniel@...earbox.net>, "hawk@...nel.org"
	<hawk@...nel.org>, "john.fastabend@...il.com" <john.fastabend@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "bpf@...r.kernel.org"
	<bpf@...r.kernel.org>, "stable@...r.kernel.org" <stable@...r.kernel.org>,
	"imx@...ts.linux.dev" <imx@...ts.linux.dev>, "rkannoth@...vell.com"
	<rkannoth@...vell.com>, "maciej.fijalkowski@...el.com"
	<maciej.fijalkowski@...el.com>, "sbhatta@...vell.com" <sbhatta@...vell.com>
Subject: RE: [PATCH v2 net 3/3] net: enetc: disable IRQ after Rx and Tx BD
 rings are disabled



Best Regards,
Wei Fang
Hi Vladimir,

> >
> > On Sun, Sep 29, 2024 at 10:45:06AM +0800, Wei Fang wrote:
> > > When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
> > > on LS1028A, it was found that if the command was re-run multiple times,
> > > Rx could not receive the frames, and the result of xdo-bench showed
> > > that the rx rate was 0.
> > >
> > > root@...028ardb:~# ./xdp-bench tx eno0
> > > Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
> > > Summary                      2046 rx/s                  0
> > err,drop/s
> > > Summary                         0 rx/s                  0
> > err,drop/s
> > > Summary                         0 rx/s                  0
> > err,drop/s
> > > Summary                         0 rx/s                  0
> > err,drop/s
> > >
> > > By observing the Rx PIR and CIR registers, we found that CIR is always
> > > equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring
> > > is full and can no longer accommodate other Rx frames. Therefore, we
> > > can conclude that the problem is caused by the Rx BD ring not being
> > > cleaned up.
> > >
> > > Further analysis of the code revealed that the Rx BD ring will only
> > > be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
> > > Therefore, some debug logs were added to the driver and the current
> > > values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
> > > BD ring was full. The logs are as follows.
> > >
> > > [  178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
> > > [  178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
> > > [  178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
> > >
> > > From the results, we can see that the max value of xdp_tx_in_flight
> > > has reached 2140. However, the size of the Rx BD ring is only 2048.
> > > This is incredible, so we checked the code again and found that
> > > xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled
> > > and it was not reset when the bfp program was installed again. The
> > > root cause is that the IRQ is disabled too early in enetc_stop(),
> > > resulting in enetc_recycle_xdp_tx_buff() not being called, therefore,
> > > xdp_tx_in_flight is not cleared.
> > >
> > > Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet
> > processing")
> > > Cc: stable@...r.kernel.org
> > > Signed-off-by: Wei Fang <wei.fang@....com>
> > > ---
> > > v2 changes:
> > > 1. Modify the titile and rephrase the commit meesage.
> > > 2. Use the new solution as described in the title
> > > ---
> >
> > I gave this another test under a bit different set of circumstances this time,
> > and I'm confident that there are still problems, which I haven't identified
> > though (yet).
> >
> > With 64 byte frames at 2.5 Gbps, I see this going on:
> >
> > $ xdp-bench tx eno0 &
> > $ while :; do taskset $((1 << 0)) hwstamp_ctl -i eno0 -r 1 && sleep 1 &&
> taskset
> > $((1 << 0)) hwstamp_ctl -i eno0 -r 0 && sleep 1; done
> > current settings:
> > tx_type 0
> > rx_filter 0
> > new settings:
> > tx_type 0
> > rx_filter 1
> > Summary                 1,556,952 rx/s                  0
> err,drop/s
> > Summary                         0 rx/s                  0
> err,drop/s
> > Summary                         0 rx/s                  0
> err,drop/s
> > current settings:
> > tx_type 0
> > rx_filter 1
> > Summary                         0 rx/s                  0
> err,drop/s
> > [  883.780346] fsl_enetc 0000:00:00.0 eno0: timeout for tx ring #6 clear (its
> > RX ring has 2072 XDP_TX frames in flight)
> > new settings:
> > tx_type 0
> > rx_filter 0
> > Summary                     1,027 rx/s                  0
> err,drop/s
> > current settings:
> > tx_type 0
> > rx_filter 0
> > Summary                         0 rx/s                  0
> err,drop/s
> >
> > which looks like the symptoms that the patch tries to solve.
> >
> > My previous testing was with 390 byte frames, and this did not happen.
> >
> > Please do not merge this.
> 
> Oh, it looks like there are still some issues we don't know about. I did
> test using 64 bytes but not at that high of a rate. Also I didn't turn on
> timestamp. Anyway, I will try to reproduce the issue when I'm back to
> office next Tuesday. It would be nice if you can help find the root cause
> before next Tuesday, thanks!

I think the reason is that Rx BDRs are disabled when enetc_stop() is called,
but there are still many unprocessed frames on Rx BDR. These frames will
be processed by XDP program and put into Tx BDR. So enetc_wait_txbdr()
will timeout and cause xdp_tx_in_flight will not be cleared.

So based on this patch, we should add a separate patch, similar to the patch
2 ("net: enetc: fix the issues of XDP_REDIRECT feature "), which prevents the
XDP_TX frames from being put into Tx BDRs when the ENETC_TX_DOWN flag
is set. The new patch is shown below. After adding this new patch, I followed
your test steps and tested for more than 30 minutes, and the issue cannot be
reproduced anymore (without this patch, this problem would be reproduced
within seconds).

--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1606,6 +1606,12 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
                        break;
                case XDP_TX:
                        tx_ring = priv->xdp_tx_ring[rx_ring->index];
+                       if (unlikely(test_bit(ENETC_TX_DOWN, &priv->flags))) {
+                               enetc_xdp_drop(rx_ring, orig_i, i);
+                               tx_ring->stats.xdp_tx_drops++;
+                               break;
+                       }
+
                        xdp_tx_bd_cnt = enetc_rx_swbd_to_xdp_tx_swbd(xdp_tx_arr,
                                                                     rx_ring,
                                                                     orig_i, i);