Message-ID: <CANn89iLxKQOY5ZA5o3d1y=v4MEAsAQnzmVDjmLY0_bJPG93tKQ@mail.gmail.com>
Date: Mon, 16 Oct 2023 21:37:58 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Miquel Raynal <miquel.raynal@...tlin.com>
Cc: "Russell King (Oracle)" <linux@...linux.org.uk>, Wei Fang <wei.fang@....com>, 
	Shenwei Wang <shenwei.wang@....com>, Clark Wang <xiaoning.wang@....com>, davem@...emloft.net, 
	kuba@...nel.org, pabeni@...hat.com, linux-imx@....com, netdev@...r.kernel.org, 
	Thomas Petazzoni <thomas.petazzoni@...tlin.com>, 
	Alexandre Belloni <alexandre.belloni@...tlin.com>, 
	Maxime Chevallier <maxime.chevallier@...tlin.com>, Andrew Lunn <andrew@...n.ch>, 
	Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: Ethernet issue on imx6

On Mon, Oct 16, 2023 at 5:37 PM Miquel Raynal <miquel.raynal@...tlin.com> wrote:
>
> Hello again,
>
> > > > # iperf3 -c 192.168.1.1
> > > > Connecting to host 192.168.1.1, port 5201
> > > > [  5] local 192.168.1.2 port 37948 connected to 192.168.1.1 port 5201
> > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > [  5]   0.00-1.00   sec  11.3 MBytes  94.5 Mbits/sec   43   32.5 KBytes
> > > > [  5]   1.00-2.00   sec  3.29 MBytes  27.6 Mbits/sec   26   1.41 KBytes
> > > > [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    5   1.41 KBytes
> > > > [  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > [  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > [  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> > > > [  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > > [  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> > > >
> > > > Thanks,
> > > > Miquèl
> > >
> > > Can you experiment with :
> > >
> > > - Disabling TSO on your NIC (ethtool -K eth0 tso off)
> > > - Reducing max GSO size (ip link set dev eth0 gso_max_size 16384)
> > >
> > > I suspect some kind of issues with fec TX completion, vs TSO emulation.
> >
> > Wow, appears to have a significant effect. I am using Busybox's iproute
> > implementation which does not know gso_max_size, but I hacked directly
> > into netdevice.h just to see if it would have an effect. I'm adding
> > iproute2 to the image for further testing.
> >
> > Here is the diff:
> >
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -2364,7 +2364,7 @@ struct net_device {
> >  /* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
> >   * and shinfo->gso_segs is a 16bit field.
> >   */
> > -#define GSO_MAX_SIZE           (8 * GSO_MAX_SEGS)
> > +#define GSO_MAX_SIZE           16384u
> >
> >         unsigned int            gso_max_size;
> >  #define TSO_LEGACY_MAX_SIZE    65536
> >
> > And here are the results:
> >
> > # ethtool -K eth0 tso off
> > # iperf3 -c 192.168.1.1 -u -b1M
> > Connecting to host 192.168.1.1, port 5201
> > [  5] local 192.168.1.2 port 50490 connected to 192.168.1.1 port 5201
> > [ ID] Interval           Transfer     Bitrate         Total Datagrams
> > [  5]   0.00-1.00   sec   123 KBytes  1.01 Mbits/sec  87
> > [  5]   1.00-2.00   sec   122 KBytes   996 Kbits/sec  86
> > [  5]   2.00-3.00   sec   122 KBytes   996 Kbits/sec  86
> > [  5]   3.00-4.00   sec   123 KBytes  1.01 Mbits/sec  87
> > [  5]   4.00-5.00   sec   122 KBytes   996 Kbits/sec  86
> > [  5]   5.00-6.00   sec   122 KBytes   996 Kbits/sec  86
> > [  5]   6.00-7.00   sec   123 KBytes  1.01 Mbits/sec  87
> > [  5]   7.00-8.00   sec   122 KBytes   996 Kbits/sec  86
> > [  5]   8.00-9.00   sec   122 KBytes   996 Kbits/sec  86
> > [  5]   9.00-10.00  sec   123 KBytes  1.01 Mbits/sec  87
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
> > [  5]   0.00-10.00  sec  1.19 MBytes  1.00 Mbits/sec  0.000 ms  0/864 (0%)  sender
> > [  5]   0.00-10.05  sec  1.11 MBytes   925 Kbits/sec  0.045 ms  62/864 (7.2%)  receiver
> > iperf Done.
> > # iperf3 -c 192.168.1.1
> > Connecting to host 192.168.1.1, port 5201
> > [  5] local 192.168.1.2 port 34792 connected to 192.168.1.1 port 5201
> > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > [  5]   0.00-1.00   sec  1.63 MBytes  13.7 Mbits/sec   30   1.41 KBytes
> > [  5]   1.00-2.00   sec  7.40 MBytes  62.1 Mbits/sec   65   14.1 KBytes
> > [  5]   2.00-3.00   sec  7.83 MBytes  65.7 Mbits/sec  109   2.83 KBytes
> > [  5]   3.00-4.00   sec  2.49 MBytes  20.9 Mbits/sec   46   19.8 KBytes
> > [  5]   4.00-5.00   sec  7.89 MBytes  66.2 Mbits/sec  109   2.83 KBytes
> > [  5]   5.00-6.00   sec   255 KBytes  2.09 Mbits/sec   22   2.83 KBytes
> > [  5]   6.00-7.00   sec  4.35 MBytes  36.5 Mbits/sec   74   41.0 KBytes
> > [  5]   7.00-8.00   sec  10.9 MBytes  91.8 Mbits/sec   34   45.2 KBytes
> > [  5]   8.00-9.00   sec  5.35 MBytes  44.9 Mbits/sec   82   1.41 KBytes
> > [  5]   9.00-10.00  sec  1.37 MBytes  11.5 Mbits/sec   73   1.41 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  49.5 MBytes  41.5 Mbits/sec  644             sender
> > [  5]   0.00-10.05  sec  49.3 MBytes  41.1 Mbits/sec                  receiver
> > iperf Done.
> >
> > There is still a noticeable amount of drop/retries, but overall the
> > results are significantly better. What is the rationale behind the
> > choice of 16384 in particular? Could this be further improved?
>
> Apparently I've been too enthusiastic. After sending this e-mail I've
> re-generated an image with iproute2 and dd'ed the whole image into an
> SD card, while until now I was just updating the kernel/DT manually and
> got the same performance as above without the GSO size trick. I need
> to clarify this further.
>

Looking a bit at fec, I think fec_enet_txq_put_hdr_tso() is bogus...

txq->tso_hdrs should be properly aligned by definition.
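
To illustrate the alignment point, here is a minimal user-space sketch. It assumes the header area starts on an aligned boundary (as a coherent DMA allocation normally would), a slot size of 128 bytes and an alignment mask of 0xf. Both constants and the helper names are assumptions chosen for illustration, not values quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TSO_HEADER_SIZE 128u  /* assumed slot size, for illustration only */
#define TX_ALIGN_MASK   0xfu  /* assumed value of fep->tx_align */

/* Every slot is a whole multiple of the alignment, so slots stay aligned
 * as long as the base address is aligned. */
static_assert((TSO_HEADER_SIZE & TX_ALIGN_MASK) == 0,
              "slot size must be a multiple of the alignment");

/* Address of header slot `index` in a header area starting at `base`. */
static inline uintptr_t tso_hdr_slot(uintptr_t base, unsigned int index)
{
    return base + (uintptr_t)index * TSO_HEADER_SIZE;
}

/* True when the slot address passes the driver-style mask test. */
static inline bool tso_hdr_slot_aligned(uintptr_t base, unsigned int index)
{
    return (tso_hdr_slot(base, index) & TX_ALIGN_MASK) == 0;
}
```

With an aligned base, tso_hdr_slot_aligned() holds for every index, so under these assumptions the alignment half of the condition in fec_enet_txq_put_hdr_tso() can never be true.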

If FEC_QUIRK_SWAP_FRAME is requested, it would be better to copy the
header stored in txq->tso_hdrs, not the original skb->data.

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 77c8e9cfb44562e73bfa89d06c5d4b179d755502..520436d579d66cc3263527373d754a206cb5bcd6 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -753,7 +753,6 @@ fec_enet_txq_put_hdr_tso(struct fec_enet_priv_tx_q *txq,
        struct fec_enet_private *fep = netdev_priv(ndev);
        int hdr_len = skb_tcp_all_headers(skb);
        struct bufdesc_ex *ebdp = container_of(bdp, struct bufdesc_ex, desc);
-       void *bufaddr;
        unsigned long dmabuf;
        unsigned short status;
        unsigned int estatus = 0;
@@ -762,11 +761,11 @@ fec_enet_txq_put_hdr_tso(struct fec_enet_priv_tx_q *txq,
        status &= ~BD_ENET_TX_STATS;
        status |= (BD_ENET_TX_TC | BD_ENET_TX_READY);

-       bufaddr = txq->tso_hdrs + index * TSO_HEADER_SIZE;
        dmabuf = txq->tso_hdrs_dma + index * TSO_HEADER_SIZE;
-       if (((unsigned long)bufaddr) & fep->tx_align ||
-               fep->quirks & FEC_QUIRK_SWAP_FRAME) {
-               memcpy(txq->tx_bounce[index], skb->data, hdr_len);
+       if (fep->quirks & FEC_QUIRK_SWAP_FRAME) {
+               void *bufaddr = txq->tso_hdrs + index * TSO_HEADER_SIZE;
+
+               memcpy(txq->tx_bounce[index], bufaddr, hdr_len);
                bufaddr = txq->tx_bounce[index];

                if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
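
The effect of changing the memcpy() source can be modelled in user space. The sketch below is not driver code: struct tso_model, HDR_LEN and the three helpers are hypothetical stand-ins, and it assumes (my reading of the driver, not something stated in this thread) that a per-segment header differing from the original skb header has already been written into the txq->tso_hdrs slot before the bounce copy happens.

```c
#include <string.h>

#define HDR_LEN 8  /* hypothetical header length for the model */

/* Stand-ins for skb->data, one txq->tso_hdrs slot and txq->tx_bounce[index]. */
struct tso_model {
    unsigned char skb_data[HDR_LEN];  /* original header of the large skb */
    unsigned char hdr_slot[HDR_LEN];  /* header prepared for this segment */
    unsigned char bounce[HDR_LEN];    /* buffer handed to the hardware */
};

/* Behaviour before the patch: bounce buffer filled from the original header. */
static inline void bounce_from_skb(struct tso_model *m)
{
    memcpy(m->bounce, m->skb_data, HDR_LEN);
}

/* Behaviour after the patch: bounce buffer filled from the header slot. */
static inline void bounce_from_slot(struct tso_model *m)
{
    memcpy(m->bounce, m->hdr_slot, HDR_LEN);
}

/* Returns 1 when the bounce buffer holds the per-segment header. */
static inline int bounce_matches_segment(const struct tso_model *m)
{
    return memcmp(m->bounce, m->hdr_slot, HDR_LEN) == 0;
}
```

In this model, bounce_from_skb() leaves the bounce buffer holding the original header whenever the two headers differ, while bounce_from_slot() always hands over the per-segment one.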
