Message-ID: <20231017131901.5ae65e4d@xps-13>
Date: Tue, 17 Oct 2023 13:19:01 +0200
From: Miquel Raynal <miquel.raynal@...tlin.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "Russell King (Oracle)" <linux@...linux.org.uk>, Wei Fang
<wei.fang@....com>, Shenwei Wang <shenwei.wang@....com>, Clark Wang
<xiaoning.wang@....com>, davem@...emloft.net, kuba@...nel.org,
pabeni@...hat.com, linux-imx@....com, netdev@...r.kernel.org, Thomas
Petazzoni <thomas.petazzoni@...tlin.com>, Alexandre Belloni
<alexandre.belloni@...tlin.com>, Maxime Chevallier
<maxime.chevallier@...tlin.com>, Andrew Lunn <andrew@...n.ch>, Stephen
Hemminger <stephen@...workplumber.org>, Alexander Stein
<alexander.stein@...tq-group.com>
Subject: Re: Ethernet issue on imx6

Hi Eric,

edumazet@...gle.com wrote on Mon, 16 Oct 2023 21:37:58 +0200:
> On Mon, Oct 16, 2023 at 5:37 PM Miquel Raynal <miquel.raynal@...tlin.com> wrote:
> >
> > Hello again,
> >
> > > > > # iperf3 -c 192.168.1.1
> > > > > Connecting to host 192.168.1.1, port 5201
> > > > > [ 5] local 192.168.1.2 port 37948 connected to 192.168.1.1 port 5201
> > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > [ 5] 0.00-1.00 sec 11.3 MBytes 94.5 Mbits/sec 43 32.5 KBytes
> > > > > [ 5] 1.00-2.00 sec 3.29 MBytes 27.6 Mbits/sec 26 1.41 KBytes
> > > > > [ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > > > > [ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 5 1.41 KBytes
> > > > > [ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > > > > [ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > > > >
> > > > > Thanks,
> > > > > Miquèl
> > > >
> > > > Can you experiment with :
> > > >
> > > > - Disabling TSO on your NIC (ethtool -K eth0 tso off)
> > > > - Reducing max GSO size (ip link set dev eth0 gso_max_size 16384)
> > > >
> > > > I suspect some kind of issues with fec TX completion, vs TSO emulation.
> > >
> > > Wow, this appears to have a significant effect. I am using BusyBox's
> > > iproute implementation, which does not know about gso_max_size, so I
> > > hacked the value directly in netdevice.h just to see if it would have
> > > an effect. I'm adding iproute2 to the image for further testing.
> > >
> > > Here is the diff:
> > >
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -2364,7 +2364,7 @@ struct net_device {
> > > /* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
> > > * and shinfo->gso_segs is a 16bit field.
> > > */
> > > -#define GSO_MAX_SIZE (8 * GSO_MAX_SEGS)
> > > +#define GSO_MAX_SIZE 16384u
> > >
> > > unsigned int gso_max_size;
> > > #define TSO_LEGACY_MAX_SIZE 65536
> > >
> > > And here are the results:
> > >
> > > # ethtool -K eth0 tso off
> > > # iperf3 -c 192.168.1.1 -u -b1M
> > > Connecting to host 192.168.1.1, port 5201
> > > [ 5] local 192.168.1.2 port 50490 connected to 192.168.1.1 port 5201
> > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > [ 5] 0.00-1.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > [ 5] 1.00-2.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 2.00-3.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 3.00-4.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > [ 5] 4.00-5.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 5.00-6.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 6.00-7.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > [ 5] 7.00-8.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 8.00-9.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 9.00-10.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
> > > [ 5] 0.00-10.00 sec 1.19 MBytes 1.00 Mbits/sec 0.000 ms 0/864 (0%) sender
> > > [ 5] 0.00-10.05 sec 1.11 MBytes 925 Kbits/sec 0.045 ms 62/864 (7.2%) receiver
> > > iperf Done.
> > > # iperf3 -c 192.168.1.1
> > > Connecting to host 192.168.1.1, port 5201
> > > [ 5] local 192.168.1.2 port 34792 connected to 192.168.1.1 port 5201
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 1.63 MBytes 13.7 Mbits/sec 30 1.41 KBytes
> > > [ 5] 1.00-2.00 sec 7.40 MBytes 62.1 Mbits/sec 65 14.1 KBytes
> > > [ 5] 2.00-3.00 sec 7.83 MBytes 65.7 Mbits/sec 109 2.83 KBytes
> > > [ 5] 3.00-4.00 sec 2.49 MBytes 20.9 Mbits/sec 46 19.8 KBytes
> > > [ 5] 4.00-5.00 sec 7.89 MBytes 66.2 Mbits/sec 109 2.83 KBytes
> > > [ 5] 5.00-6.00 sec 255 KBytes 2.09 Mbits/sec 22 2.83 KBytes
> > > [ 5] 6.00-7.00 sec 4.35 MBytes 36.5 Mbits/sec 74 41.0 KBytes
> > > [ 5] 7.00-8.00 sec 10.9 MBytes 91.8 Mbits/sec 34 45.2 KBytes
> > > [ 5] 8.00-9.00 sec 5.35 MBytes 44.9 Mbits/sec 82 1.41 KBytes
> > > [ 5] 9.00-10.00 sec 1.37 MBytes 11.5 Mbits/sec 73 1.41 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 49.5 MBytes 41.5 Mbits/sec 644 sender
> > > [ 5] 0.00-10.05 sec 49.3 MBytes 41.1 Mbits/sec receiver
> > > iperf Done.
> > >
> > > There is still a noticeable amount of drop/retries, but overall the
> > > results are significantly better. What is the rationale behind the
> > > choice of 16384 in particular? Could this be further improved?
> >
> > Apparently I've been too enthusiastic. After sending this e-mail I
> > re-generated an image with iproute2 and dd'ed the whole image onto an
> > SD card (until now I was only updating the kernel/DT manually), and I
> > got the same performance as above without the gso size trick. I need
> > to clarify this further.
> >
>
> Looking a bit at fec, I think fec_enet_txq_put_hdr_tso() is bogus...
>
> txq->tso_hdrs should be properly aligned by definition.
>
> If FEC_QUIRK_SWAP_FRAME is requested, better copy the right thing, not
> original skb->data ???

I've clarified the situation after looking at the build artifacts and
running (much) longer test sessions, as successive 10-second tests can
give very different results.

On a 4.14.322 kernel (still maintained) I really get extremely crappy
throughput.

On a mainline 6.5 kernel I thought I had a similar issue, but this was
due to wrong RGMII-ID timings being used (I ported the board from 4.14
to 6.5 and made a mistake). With the right timings I get much better
throughput, but still significantly lower than what I would expect.

So I tested Eric's fixes:
- TCP fix:
https://lore.kernel.org/netdev/CANn89iJUBujG2AOBYsr0V7qyC5WTgzx0GucO=2ES69tTDJRziw@mail.gmail.com/
- FEC fix:
https://lore.kernel.org/netdev/CANn89iLxKQOY5ZA5o3d1y=v4MEAsAQnzmVDjmLY0_bJPG93tKQ@mail.gmail.com/

I also tested different CPUfreq/CPUidle settings, as suggested by
Alexander:
https://lore.kernel.org/netdev/2245614.iZASKD2KPV@steina-w/
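
For reference, the standard sysfs knobs for this look roughly as
follows (a sketch, exact paths depend on the kernel configuration):

  # Disable all CPUidle states on all CPUs
  for f in /sys/devices/system/cpu/cpu*/cpuidle/state*/disable; do
          echo 1 > "$f"
  done

  # Pin CPUfreq to the performance governor (no frequency scaling)
  for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
          echo performance > "$g"
  done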

Here are the results of 100-second iperf uplink TCP tests, as reported
by the receiver. The first value is the mean, the raw results are in
parentheses. Unit: Mbps.
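
Each run was along the lines of the following, reading the "receiver"
line of the final summary:

  # 100-second TCP uplink test towards the iperf3 server
  iperf3 -c 192.168.1.1 -t 100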

Default setup:

  CPUidle yes, CPUfreq yes, TCP fix no, FEC fix no: 30.2 (23.8, 28.4, 38.4)

CPU power management tests (with TCP fix and FEC fix):

  CPUidle yes, CPUfreq yes: 26.5 (24.5, 28.5)
  CPUidle no,  CPUfreq yes: 50.3 (44.8, 55.7)
  CPUidle yes, CPUfreq no:  80.2 (75.8, 79.5, 80.8, 81.8, 83.1)
  CPUidle no,  CPUfreq no:  85.4 (80.6, 81.1, 86.2, 87.5, 91.8)

Eric's fixes tests (no CPUidle, no CPUfreq):

  TCP fix yes, FEC fix yes: 85.4 (80.6, 81.1, 86.2, 87.5, 91.8) (same as above)
  TCP fix no,  FEC fix yes: 82.0 (74.5, 75.9, 82.2, 87.5, 90.2)
  TCP fix yes, FEC fix no:  81.4 (77.5, 77.7, 82.8, 83.7, 85.4)
  TCP fix no,  FEC fix no:  79.6 (68.2, 77.6, 78.9, 86.4, 87.1)

So indeed the TCP and FEC patches don't seem to have a real impact (or
only a small one, it is hard to tell given how scattered the results
are). However, there is definitely something wrong with the low-power
settings, and I believe the erratum pointed out by Alexander may have a
real impact there (ERR006687 ENET: Only the ENET wake-up interrupt
request can wake the system from Wait mode [i.MX 6Dual/6Quad Only]);
my hardware probably lacks the hardware workaround.
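
As a quick sanity check (assuming the usual FEC binding property is
what would describe it), one can look for the workaround in the live
device tree:

  # No output here means the DT does not declare the ERR006687
  # hardware workaround on the FEC node
  find /proc/device-tree -name 'fsl,err006687-workaround-present'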

I believe the remaining fluctuations are due to the RGMII-ID timings
not being completely optimal; I think I would need to extend them
slightly more on the Tx path, but they are already set to the maximum
value.

Anyhow, I no longer see any difference in the drop rate between -b1M
and -b0 (<1%), so I believe it is acceptable as it is.
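
For reference, the UDP runs were along these lines (-b0 removes the
bandwidth cap so iperf3 sends as fast as it can):

  iperf3 -c 192.168.1.1 -u -b1M -t 100
  iperf3 -c 192.168.1.1 -u -b0 -t 100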

Now I might try to track down what is missing in 4.14.322 and perhaps
ask for a backport if it's relevant.
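
One way to hunt for backport candidates (just an idea at this point)
would be to walk the driver's history between the two kernels, e.g.:

  # fec driver changes that went in between v4.14 and v6.5
  git log --oneline v4.14..v6.5 -- drivers/net/ethernet/freescale/fec_main.c
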
Thanks a lot for all your feedback,
Miquèl