Message-ID: <20231017131901.5ae65e4d@xps-13>
Date: Tue, 17 Oct 2023 13:19:01 +0200
From: Miquel Raynal <miquel.raynal@...tlin.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "Russell King (Oracle)" <linux@...linux.org.uk>, Wei Fang <wei.fang@....com>,
 Shenwei Wang <shenwei.wang@....com>, Clark Wang <xiaoning.wang@....com>,
 davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, linux-imx@....com,
 netdev@...r.kernel.org, Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
 Alexandre Belloni <alexandre.belloni@...tlin.com>,
 Maxime Chevallier <maxime.chevallier@...tlin.com>, Andrew Lunn <andrew@...n.ch>,
 Stephen Hemminger <stephen@...workplumber.org>,
 Alexander Stein <alexander.stein@...tq-group.com>
Subject: Re: Ethernet issue on imx6

Hi Eric,

edumazet@...gle.com wrote on Mon, 16 Oct 2023 21:37:58 +0200:

> On Mon, Oct 16, 2023 at 5:37 PM Miquel Raynal <miquel.raynal@...tlin.com> wrote:
> >
> > Hello again,
> >
> > > > > # iperf3 -c 192.168.1.1
> > > > > Connecting to host 192.168.1.1, port 5201
> > > > > [  5] local 192.168.1.2 port 37948 connected to 192.168.1.1 port 5201
> > > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > > [  5]   0.00-1.00   sec  11.3 MBytes  94.5 Mbits/sec   43   32.5 KBytes
> > > > > [  5]   1.00-2.00   sec  3.29 MBytes  27.6 Mbits/sec   26   1.41 KBytes
> > > > > [  5]   2.00-3.00   sec  0.00 Bytes   0.00 bits/sec     1   1.41 KBytes
> > > > > [  5]   3.00-4.00   sec  0.00 Bytes   0.00 bits/sec     0   1.41 KBytes
> > > > > [  5]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec     5   1.41 KBytes
> > > > > [  5]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec     1   1.41 KBytes
> > > > > [  5]   6.00-7.00   sec  0.00 Bytes   0.00 bits/sec     1   1.41 KBytes
> > > > > [  5]   7.00-8.00   sec  0.00 Bytes   0.00 bits/sec     1   1.41 KBytes
> > > > > [  5]   8.00-9.00   sec  0.00 Bytes   0.00 bits/sec     0   1.41 KBytes
> > > > > [  5]   9.00-10.00  sec  0.00 Bytes   0.00 bits/sec     0   1.41 KBytes
> > > > >
> > > > > Thanks,
> > > > > Miquèl
> > > >
> > > > Can you experiment with :
> > > >
> > > > - Disabling TSO on your NIC (ethtool -K eth0 tso off)
> > > > - Reducing max GSO size (ip link set dev eth0 gso_max_size 16384)
> > > >
> > > > I suspect some kind of issues with fec TX completion, vs TSO emulation.
> > >
> > > Wow, appears to have a significant effect. I am using Busybox's iproute
> > > implementation which does not know gso_max_size, but I hacked directly
> > > into netdevice.h just to see if it would have an effect. I'm adding
> > > iproute2 to the image for further testing.
> > >
> > > Here is the diff:
> > >
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -2364,7 +2364,7 @@ struct net_device {
> > >         /* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
> > >          * and shinfo->gso_segs is a 16bit field.
> > >          */
> > > -#define GSO_MAX_SIZE        (8 * GSO_MAX_SEGS)
> > > +#define GSO_MAX_SIZE        16384u
> > >
> > >         unsigned int         gso_max_size;
> > > #define TSO_LEGACY_MAX_SIZE  65536
> > >
> > > And here are the results:
> > >
> > > # ethtool -K eth0 tso off
> > > # iperf3 -c 192.168.1.1 -u -b1M
> > > Connecting to host 192.168.1.1, port 5201
> > > [  5] local 192.168.1.2 port 50490 connected to 192.168.1.1 port 5201
> > > [ ID] Interval           Transfer     Bitrate         Total Datagrams
> > > [  5]   0.00-1.00   sec   123 KBytes  1.01 Mbits/sec  87
> > > [  5]   1.00-2.00   sec   122 KBytes   996 Kbits/sec  86
> > > [  5]   2.00-3.00   sec   122 KBytes   996 Kbits/sec  86
> > > [  5]   3.00-4.00   sec   123 KBytes  1.01 Mbits/sec  87
> > > [  5]   4.00-5.00   sec   122 KBytes   996 Kbits/sec  86
> > > [  5]   5.00-6.00   sec   122 KBytes   996 Kbits/sec  86
> > > [  5]   6.00-7.00   sec   123 KBytes  1.01 Mbits/sec  87
> > > [  5]   7.00-8.00   sec   122 KBytes   996 Kbits/sec  86
> > > [  5]   8.00-9.00   sec   122 KBytes   996 Kbits/sec  86
> > > [  5]   9.00-10.00  sec   123 KBytes  1.01 Mbits/sec  87
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
> > > [  5]   0.00-10.00  sec  1.19 MBytes  1.00 Mbits/sec  0.000 ms  0/864 (0%)     sender
> > > [  5]   0.00-10.05  sec  1.11 MBytes   925 Kbits/sec  0.045 ms  62/864 (7.2%)  receiver
> > > iperf Done.
> > > # iperf3 -c 192.168.1.1
> > > Connecting to host 192.168.1.1, port 5201
> > > [  5] local 192.168.1.2 port 34792 connected to 192.168.1.1 port 5201
> > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > [  5]   0.00-1.00   sec  1.63 MBytes  13.7 Mbits/sec   30   1.41 KBytes
> > > [  5]   1.00-2.00   sec  7.40 MBytes  62.1 Mbits/sec   65   14.1 KBytes
> > > [  5]   2.00-3.00   sec  7.83 MBytes  65.7 Mbits/sec  109   2.83 KBytes
> > > [  5]   3.00-4.00   sec  2.49 MBytes  20.9 Mbits/sec   46   19.8 KBytes
> > > [  5]   4.00-5.00   sec  7.89 MBytes  66.2 Mbits/sec  109   2.83 KBytes
> > > [  5]   5.00-6.00   sec   255 KBytes  2.09 Mbits/sec   22   2.83 KBytes
> > > [  5]   6.00-7.00   sec  4.35 MBytes  36.5 Mbits/sec   74   41.0 KBytes
> > > [  5]   7.00-8.00   sec  10.9 MBytes  91.8 Mbits/sec   34   45.2 KBytes
> > > [  5]   8.00-9.00   sec  5.35 MBytes  44.9 Mbits/sec   82   1.41 KBytes
> > > [  5]   9.00-10.00  sec  1.37 MBytes  11.5 Mbits/sec   73   1.41 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  49.5 MBytes  41.5 Mbits/sec  644        sender
> > > [  5]   0.00-10.05  sec  49.3 MBytes  41.1 Mbits/sec             receiver
> > > iperf Done.
> > >
> > > There is still a noticeable amount of drop/retries, but overall the
> > > results are significantly better. What is the rationale behind the
> > > choice of 16384 in particular? Could this be further improved?
> >
> > Apparently I've been too enthusiastic. After sending this e-mail I've
> > re-generated an image with iproute2 and dd'ed the whole image into an
> > SD card, while until now I was just updating the kernel/DT manually and
> > got the same performances as above without the gro size trick. I need
> > to clarify this further.
>
> Looking a bit at fec, I think fec_enet_txq_put_hdr_tso() is bogus...
> txq->tso_hdrs should be properly aligned by definition.
> If FEC_QUIRK_SWAP_FRAME is requested, better copy the right thing, not
> original skb->data ???
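For reference, once a full iproute2 is available on the target, the same
experiment should be possible entirely at runtime, without patching
netdevice.h. A rough sketch (assuming the interface is eth0 and an
iproute2 build that accepts gso_max_size):

# ethtool -K eth0 tso off
# ip link set dev eth0 gso_max_size 16384
# ip -d link show dev eth0 | grep gso_max_size
# ethtool -k eth0 | grep tcp-segmentation-offload

The last two commands are only there to check that the new values were
actually applied before re-running iperf3.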
I've clarified the situation after looking at the build artifacts and
going through (way) longer testing sessions, as successive 10-second
tests can lead to really different results.

On a 4.14.322 kernel (still maintained) I really get extremely crappy
throughput.

On a mainline 6.5 kernel I thought I had a similar issue, but this was
due to wrong RGMII-ID timings being used (I ported the board from 4.14
to 6.5 and made a mistake). With the right timings I get much better
throughput, but still significantly lower than what I would expect. So
I tested Eric's fixes:
- TCP fix: https://lore.kernel.org/netdev/CANn89iJUBujG2AOBYsr0V7qyC5WTgzx0GucO=2ES69tTDJRziw@mail.gmail.com/
- FEC fix: https://lore.kernel.org/netdev/CANn89iLxKQOY5ZA5o3d1y=v4MEAsAQnzmVDjmLY0_bJPG93tKQ@mail.gmail.com/
as well as different CPUfreq/CPUidle parameters, as pointed out by
Alexander:
https://lore.kernel.org/netdev/2245614.iZASKD2KPV@steina-w/

Here are the results of 100-second iperf uplink TCP tests, as reported
by the receiver. The first value is the mean; the raw results are in
parentheses. Unit: Mbps.

Default setup:
  CPUidle yes, CPUfreq yes, TCP fix no, FEC fix no: 30.2 (23.8, 28.4, 38.4)

CPU power management tests (with TCP fix and FEC fix):
  CPUidle yes, CPUfreq yes: 26.5 (24.5, 28.5)
  CPUidle no,  CPUfreq yes: 50.3 (44.8, 55.7)
  CPUidle yes, CPUfreq no:  80.2 (75.8, 79.5, 80.8, 81.8, 83.1)
  CPUidle no,  CPUfreq no:  85.4 (80.6, 81.1, 86.2, 87.5, 91.8)

Eric's fixes tests (no CPUidle, no CPUfreq):
  TCP fix yes, FEC fix yes: 85.4 (80.6, 81.1, 86.2, 87.5, 91.8) (same as above)
  TCP fix no,  FEC fix yes: 82.0 (74.5, 75.9, 82.2, 87.5, 90.2)
  TCP fix yes, FEC fix no:  81.4 (77.5, 77.7, 82.8, 83.7, 85.4)
  TCP fix no,  FEC fix no:  79.6 (68.2, 77.6, 78.9, 86.4, 87.1)

So indeed the TCP and FEC patches don't seem to have a real impact (or
only a small one, it is hard to tell given how scattered the results
are). However, there is definitely something wrong with the low-power
settings, and I believe the erratum pointed out by Alexander may have a
real impact there (ERR006687 ENET: Only the ENET wake-up interrupt
request can wake the system from Wait mode [i.MX 6Dual/6Quad Only]);
my hardware probably lacks the hardware workaround.

I believe the remaining fluctuations are due to the RGMII-ID timings
not being totally optimal; I think I would need to extend them slightly
more in the Tx path, but they are already set to the maximum value.
Anyhow, I no longer see any difference in the drop rate between -b1M
and -b0 (<1%), so I believe it is acceptable as it is.

Now I might try to track what is missing in 4.14.322 and perhaps ask
for a backport if it's relevant.

Thanks a lot for all your feedback,
Miquèl
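For completeness, here is one way to approximate the "CPUidle no" and
"CPUfreq no" configurations above from userspace, without rebuilding the
kernel. This is only a sketch assuming a standard sysfs layout; the runs
above may instead have disabled the features through Kconfig or boot
parameters such as cpuidle.off=1.

# Disable every CPUidle state on all cores; deep idle states add
# wake-up latency that can stall TX completion handling.
for d in /sys/devices/system/cpu/cpu*/cpuidle/state*/disable; do
        echo 1 > "$d"
done

# Keep the cores at maximum frequency by switching to the performance
# governor (the closest runtime equivalent of "CPUfreq no").
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance > "$g"
done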