Message-ID: <20231013123748.6b200f79@xps-13>
Date: Fri, 13 Oct 2023 12:37:48 +0200
From: Miquel Raynal <miquel.raynal@...tlin.com>
To: James Chapman <jchapman@...alix.com>
Cc: Wei Fang <wei.fang@....com>, Shenwei Wang <shenwei.wang@....com>, Clark
Wang <xiaoning.wang@....com>, Russell King <linux@...linux.org.uk>,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, linux-imx@....com, netdev@...r.kernel.org, Thomas
Petazzoni <thomas.petazzoni@...tlin.com>, Alexandre Belloni
<alexandre.belloni@...tlin.com>, Maxime Chevallier
<maxime.chevallier@...tlin.com>
Subject: Re: Ethernet issue on imx6
Hi James,

jchapman@...alix.com wrote on Fri, 13 Oct 2023 09:50:49 +0100:
> On 12/10/2023 18:34, Miquel Raynal wrote:
> > Hello,
> >
> > I've been scratching my head for weeks over a strange imx6
> > network issue. I need help to go further, as I feel a bit clueless now.
> >
> > Here is my setup:
> > - Custom imx6q board
> > - Bootloader: U-Boot 2017.11 (also tried with a 2016.03)
> > - Kernel: 4.14(.69,.146,.322), v5.10 and v6.5 with the same behavior
> > - The MAC (fec driver) is connected to a Micrel 9031 PHY
> > - The PHY is connected to the link partner through an industrial cable
> > - Testing 100BASE-T (link is stable)
> >
> > The RGMII-ID timings are probably not totally optimal but offer rather
> > good performance. In UDP with iperf3:
> > * Downlink (host to the board) runs at full speed with 0% drop
> > * Uplink (board to host) runs at full speed with <1% drop
> >
> > However, if I ever try to limit the bandwidth in uplink (only), the drop
> > rate rises significantly, up to 30%:
> >
> > //192.168.1.1 is my host, so the below lines are from the board:
> > # iperf3 -c 192.168.1.1 -u -b100M
> > [ 5] 0.00-10.05 sec 113 MBytes 94.6 Mbits/sec 0.044 ms 467/82603 (0.57%) receiver
> > # iperf3 -c 192.168.1.1 -u -b90M
> > [ 5] 0.00-10.04 sec 90.5 MBytes 75.6 Mbits/sec 0.146 ms 12163/77688 (16%) receiver
> > # iperf3 -c 192.168.1.1 -u -b80M
> > [ 5] 0.00-10.05 sec 66.4 MBytes 55.5 Mbits/sec 0.162 ms 20937/69055 (30%) receiver
> >
> > One direct consequence, I believe, is that TCP transfers quickly stall
> > or run at an insanely low speed (~40 KiB/s).
> >
> > I've tried to disable all the hardware offloading reported by ethtool
> > with no additional success.
> >
> > Last but not least, I observe another very strange behavior: when I
> > perform an uplink transfer at a "reduced" speed (80Mbps or below), as
> > said above, I observe a ~30% drop rate. But if I run a full speed UDP
> > transfer in downlink at the same time, the drop rate lowers to ~3-4%.
> > See below, this is an iperf server on my host receiving UDP traffic from
> > my board. After 5 seconds I start a full speed UDP transfer from the
> > host to the board:
> >
> > [ 5] local 192.168.1.1 port 5201 connected to 192.168.1.2 port 57216
> > [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
> > [ 5] 0.00-1.00 sec 6.29 MBytes 52.7 Mbits/sec 0.152 ms 2065/6617 (31%)
> > [ 5] 1.00-2.00 sec 6.50 MBytes 54.6 Mbits/sec 0.118 ms 2199/6908 (32%)
> > [ 5] 2.00-3.00 sec 6.64 MBytes 55.7 Mbits/sec 0.123 ms 2099/6904 (30%)
> > [ 5] 3.00-4.00 sec 6.58 MBytes 55.2 Mbits/sec 0.091 ms 2141/6905 (31%)
> > [ 5] 4.00-5.00 sec 6.59 MBytes 55.3 Mbits/sec 0.092 ms 2134/6907 (31%)
> > [ 5] 5.00-6.00 sec 8.36 MBytes 70.1 Mbits/sec 0.088 ms 853/6904 (12%)
> > [ 5] 6.00-7.00 sec 9.14 MBytes 76.7 Mbits/sec 0.085 ms 281/6901 (4.1%)
> > [ 5] 7.00-8.00 sec 9.19 MBytes 77.1 Mbits/sec 0.147 ms 255/6911 (3.7%)
> > [ 5] 8.00-9.00 sec 9.22 MBytes 77.3 Mbits/sec 0.160 ms 233/6907 (3.4%)
> > [ 5] 9.00-10.00 sec 9.25 MBytes 77.6 Mbits/sec 0.129 ms 211/6906 (3.1%)
> > [ 5] 10.00-10.04 sec 392 KBytes 76.9 Mbits/sec 0.113 ms 11/288 (3.8%)
> >
> > If the downlink transfer is not at full speed, I don't observe any
> > difference.
> >
> > I've commented out the runtime_pm callbacks in the fec driver, but
> > nothing changed.
> >
> > Any hint or idea will be highly appreciated!
> >
> > Thanks a lot,
> > Miquèl
> >
> Check your board's interrupt configuration. At high data rates, NAPI may mask interrupt delivery/routing issues since NAPI keeps interrupts disabled longer. Also, if the CPU has hardware interrupt coalescing features enabled, these may not play well with NAPI.
>
> Low level irq configuration is quite complex (and flexible) in devices like iMX. It may be further complicated by some of it being done by the bootloader. So perhaps experiment with the fec driver's NAPI weight and debug the irq handler first to test whether interrupt handling is working as expected on your board before digging in the low level, board-specific irq setup code.

Thanks a lot for looking into this. I've tried playing with the NAPI
budget a little, but saw no difference at all in the results. With
this new information in mind, do you think I should still look deeper
into the interrupt configuration?
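
For reference, a minimal sketch of how the interrupt side could be checked
from userspace: sample /proc/interrupts twice during an iperf3 run and
compare the per-IRQ rates between the -b100M and -b80M cases. The function
names are made up for illustration, and matching on "ethernet" is an
assumption about how the fec interrupts are labelled on a given board; use
whatever string your /proc/interrupts actually shows.

```python
import time


def parse_irq_counts(text, match):
    """Sum per-CPU counts for /proc/interrupts lines whose description contains `match`.

    Returns a dict mapping the IRQ label (e.g. "150") to its total count.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    # The header line lists one column per online CPU: CPU0 CPU1 ...
    ncpu = len(lines[0].split())
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = fields[:ncpu]
        desc = " ".join(fields[ncpu:])
        if match in desc and all(c.isdigit() for c in counts):
            totals[label.strip()] = sum(int(c) for c in counts)
    return totals


def irq_rate(match="ethernet", interval=1.0, path="/proc/interrupts"):
    """Return interrupts per second for each matching IRQ over `interval` seconds.

    The default `match` is a guess, not a known name for the fec IRQs.
    """
    with open(path) as f:
        before = parse_irq_counts(f.read(), match)
    time.sleep(interval)
    with open(path) as f:
        after = parse_irq_counts(f.read(), match)
    return {irq: (after[irq] - before.get(irq, 0)) / interval for irq in after}
```

Calling irq_rate() on the board while the uplink test runs gives one number
per matching IRQ line. This only shows whether the interrupt rate changes
between the two cases; it does not by itself say where packets are lost.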
Thanks,
Miquèl