Message-ID: <AM6PR0402MB36074675DB9DBCD9788DCE9BFF6F0@AM6PR0402MB3607.eurprd04.prod.outlook.com>
Date:   Tue, 30 Jun 2020 09:47:56 +0000
From:   Andy Duan <fugang.duan@....com>
To:     Tobias Waldekranz <tobias@...dekranz.com>,
        David Miller <davem@...emloft.net>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx
 starvation under high rx load

From: Tobias Waldekranz <tobias@...dekranz.com> Sent: Tuesday, June 30, 2020 5:13 PM
> On Tue Jun 30, 2020 at 11:02 AM CEST, Andy Duan wrote:
> > From: Tobias Waldekranz <tobias@...dekranz.com> Sent: Tuesday, June 30, 2020 4:56 PM
> > > On Tue Jun 30, 2020 at 10:26 AM CEST, Andy Duan wrote:
> > > > From: Tobias Waldekranz <tobias@...dekranz.com> Sent: Tuesday, June 30, 2020 3:31 PM
> > > > > On Tue Jun 30, 2020 at 8:27 AM CEST, Andy Duan wrote:
> > > > > > From: Tobias Waldekranz <tobias@...dekranz.com> Sent: Tuesday, June 30, 2020 12:29 AM
> > > > > > > On Sun Jun 28, 2020 at 8:23 AM CEST, Andy Duan wrote:
> > > > > > > > I have never seen a bandwidth test cause a netdev watchdog
> > > > > > > > trip. Can you describe the reproduction steps from the commit,
> > > > > > > > so that we can reproduce it here? Thanks.
> > > > > > >
> > > > > > > My setup uses an i.MX8M Nano EVK connected to an ethernet
> > > > > > > switch, but I can get the same results with a direct connection
> > > > > > > to a PC.
> > > > > > >
> > > > > > > On the iMX, configure two VLANs on top of the FEC and enable
> > > > > > > IPv4 forwarding.
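> > > > > > >
> > > > > > > A minimal sketch of that setup, assuming the FEC appears as
> > > > > > > eth0 and using the addresses implied by the flow below (VLAN
> > > > > > > IDs and interface names are placeholders):
> > > > > > >
> > > > > > >     # two VLANs on top of the FEC, one subnet each
> > > > > > >     ip link add link eth0 name eth0.2 type vlan id 2
> > > > > > >     ip link add link eth0 name eth0.3 type vlan id 3
> > > > > > >     ip addr add 10.0.2.1/24 dev eth0.2
> > > > > > >     ip addr add 10.0.3.1/24 dev eth0.3
> > > > > > >     ip link set eth0.2 up
> > > > > > >     ip link set eth0.3 up
> > > > > > >     # route between the two VLANs
> > > > > > >     sysctl -w net.ipv4.ip_forward=1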
> > > > > > >
> > > > > > > On the PC, configure two VLANs and put them in different
> > > > > > > namespaces. From one namespace, use trafgen to generate a flow
> > > > > > > that the iMX will route from the first VLAN to the second and
> > > > > > > then back towards the second namespace on the PC.
> > > > > > >
> > > > > > > Something like:
> > > > > > >
> > > > > > >     {
> > > > > > >         eth(sa=PC_MAC, da=IMX_MAC),
> > > > > > >         ipv4(saddr=10.0.2.2, daddr=10.0.3.2, ttl=2),
> > > > > > >         udp(sp=1, dp=2),
> > > > > > >         "Hello world"
> > > > > > >     }
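> > > > > > >
> > > > > > > One way to drive it, sketched under the assumption that the PC
> > > > > > > NIC is eth0, the config above is saved as flow.cfg with the
> > > > > > > PC_MAC/IMX_MAC placeholders filled in, and trafgen comes from
> > > > > > > netsniff-ng:
> > > > > > >
> > > > > > >     # sending namespace on the first VLAN
> > > > > > >     ip netns add ns0
> > > > > > >     ip link add link eth0 name eth0.2 type vlan id 2
> > > > > > >     ip link set eth0.2 netns ns0
> > > > > > >     ip netns exec ns0 ip addr add 10.0.2.2/24 dev eth0.2
> > > > > > >     ip netns exec ns0 ip link set eth0.2 up
> > > > > > >     # the second VLAN/namespace is set up the same way
> > > > > > >     ip netns exec ns0 trafgen --dev eth0.2 --conf flow.cfg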
> > > > > > >
> > > > > > > Wait a couple of seconds and then you'll see the output from
> > > > > > > fec_dump.
> > > > > > >
> > > > > > > In the same setup I also see a weird issue when running a TCP
> > > > > > > flow using iperf3. Most of the time (~70%) when I start the
> > > > > > > iperf3 client I'll see ~450Mbps of throughput. In the other case
> > > > > > > (~30%) I'll see ~790Mbps. The system is "stably bi-modal", i.e.
> > > > > > > whichever rate is reached in the beginning is then sustained for
> > > > > > > as long as the session is kept alive.
> > > > > > >
> > > > > > > I've inserted some tracepoints in the driver to try to
> > > > > > > understand what's going on:
> > > > > > > https://svgshare.com/i/MVp.svg
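> > > > > > >
> > > > > > > Aside: without patching the driver, the stock napi:napi_poll
> > > > > > > trace event gives a rough view of how much work each NAPI poll
> > > > > > > does. A sketch, assuming trace-cmd is available:
> > > > > > >
> > > > > > >     trace-cmd record -e napi:napi_poll sleep 10
> > > > > > >     trace-cmd report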
> > > > > > >
> > > > > > > What I can't figure out is why the Tx buffers seem to be
> > > > > > > collected at a much slower rate in the slow case (top in the
> > > > > > > picture). If we fall behind in one NAPI poll, we should catch up
> > > > > > > at the next call (which we can see in the fast case). But in the
> > > > > > > slow case we keep falling further and further behind until we
> > > > > > > freeze the queue. Is this something you've ever observed? Any
> > > > > > > ideas?
> > > > > >
> > > > > > Our test cases never reproduced the issue before: the CPU has
> > > > > > more bandwidth than the ethernet uDMA, so the current NAPI poll
> > > > > > has a chance to complete its work. By the next poll, work_tx has
> > > > > > been updated, so we never hit the issue.
> > > > >
> > > > > It appears it has nothing to do with routing back out through the
> > > > > same interface.
> > > > >
> > > > > I get the same bi-modal behavior if I just run the iperf3 server on
> > > > > the iMX and then have it be the transmitting part, i.e. on the PC I
> > > > > run:
> > > > >
> > > > >     iperf3 -c $IMX_IP -R
> > > > >
> > > > > It would be very interesting to see what numbers you see in this
> > > > > scenario.
> > > > I just have one imx8mn EVK in my hands. Running this case, the
> > > > number is ~940Mbps, as below.
> > > >
> > > > root@...8mnevk:~# iperf3 -s
> > > > -----------------------------------------------------------
> > > > Server listening on 5201
> > > > -----------------------------------------------------------
> > > > Accepted connection from 10.192.242.132, port 43402
> > > > [  5] local 10.192.242.96 port 5201 connected to 10.192.242.132 port 43404
> > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > [  5]   0.00-1.00   sec   109 MBytes   913 Mbits/sec    0    428 KBytes
> > > > [  5]   1.00-2.00   sec   112 MBytes   943 Mbits/sec    0    447 KBytes
> > > > [  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec    0    472 KBytes
> > > > [  5]   3.00-4.00   sec   113 MBytes   944 Mbits/sec    0    472 KBytes
> > > > [  5]   4.00-5.00   sec   112 MBytes   942 Mbits/sec    0    472 KBytes
> > > > [  5]   5.00-6.00   sec   112 MBytes   936 Mbits/sec    0    472 KBytes
> > > > [  5]   6.00-7.00   sec   113 MBytes   945 Mbits/sec    0    472 KBytes
> > > > [  5]   7.00-8.00   sec   112 MBytes   944 Mbits/sec    0    472 KBytes
> > > > [  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec    0    472 KBytes
> > > > [  5]   9.00-10.00  sec   112 MBytes   940 Mbits/sec    0    472 KBytes
> > > > [  5]  10.00-10.04  sec  4.16 MBytes   873 Mbits/sec    0    472 KBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > [  5]   0.00-10.04  sec  1.10 GBytes   939 Mbits/sec    0             sender
> > >
> > > Are you running the client with -R so that the iMX is the transmitter?
> > > What if you run the test multiple times, do you get the same result
> > > each time?
> >
> > Of course, the PC command is: iperf3 -c 10.192.242.96 -R
> > Yes, the same result each time.
> 
> Very strange, I've now reduced my setup to a simple direct connection
> between iMX and PC and I still see the same issue:
> 
> for i in $(seq 5); do iperf3 -c 10.0.2.1 -R -t2; sleep 1; done
> Connecting to host 10.0.2.1, port 5201
> Reverse mode, remote host 10.0.2.1 is sending
> [  5] local 10.0.2.2 port 53978 connected to 10.0.2.1 port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   110 MBytes   919 Mbits/sec
> [  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec    0   0.00 Bytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-2.04   sec   223 MBytes   918 Mbits/sec    0             sender
> [  5]   0.00-2.00   sec   222 MBytes   930 Mbits/sec                  receiver
> 
> iperf Done.
> Connecting to host 10.0.2.1, port 5201
> Reverse mode, remote host 10.0.2.1 is sending
> [  5] local 10.0.2.2 port 53982 connected to 10.0.2.1 port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec  55.8 MBytes   468 Mbits/sec
> [  5]   1.00-2.00   sec  56.3 MBytes   472 Mbits/sec    0   0.00 Bytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-2.04   sec   113 MBytes   464 Mbits/sec    0             sender
> [  5]   0.00-2.00   sec   112 MBytes   470 Mbits/sec                  receiver
> 
> iperf Done.
> Connecting to host 10.0.2.1, port 5201
> Reverse mode, remote host 10.0.2.1 is sending
> [  5] local 10.0.2.2 port 53986 connected to 10.0.2.1 port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec  55.7 MBytes   467 Mbits/sec
> [  5]   1.00-2.00   sec  56.3 MBytes   472 Mbits/sec    0   0.00 Bytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-2.04   sec   113 MBytes   464 Mbits/sec    0             sender
> [  5]   0.00-2.00   sec   112 MBytes   470 Mbits/sec                  receiver
> 
> iperf Done.
> Connecting to host 10.0.2.1, port 5201
> Reverse mode, remote host 10.0.2.1 is sending
> [  5] local 10.0.2.2 port 53990 connected to 10.0.2.1 port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   110 MBytes   920 Mbits/sec
> [  5]   1.00-2.00   sec   112 MBytes   942 Mbits/sec    0   0.00 Bytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-2.04   sec   223 MBytes   919 Mbits/sec    0             sender
> [  5]   0.00-2.00   sec   222 MBytes   931 Mbits/sec                  receiver
> 
> iperf Done.
> Connecting to host 10.0.2.1, port 5201
> Reverse mode, remote host 10.0.2.1 is sending
> [  5] local 10.0.2.2 port 53994 connected to 10.0.2.1 port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   110 MBytes   920 Mbits/sec
> [  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec    0   0.00 Bytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-2.04   sec   223 MBytes   918 Mbits/sec    0             sender
> [  5]   0.00-2.00   sec   222 MBytes   931 Mbits/sec                  receiver
> 
> iperf Done.
> 
> Which kernel version are you running? I'm on be74294ffa24 plus the
> starvation fix in this patch.

Tobias, sorry, I am not running the net tree; I run the linux-imx tree:
https://source.codeaurora.org/external/imx/linux-imx/refs/heads
branch: imx_5.4.24_2.1.0
But the data that follows is the same as on the net tree.
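
For reference, that branch can be fetched with something like this (the clone URL is assumed from the page above):

    git clone -b imx_5.4.24_2.1.0 https://source.codeaurora.org/external/imx/linux-imx.git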

Log on the PC (imx runs as server):
$ for i in $(seq 5); do iperf3 -c 10.192.242.96 -R -t2; sleep 1; done
Connecting to host 10.192.242.96, port 5201
Reverse mode, remote host 10.192.242.96 is sending
[  4] local 10.192.242.132 port 46504 connected to 10.192.242.96 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   112 MBytes   939 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-2.00   sec   226 MBytes   949 Mbits/sec    0             sender
[  4]   0.00-2.00   sec   225 MBytes   942 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.192.242.96, port 5201
Reverse mode, remote host 10.192.242.96 is sending
[  4] local 10.192.242.132 port 46510 connected to 10.192.242.96 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   111 MBytes   933 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-2.00   sec   226 MBytes   949 Mbits/sec    0             sender
[  4]   0.00-2.00   sec   224 MBytes   939 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.192.242.96, port 5201
Reverse mode, remote host 10.192.242.96 is sending
[  4] local 10.192.242.132 port 46516 connected to 10.192.242.96 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   112 MBytes   936 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-2.00   sec   226 MBytes   949 Mbits/sec    0             sender
[  4]   0.00-2.00   sec   224 MBytes   940 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.192.242.96, port 5201
Reverse mode, remote host 10.192.242.96 is sending
[  4] local 10.192.242.132 port 46522 connected to 10.192.242.96 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   111 MBytes   934 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-2.00   sec   226 MBytes   946 Mbits/sec    0             sender
[  4]   0.00-2.00   sec   224 MBytes   939 Mbits/sec                  receiver

iperf Done.
Connecting to host 10.192.242.96, port 5201
Reverse mode, remote host 10.192.242.96 is sending
[  4] local 10.192.242.132 port 46528 connected to 10.192.242.96 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   112 MBytes   936 Mbits/sec
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-2.00   sec   226 MBytes   947 Mbits/sec    0             sender
[  4]   0.00-2.00   sec   224 MBytes   940 Mbits/sec                  receiver
