Message-ID: <AM6PR0402MB36075CF372D7A31932E32B60FF6F0@AM6PR0402MB3607.eurprd04.prod.outlook.com>
Date: Tue, 30 Jun 2020 06:27:48 +0000
From: Andy Duan <fugang.duan@....com>
To: Tobias Waldekranz <tobias@...dekranz.com>,
David Miller <davem@...emloft.net>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [EXT] Re: [PATCH net-next] net: ethernet: fec: prevent tx
starvation under high rx load
From: Tobias Waldekranz <tobias@...dekranz.com> Sent: Tuesday, June 30, 2020 12:29 AM
> On Sun Jun 28, 2020 at 8:23 AM CEST, Andy Duan wrote:
> > I have never seen a bandwidth test cause a netdev watchdog trip.
> > Can you describe the steps to reproduce it on that commit, so that we
> > can reproduce it locally? Thanks.
>
> My setup uses an i.MX8M Nano EVK connected to an ethernet switch, but I
> can get the same results with a direct connection to a PC.
>
> On the iMX, configure two VLANs on top of the FEC and enable IPv4
> forwarding.
>
> On the PC, configure two VLANs and put them in different namespaces. From
> one namespace, use trafgen to generate a flow that the iMX will route from
> the first VLAN to the second and then back towards the second namespace on
> the PC.
>
> Something like:
>
> {
> eth(sa=PC_MAC, da=IMX_MAC),
> ipv4(saddr=10.0.2.2, daddr=10.0.3.2, ttl=2),
> udp(sp=1, dp=2),
> "Hello world"
> }
>
> Wait a couple of seconds and then you'll see the output from fec_dump.
>
> In the same setup I also see a weird issue when running a TCP flow using
> iperf3. Most of the time (~70%) when I start the iperf3 client I'll see
> ~450Mbps of throughput. In the other case (~30%) I'll see ~790Mbps. The
> system is "stably bi-modal", i.e. whichever rate is reached in the beginning is
> then sustained for as long as the session is kept alive.
>
> I've inserted some tracepoints in the driver to try to understand what's going
> on:
> https://svgshare.com/i/MVp.svg
>
> What I can't figure out is why the Tx buffers seem to be collected at a much
> slower rate in the slow case (top in the picture). If we fall behind in one NAPI
> poll, we should catch up at the next call (which we can see in the fast case).
> But in the slow case we keep falling further and further behind until we freeze
> the queue. Is this something you've ever observed? Any ideas?
Our test cases have never reproduced the issue before: the CPU has more
bandwidth than the ethernet uDMA, so the current NAPI poll has a chance to
complete. By the next poll, work_tx has been updated, so we never hit the
issue.
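
For reference, a rough sketch of the shape of poll handler that avoids this
kind of tx starvation: reclaim the tx ring on every poll, independently of the
rx budget, so a run of full-budget rx polls can never postpone tx cleanup until
the watchdog fires. This is only an illustration of the idea, not the actual
patch; the helper names fec_enet_tx()/fec_enet_rx() and the register macros are
assumed to follow fec_main.c.

static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
{
	struct net_device *ndev = napi->dev;
	struct fec_enet_private *fep = netdev_priv(ndev);
	int done;

	/* Always reclaim completed tx descriptors first, regardless of the
	 * rx budget, so sustained rx traffic cannot starve tx cleanup and
	 * eventually stop the tx queue.  (assumed helper from fec_main.c)
	 */
	fec_enet_tx(ndev);

	/* Process at most 'budget' rx packets.  (assumed helper) */
	done = fec_enet_rx(ndev, budget);

	if (done < budget && napi_complete_done(napi, done)) {
		/* Rx fully drained: leave polled mode and re-enable the
		 * FEC interrupt mask.  (assumed register/macro names)
		 */
		writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
	}

	return done;
}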