Date:	Wed, 20 Nov 2013 22:54:35 +0100
From:	Willy Tarreau <w@....eu>
To:	Arnaud Ebalard <arno@...isbad.org>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>,
	Florian Fainelli <f.fainelli@...il.com>,
	simon.guinot@...uanux.org, netdev@...r.kernel.org,
	edumazet@...gle.com, Cong Wang <xiyou.wangcong@...il.com>,
	linux-arm-kernel@...ts.infradead.org
Subject: Re: [BUG,REGRESSION?] 3.11.6+,3.12: GbE iface rate drops to few KB/s

Hi Arnaud,

On Wed, Nov 20, 2013 at 10:28:50PM +0100, Arnaud Ebalard wrote:
> With the current Linus tree (head being b4789b8e: aacraid: prevent invalid
> pointer dereference), here is what I get as a baseline:
> 
>  w/ tcp_wmem left w/ default values (4096 16384 4071360)
> 
>   via netperf (TCP_MAERTS/TCP_STREAM): 151.13 / 935.50 Mbits/s
>   via wget against apache: 15.4 MB/s
>   via wget against nginx: 104 MB/s
>  
>  w/ tcp_wmem set to 4096 16384 262144:
> 
>   via netperf (TCP_MAERTS/TCP_STREAM): 919.89 / 935.50 Mbits/s
>   via wget against apache: 63.3 MB/s
>   via wget against nginx: 104 MB/s
>  
> With your patch on top of it (and tcp_wmem kept at its default value):
> 
>  via netperf: 939.16 / 935.44 Mbits/s
>  via wget against apache: 65.9 MB/s (top reports 69.5 sy, 30.1 si
>                                      and 72% CPU for apache2)
>  via wget against nginx: 106 MB/s
> 
> 
> With your patch and MVNETA_TX_DONE_TIMER_PERIOD set to 1 instead of 10
> (still w/ tcp_wmem kept at its default value):
> 
>  via netperf: 939.12 / 935.84 Mbits/s
>  via wget against apache: 63.7 MB/s
>  via wget against nginx: 108 MB/s
> 
> So:
> 
>  - First, Eric's patch sitting in Linus' tree does fix the regression
>    I had on 3.11.7 and early 3.12 (15.4 MB/s vs 256KB/s).
> 
>  - As can be seen in the results of the first test, Eric's patch still
>    requires some additional tweaking of tcp_wmem to get netperf and
>    Apache somewhat happy w/ still-imperfect drivers (63.3 MB/s instead
>    of 15.4 MB/s by setting the max TCP send buffer space to 256KB for
>    Apache).
> 
>  - For unknown reasons, nginx manages to provide a 104MB/s download rate
>    even with tcp_wmem left at its default and no specific mvneta patch.
> 
>  - Now, Willy's patch seems to make netperf happy (link saturated from
>    server to client), w/o tweaking tcp_wmem.
> 
>  - Again with Willy's patch, I guess the "limitations" of the platform
>    (1.2GHz CPU w/ 512MB of RAM) somehow prevent Apache from saturating
>    the link. All I can say is that the same test some months ago on a
>    1.6GHz ARMv5TE (kirkwood 88f6282) w/ 256MB of RAM gave me 108MB/s.
>    I do not know if it is some Apache regression, some mvneta vs
>    mv63xx_eth difference or some CPU frequency issue, but having netperf
>    and nginx happy makes me wonder about Apache.
> 
>  - Willy, setting MVNETA_TX_DONE_TIMER_PERIOD to 1 instead of 10 on top
>    of your patch does not improve the already good numbers I get with it.

Great, thanks for your detailed tests! Concerning Apache, it's common to
see it consume more CPU than the others, which makes it more sensitive to
small devices like these (which BTW have a very small cache and only a
16-bit RAM bus). Note also that there could be a number of other
differences, such as Apache always enabling TCP_NODELAY, which results in
sending incomplete segments at the end of each buffer and consumes
slightly more descriptors.
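
For illustration, the Nagle side of it boils down to the classic
setsockopt() call below (just a minimal userspace sketch of what a
server enabling TCP_NODELAY ends up doing, not Apache's actual code):

  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  /* Sketch only: disable Nagle on a connected socket, as Apache does.
   * With Nagle off, the short tail of each buffer is sent right away
   * as a sub-MSS segment instead of being coalesced with later data,
   * hence a few more TX descriptors per response. */
  static int disable_nagle(int fd)
  {
          int one = 1;

          return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
  }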

> In the end if you iterate on your work to push a version of your patch
> upstream, I'll be happy to test it. And thanks for the time you already
> spent!

I'm currently trying to implement TX IRQ handling. I found the register
descriptions in the neta driver provided in Marvell's LSP kernel, which
ships with some devices using their CPUs. That code is utterly broken
(e.g., splice fails with -EBADF) but I think the register descriptions
can be trusted.

I'd rather have real IRQ handling than just relying on mvneta_poll(), so
that we can use it for asymmetric traffic/routing/whatever.
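
Roughly, the direction I have in mind looks like the sketch below. It is
only an outline: mvneta_tx_isr() and mvneta_tx_irq_disable() are
placeholder names rather than existing symbols, and the actual cause and
mask bits still have to be lifted from the LSP register descriptions:

  #include <linux/interrupt.h>
  #include <linux/netdevice.h>

  /* Outline only, not a working patch: take the TX-done interrupt,
   * mask it, and let the existing NAPI poll reclaim the completed
   * descriptors instead of waiting for the 10ms timer. */
  static irqreturn_t mvneta_tx_isr(int irq, void *dev_id)
  {
          struct net_device *dev = dev_id;
          struct mvneta_port *pp = netdev_priv(dev);

          /* placeholder helper: mask the TX-done cause bits */
          mvneta_tx_irq_disable(pp);

          /* TX cleanup then happens in mvneta_poll(), which re-enables
           * the interrupt once the queues are drained */
          napi_schedule(&pp->napi);

          return IRQ_HANDLED;
  }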

Regards,
Willy
