Message-ID: <63c8b2f2-bc44-ee1c-1f94-439e1a73b909@solarflare.com>
Date:   Tue, 11 Sep 2018 19:34:40 +0100
From:   Edward Cree <ecree@...arflare.com>
To:     Eric Dumazet <eric.dumazet@...il.com>, <davem@...emloft.net>
CC:     <linux-net-drivers@...arflare.com>, <netdev@...r.kernel.org>
Subject: Re: [PATCH v2 net-next 0/4] net: batched receive in GRO path

On 07/09/18 03:32, Eric Dumazet wrote:
> Adding this complexity and icache pressure needs more experimental results.
> What about RPC workloads  (eg 100 concurrent netperf -t TCP_RR -- -r 8000,8000 )
>
> Thanks.
Some more results.  Note that the TCP_STREAM figures given in the cover
 letter were '-m 1450'; when I run that with '-m 8000' I hit line rate on
 my 10G NIC on both the old and new code.  Also, these tests are still all
 with IRQs bound to a single core on the RX side.
A further note: the Code Under Test is running on the netserver side (RX
 side for TCP_STREAM tests); the netperf side is running stock RHEL7u3
 (kernel 3.10.0-514.el7.x86_64).  This potentially matters more for the
 TCP_RR test as both sides have to receive data.
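
For anyone wanting to reproduce, the runs are just concurrent netperf
 instances against a netserver on the device under test.  Below is a
 minimal sketch of that kind of harness (illustrative only, not the exact
 scripts behind these figures; the netserver address is made up):

#!/usr/bin/env python3
# Illustrative harness sketch -- not the exact scripts behind these figures.
# Launches STREAMS concurrent netperf TCP_STREAM instances against a
# netserver already running on the RX-side box and sums their throughput.
import subprocess

NETSERVER = "192.168.1.2"   # hypothetical address of the RX-side box
STREAMS   = 4               # concurrent streams, as in the results below
DURATION  = 30              # seconds per run
MSG_SIZE  = 8000            # send size, i.e. '-m 8000'

def run_streams():
    cmd = ["netperf", "-H", NETSERVER, "-t", "TCP_STREAM", "-P", "0",
           "-l", str(DURATION), "--", "-m", str(MSG_SIZE)]
    procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
             for _ in range(STREAMS)]
    # With '-P 0' netperf emits a single results line whose last field is
    # throughput in 10^6 bits/s.
    total_mbit = sum(float(p.communicate()[0].split()[-1]) for p in procs)
    return total_mbit / 1000.0          # aggregate Gbit/s

if __name__ == "__main__":
    print("aggregate throughput: %.3f Gbit/s" % run_streams())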

TCP_STREAM, 8000 bytes, GRO enabled (4 streams)
old: 9.415 Gbit/s
new: 9.417 Gbit/s
(Welch p = 0.087, n₁ = n₂ = 3)
There was, however, a noticeable reduction in *TX* CPU usage of about 15%.
 I don't know why that should be (changes in ACK timing, perhaps?).

TCP_STREAM, 8000 bytes, GRO disabled (4 streams)
old: 5.200 Gbit/s
new: 5.839 Gbit/s (12.3% faster)
(Welch p < 0.001, n₁ = n₂ = 6)
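
The p-values throughout are from Welch's unequal-variance t-test on the
 per-run figures.  A minimal sketch of that computation follows; the
 sample values are placeholders, not the actual per-run data:

# Welch's unequal-variance t-test on per-run throughput samples.
# The values below are placeholders for illustration only -- they are not
# the per-run measurements behind the numbers quoted in this mail.
from scipy import stats

old_runs = [5.19, 5.21, 5.20, 5.18, 5.22, 5.20]   # Gbit/s, hypothetical
new_runs = [5.83, 5.85, 5.84, 5.82, 5.86, 5.84]   # Gbit/s, hypothetical

t, p = stats.ttest_ind(old_runs, new_runs, equal_var=False)  # Welch's test
print("t = %.3f, two-sided p = %.3g" % (t, p))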

TCP_RR, 8000 bytes, GRO enabled (100 streams)
(FoM is one-way latency, 0.5 / tps, i.e. half the round-trip time)
old: 855.833 us
new: 862.033 us (0.7% slower)
(Welch p = 0.040, n₁ = n₂ = 6)

TCP_RR, 8000 bytes, GRO disabled (100 streams)
old: 962.733 us
new: 871.417 us (9.5% faster)
(Welch p < 0.001, n₁ = n₂ = 6)
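
To spell out the FoM arithmetic: netperf TCP_RR reports a transaction
 rate, and each transaction is one request/response round trip, so the
 one-way latency is 0.5 / tps.  A quick back-conversion of the GRO-enabled
 figures (assuming tps is the per-stream rate):

# Converting between the netperf TCP_RR transaction rate and the one-way
# latency FoM (0.5 / tps); tps taken as the per-stream rate (assumption).
def oneway_us(tps):
    """Transactions/sec -> one-way latency in microseconds."""
    return 0.5 / tps * 1e6

def tps_from_oneway_us(us):
    """One-way latency in microseconds -> transactions/sec."""
    return 0.5 / (us * 1e-6)

print(tps_from_oneway_us(855.833))   # ~584.2 trans/s per stream (old, GRO on)
print(tps_from_oneway_us(862.033))   # ~580.0 trans/s per stream (new, GRO on)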

Conclusion: with GRO on we pay a small but real RR penalty.  With GRO off
 (thus also with traffic that can't be coalesced) we get a noticeable
 speed boost from being able to use netif_receive_skb_list_internal().

-Ed
