Message-ID: <b163a6c8-d522-3fdd-9e22-ceed1eb5a7b3@gmail.com>
Date: Thu, 15 Nov 2018 12:08:22 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Edward Cree <ecree@...arflare.com>,
linux-net-drivers@...arflare.com, davem@...emloft.net
Cc: netdev@...r.kernel.org
Subject: Re: [PATCH v3 net-next 0/4] net: batched receive in GRO path
On 11/15/2018 10:43 AM, Edward Cree wrote:
> Most of the packet isn't touched and thus won't be brought into cache.
> Only the headers of each packet (worst-case let's say 256 bytes) will
> be touched during batch processing, that's 16kB.
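(A quick sanity check of the quoted estimate, assuming the kernel's default NAPI budget of 64 packets per poll and the 256-byte worst-case header figure from the quote:)

```python
# Worst-case bytes of packet headers touched during one batched GRO pass.
# Assumptions: 64 packets per napi->poll() invocation (the default NAPI
# weight) and up to 256 header bytes touched per packet, per the quote.
NAPI_BUDGET = 64
HEADER_BYTES = 256

touched = NAPI_BUDGET * HEADER_BYTES
print(touched, touched // 1024)  # 16384 bytes, i.e. 16 kB
```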
You assume perfect use of the caches, but parts of the cache will suffer collisions and evictions.
I am alarmed by the complexity added, for example in GRO, considering
that we also added GRO for UDP.
I dunno, can you show us for example whether a reassembly workload can benefit
from all this stuff?
Paolo Abeni will surely be interested to know whether we can get a 20% increase for these
IP defrag workloads.
If you present numbers only for traffic that GRO already handles just fine, it does not
really make sense, unless perhaps your plan is to remove GRO completely?
We have observed at Google a steady increase in CPU cycles spent on TCP_RR
on recent kernels. The gap is now about 20% compared with kernels from two years ago,
and I have not yet been able to find a single faulty commit. It seems we add one small
overhead after another, and every patch author is convinced he is doing the right thing.
With multi-queue NICs, the vast majority of napi->poll() invocations handle only one packet.
Unfortunately we cannot really increase interrupt mitigation (ethtool -c)
on the NIC without sacrificing latency.
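(For reference, interrupt coalescing is inspected and tuned via ethtool; a sketch of the trade-off being discussed, where eth0 is a placeholder interface name and the values are illustrative, not recommendations:)

```shell
# Show current interrupt-coalescing settings for the NIC.
ethtool -c eth0

# Raise mitigation: wait up to 100 us or 32 frames before raising an IRQ.
# This batches more packets per napi->poll() but adds up to 100 us latency.
ethtool -C eth0 rx-usecs 100 rx-frames 32

# Lower mitigation for latency-sensitive (e.g. TCP_RR-like) workloads,
# at the cost of one interrupt (and one poll) per packet.
ethtool -C eth0 rx-usecs 0 rx-frames 1
```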