netdev - Re: [net-next PATCH] net: ipv4: fix listify ip_rcv

Message-ID: <CAJ3xEMiUbjSM4-EZkm-_b+EtAS4NZOA+Whcv3V3=UT+CwmOvQA@mail.gmail.com>
Date:   Thu, 12 Jul 2018 23:10:28 +0300
From:   Or Gerlitz <gerlitz.or@...il.com>
To:     Jesper Dangaard Brouer <brouer@...hat.com>,
        Edward Cree <ecree@...arflare.com>
Cc:     Saeed Mahameed <saeedm@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [net-next PATCH] net: ipv4: fix listify ip_rcv_finish in case of forwarding

On Wed, Jul 11, 2018 at 11:06 PM, Jesper Dangaard Brouer
<brouer@...hat.com> wrote:

> Well, I would prefer you to implement those.  I just did a quick
> implementation (it's trivially easy) so I have something to benchmark
> with.  The performance boost is quite impressive!

sounds good, but wait

> One reason I didn't "just" send a patch, is that Edward so far only
> implemented netif_receive_skb_list() and not napi_gro_receive_list().

sfc doesn't support GRO?! That doesn't make sense... Edward?

> And your driver uses napi_gro_receive().  This sort of disables GRO for
> your driver, which is not a choice I can make.  Interestingly, I get
> around the same netperf TCP_STREAM performance.

Same TCP performance

with GRO and no rx-batching

or

without GRO and yes rx-batching

is far from an intuitive result to me, unless both these techniques
mostly serve to eliminate lots of instruction cache misses and the
TCP stack is so well optimized that, once the code is in the cache,
going through it once with a 64K-byte GRO-ed packet costs about the
same as going through it ~40 (64K/1500) times with non-GRO-ed packets.

What's the baseline (with GRO and no rx-batching) number on your setup?

> I assume we can get even better perf if we "listify" napi_gro_receive.

yeah, that would be very interesting to get there