Message-ID: <889f2dc5e646992033e0d9b0951d5a42f1907e07.camel@redhat.com>
Date:   Wed, 22 Mar 2023 10:59:30 +0100
From:   Paolo Abeni <pabeni@...hat.com>
To:     Richard Gobert <richardbgobert@...il.com>, davem@...emloft.net,
        edumazet@...gle.com, kuba@...nel.org, dsahern@...nel.org,
        alexanderduyck@...com, lucien.xin@...il.com, lixiaoyan@...gle.com,
        iwienand@...hat.com, leon@...nel.org, ye.xingchen@....com.cn,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 2/2] gro: optimise redundant parsing of packets

On Mon, 2023-03-20 at 18:00 +0100, Richard Gobert wrote:
> Currently the IPv6 extension headers are parsed twice: first in
> ipv6_gro_receive, and then again in ipv6_gro_complete.
> 
> By using the new ->transport_proto field, and also storing the size of the
> network header, we can avoid parsing extension headers a second time in
> ipv6_gro_complete (which saves multiple memory dereferences and conditional
> checks inside ipv6_exthdrs_len for a varying amount of extension headers in
> IPv6 packets).
> 
> The implementation had to handle both inner and outer layers in case of
> encapsulation (as they can't use the same field). I've applied a similar
> optimisation to Ethernet.
> 
> Performance tests for TCP stream over IPv6 with a varying amount of
> extension headers demonstrate throughput improvement of ~0.7%.

I'm surprised that the improvement is measurable: for large aggregate
packets, only a single ipv6_exthdrs_len() call is avoided, out of the tens
of calls made for the individual packets. Additionally, such a figure is
comparable to the noise level in my tests.
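
As a rough illustration of that ratio, here is a back-of-the-envelope
sketch. It assumes one extension-header parse per segment at receive time
plus one more at completion time, with the patch removing only the latter.
The helper name exthdr_parse_savings() is hypothetical and is not part of
the kernel or of the patch.

```c
/*
 * Hypothetical model, not kernel code: estimate the fraction of
 * extension-header parses that disappears when the completion-time
 * parse is skipped.
 *
 * Assumption: each of the nr_segs segments in an aggregate is parsed
 * once at receive time, and the aggregate is parsed once more at
 * completion time, so nr_segs + 1 parses happen without the patch and
 * nr_segs parses happen with it.
 */
double exthdr_parse_savings(unsigned int nr_segs)
{
	return 1.0 / ((double)nr_segs + 1.0);
}
```

Under this model the saving shrinks as the aggregate grows: with a few
tens of segments, only a few percent of the parses are removed, and the
parses themselves are only a small part of the per-packet cost.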

This also adds a couple of branches to the common case (no extension
headers).

While patch 1/2 could be useful, patch 2/2 overall does not look
worthwhile to me.

I suggest re-posting only patch 1 for inclusion, unless others have
strongly different opinions.

Cheers,

Paolo
