netdev - Re: Optimizing instruction-cache, more packets at each stage


Message-ID: <1453330945.1223.329.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Wed, 20 Jan 2016 15:02:25 -0800
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Or Gerlitz <gerlitz.or@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jesper Dangaard Brouer <brouer@...hat.com>,
	Linux Netdev List <netdev@...r.kernel.org>,
	Alexander Duyck <alexander.duyck@...il.com>,
	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	borkmann@...earbox.net, marek@...udflare.com,
	hannes@...essinduktion.org, Florian Westphal <fw@...len.de>,
	Paolo Abeni <pabeni@...hat.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	Amir Vadai <amirva@...il.com>
Subject: Re: Optimizing instruction-cache, more packets at each stage

On Thu, 2016-01-21 at 00:20 +0200, Or Gerlitz wrote:

> Dave, I assume you refer to the RSS hash result, which the NIC
> hardware writes to the completion descriptor and the driver then
> feeds to the stack by calling skb_set_hash()? Well, this can be taken
> even further.
> 
> Suppose the NIC can be programmed by the kernel to provide a unique
> flow tag on the completion descriptor for a given 5/12-tuple that
> represents a TCP (or other logical) stream which a higher layer in the
> stack has identified as in progress, and the driver places that tag in
> skb->mark before calling into the stack.
> 
> I guess this could yield a nice speedup for the GRO stack: matching
> based on a single 32-bit value instead of per-protocol (eth, vlan, ip,
> tcp) checks [1], or a hint about which packets from the current window
> of "ready" completion descriptors could be grouped together for
> upper-layer processing?

We already use the RSS hash (skb->hash) in the GRO engine to speed up
the parsing: if skb->hash differs, there is no point trying to
aggregate the two packets.
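A minimal user-space sketch of that shortcut, assuming a simplified packet record (`struct pkt`, `find_candidate()` and `headers_match()` are hypothetical names, not kernel code). Packets of one flow get the same NIC hash, so a hash mismatch rejects a candidate before any header parsing; equal hashes can still collide, so a match only admits the candidate to the full comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet record (not struct sk_buff). */
struct pkt {
    uint32_t hash;             /* NIC-provided RSS hash, like skb->hash */
    uint32_t saddr, daddr;     /* fields checked by the slow path */
    uint16_t sport, dport;
    uint8_t  proto;
    struct pkt *next;          /* single list of held packets */
};

static unsigned long header_compares;  /* counts slow-path comparisons */

/* Slow path: stands in for the per-protocol (eth, vlan, ip, tcp) checks. */
static bool headers_match(const struct pkt *a, const struct pkt *b)
{
    header_compares++;
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

/* Return a held packet that p may be aggregated with, or NULL.
 * A hash mismatch rules a candidate out without touching its headers;
 * a hash match (possibly a collision) still goes to the slow path. */
static struct pkt *find_candidate(struct pkt *held, const struct pkt *p)
{
    assert(p != NULL);
    for (struct pkt *q = held; q != NULL; q = q->next) {
        if (q->hash != p->hash)
            continue;              /* cheap reject: cannot be the same flow */
        if (headers_match(q, p))
            return q;
    }
    return NULL;
}
```

With three held packets from three different flows, looking up a new segment of the third flow performs only one header comparison: the first two candidates are skipped on the hash alone.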

Note that if we had an L4 hash for all packets provided by the NIC, GRO
could use a hash table instead of a single list of skbs.
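A minimal sketch of such a table, assuming every packet carries a per-flow L4 hash. The bucket count (`GRO_BUCKETS`), `struct held_pkt`, `struct gro_table` and `gro_receive()` are hypothetical names for illustration, not kernel APIs. The low bits of the hash pick a bucket, so a lookup walks only the packets held in that bucket instead of every held packet; the full flow check is still done because distinct flows can share a bucket or even a hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRO_BUCKETS 8u  /* hypothetical size; must be a power of two */

/* Hypothetical held-packet record, not struct sk_buff. */
struct held_pkt {
    uint32_t l4_hash;              /* assumed per-flow L4 hash from the NIC */
    uint32_t saddr, daddr;         /* stand-ins for the full header check */
    uint16_t sport, dport;
    unsigned int segs;             /* segments aggregated so far */
    struct held_pkt *next;
};

struct gro_table {
    struct held_pkt *bucket[GRO_BUCKETS];
    unsigned long scanned;         /* held packets examined, for illustration */
};

static bool same_flow(const struct held_pkt *a, const struct held_pkt *b)
{
    return a->l4_hash == b->l4_hash &&   /* cheap reject first */
           a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Aggregate p into a held packet of the same flow, or start holding p.
 * Only the bucket selected by the hash is walked. Returns the held
 * packet that now represents p's flow. */
static struct held_pkt *gro_receive(struct gro_table *t, struct held_pkt *p)
{
    assert(t != NULL && p != NULL);
    struct held_pkt **head = &t->bucket[p->l4_hash & (GRO_BUCKETS - 1)];

    for (struct held_pkt *q = *head; q != NULL; q = q->next) {
        t->scanned++;
        if (same_flow(q, p)) {
            q->segs += p->segs;    /* merge into the existing flow */
            return q;
        }
    }
    p->next = *head;               /* new flow: push onto this bucket */
    *head = p;
    return p;
}
```

The design trade-off is a fixed array of list heads in exchange for lookups whose cost depends on how many flows share one bucket rather than on the total number of held flows.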