Message-ID: <CAJ3xEMiOAdu7ghRoc6ZEMG78Z9xyh1M=5Egc3ydh+97bT3_9fw@mail.gmail.com>
Date:	Thu, 21 Jan 2016 00:20:24 +0200
From:	Or Gerlitz <gerlitz.or@...il.com>
To:	David Miller <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>
Cc:	Jesper Dangaard Brouer <brouer@...hat.com>,
	Linux Netdev List <netdev@...r.kernel.org>,
	Alexander Duyck <alexander.duyck@...il.com>,
	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	borkmann@...earbox.net, marek@...udflare.com,
	hannes@...essinduktion.org, Florian Westphal <fw@...len.de>,
	Paolo Abeni <pabeni@...hat.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	Amir Vadai <amirva@...il.com>
Subject: Re: Optimizing instruction-cache, more packets at each stage

On Mon, Jan 18, 2016 at 6:24 PM, David Miller <davem@...emloft.net> wrote:
> From: Jesper Dangaard Brouer <brouer@...hat.com>
> Date: Mon, 18 Jan 2016 11:27:03 +0100
>
>> Down in the driver layer (RX), I think it is too early to categorize
>> Related/Unrelated SKB's, because we want to delay touching packet-data
>> as long as possible (waiting for the prefetcher to get data into
>> cache).
>
> You don't need to touch the headers in order to have a good idea
> as to whether there is a strong possibility packets are related
> or not.
>
> We have the hash available.

Dave, I assume you refer to the RSS hash result, which the NIC HW
writes to the completion descriptor and the driver then feeds to the
stack by calling skb_set_hash()? Well, this can be taken even further.
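
Just to make the driver side concrete, a minimal sketch of that path
(the completion descriptor layout and the my_* names are made up for
illustration; skb_set_hash() and napi_gro_receive() are the actual
kernel APIs):

#include <linux/types.h>
#include <linux/skbuff.h>
#include <linux/netdevice.h>

/* Hypothetical RX completion descriptor, for illustration only */
struct my_cqe {
	__le32 rss_hash;
	__le32 flow_tag;
	__le32 flags;
};

static void my_rx_one(struct napi_struct *napi, const struct my_cqe *cqe,
		      struct sk_buff *skb)
{
	/* The RSS result comes straight off the descriptor, so we can
	 * set the skb hash without touching packet data at all. */
	skb_set_hash(skb, le32_to_cpu(cqe->rss_hash), PKT_HASH_TYPE_L4);

	napi_gro_receive(napi, skb);
}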

Suppose the NIC can be programmed by the kernel to provide a unique
flow tag in the completion descriptor for a given 5/12 tuple that
represents a TCP (or other logical) stream which a higher layer in the
stack has identified as being in progress, and the driver plants that
tag in skb->mark before calling into the stack.
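
On the driver side that would only be a couple of lines on top of the
sketch above (MY_CQE_FLOW_TAG_VALID and the flow_tag field are again
hypothetical):

	/* HW matched this completion to a flow the stack registered;
	 * expose the HW flow tag to the stack through skb->mark. */
	if (le32_to_cpu(cqe->flags) & MY_CQE_FLOW_TAG_VALID)
		skb->mark = le32_to_cpu(cqe->flow_tag);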

I guess this could yield a nice speedup for the GRO stack -- matching
on a single 32-bit value instead of the per-protocol (eth, vlan, ip,
tcp) checks [1] -- or hint which packets from the current window of
"ready" completion descriptors could be grouped together for upper
processing?

Or.

[1] some details remain to be completed (...) here; on the last
protocol hop we do still need to verify that it is correct to merge
the incoming packet into the existing pending packet of this stream
