Message-ID: <CALx6S34H67o64q3YoYiHQ+VSSuSsuSjBXmexSSXF_Hq8fcN0iw@mail.gmail.com>
Date: Wed, 20 Jan 2016 15:27:38 -0800
From: Tom Herbert <tom@...bertland.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Or Gerlitz <gerlitz.or@...il.com>,
David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Linux Netdev List <netdev@...r.kernel.org>,
Alexander Duyck <alexander.duyck@...il.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Daniel Borkmann <borkmann@...earbox.net>,
Marek Majkowski <marek@...udflare.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Florian Westphal <fw@...len.de>,
Paolo Abeni <pabeni@...hat.com>,
John Fastabend <john.r.fastabend@...el.com>,
Amir Vadai <amirva@...il.com>
Subject: Re: Optimizing instruction-cache, more packets at each stage
On Wed, Jan 20, 2016 at 3:02 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Thu, 2016-01-21 at 00:20 +0200, Or Gerlitz wrote:
>
>> Dave, I assume you refer to the RSS hash result which is written by
>> NIC HWs to the completion descriptor and then fed to the stack by the
>> driver calling skb_set_hash(.)? Well, this can be taken even further.
>>
>> Suppose the NIC can be programmed by the kernel to provide a unique
>> flow tag on the completion descriptor for a given 5/12 tuple which
>> represents a TCP (or other logical) stream that a higher layer in the
>> stack has identified as being in progress, and the driver plants that
>> tag in skb->mark before calling into the stack.
>>
>> I guess this could yield a nice speedup for the GRO stack -- matching
>> based on a single 32-bit value instead of per-protocol (eth, vlan, ip,
>> tcp) checks [1] -- or hint at which packets from the current window of
>> "ready" completion descriptors could be grouped together for
>> upper-layer processing?
>
> We already use the RSS hash (skb->hash) in the GRO engine to speed up
> the parsing: if skb->hash differs, then there is no point trying to
> aggregate two packets.
>
> Note that if we had an L4 hash for all provided packets, GRO could use
> a hash table instead of a single list of skbs.
>
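For reference, here is roughly how a driver's RX completion path could
plumb such a tag and the NIC hash into the skb, as in the idea quoted
above. This is only a sketch: the rx_cqe layout and the
rss_hash/flow_tag field names are made up for illustration, not any
real hardware's descriptor format.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct rx_cqe {			/* hypothetical completion descriptor */
	u32 rss_hash;		/* hash computed by the NIC */
	u32 flow_tag;		/* tag programmed for a specific tuple */
	/* ... */
};

static void example_rx_one(struct napi_struct *napi, struct rx_cqe *cqe,
			   struct sk_buff *skb)
{
	/* Let the stack reuse the NIC's L4 hash (RPS/RFS, GRO, ...) */
	skb_set_hash(skb, cqe->rss_hash, PKT_HASH_TYPE_L4);

	/* Plant the programmed flow tag where upper layers can see it */
	skb->mark = cqe->flow_tag;

	napi_gro_receive(napi, skb);
}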
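And a minimal sketch of the hash-table variant mentioned above: bucket
the held GRO skbs by skb->hash so a new packet is only compared against
flows in its own bucket instead of walking one global gro_list. The
gro_hash_demo structure and helper names below are illustrative only,
not the in-tree GRO code.

#include <linux/hash.h>
#include <linux/skbuff.h>

#define GRO_DEMO_HASH_BITS	3

struct gro_hash_demo {
	/* one list of held skbs per bucket instead of a single gro_list */
	struct sk_buff_head bucket[1 << GRO_DEMO_HASH_BITS];
};

static struct sk_buff_head *gro_demo_bucket(struct gro_hash_demo *gh,
					    const struct sk_buff *skb)
{
	return &gh->bucket[hash_32(skb->hash, GRO_DEMO_HASH_BITS)];
}

/* Packets whose L4 hash differs cannot belong to the same flow, so
 * they can be skipped without ever looking at their headers. */
static bool gro_demo_same_flow_candidate(const struct sk_buff *held,
					 const struct sk_buff *skb)
{
	return held->hash == skb->hash;
}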
Besides that, GRO requires parsing the packet anyway, so I don't see
much value in trying to optimize GRO by using the hash.

Unfortunately, the hardware hash from devices hasn't really lived up
to its potential. The original intent of getting the hash from the
device was to be able to do packet steering (RPS and RFS) without
touching the header. But this was never implemented: eth_type_trans
touches headers, and GRO is best when done before steering. Given the
weaknesses of Toeplitz we talked about recently and the fact that
Jenkins is really fast to compute, I am starting to think maybe we
should always do a software hash and not rely on HW for it...
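As a rough illustration of what "always do a software hash" could look
like for IPv4/TCP: jhash over the addresses and the port pair, roughly
what the flow dissector path behind __skb_get_hash() does. The helper
names and the hashrnd handling below are just for the example.

#include <linux/ip.h>
#include <linux/jhash.h>
#include <linux/skbuff.h>
#include <linux/tcp.h>

/* hashrnd should be a boot-time random seed, e.g. initialized once
 * with net_get_random_once() as the existing flow hashing code does. */
static u32 sw_l4_hash_ipv4_tcp(const struct iphdr *iph,
			       const struct tcphdr *th, u32 hashrnd)
{
	return jhash_3words((__force u32)iph->saddr,
			    (__force u32)iph->daddr,
			    ((u32)ntohs(th->source) << 16) | ntohs(th->dest),
			    hashrnd);
}

static void sw_set_l4_hash(struct sk_buff *skb, const struct iphdr *iph,
			   const struct tcphdr *th, u32 hashrnd)
{
	skb_set_hash(skb, sw_l4_hash_ipv4_tcp(iph, th, hashrnd),
		     PKT_HASH_TYPE_L4);
}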