netdev - Re: [RFC] GRO scalability

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <506F23F6.1060704@hp.com>
Date:	Fri, 05 Oct 2012 11:16:22 -0700
From:	Rick Jones <rick.jones2@...com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	Herbert Xu <herbert@...dor.apana.org.au>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>, Jesse Gross <jesse@...ira.com>
Subject: Re: [RFC] GRO scalability

On 10/05/2012 07:52 AM, Eric Dumazet wrote:
> What we could do :
>
> 1) Use a hash to avoid expensive gro_list management and allow
>     much more concurrent flows.
>
> Use skb_get_rxhash(skb) to compute rxhash
>
> If l4_rxhash not set -> not a GRO candidate.
>
> If l4_rxhash set, use a hash lookup to immediately finds a 'same flow'
> candidates.
>
> (tcp stack could eventually use rxhash instead of its custom hash
> computation ...)
>
> 2) Use a LRU list to eventually be able to 'flush' too old packets,
>     even if the napi never completes. Each time we process a new packet,
>     being a GRO candidate or not, we increment a napi->sequence, and we
>     flush the oldest packet in gro_lru_list if its own sequence is too
>     old.
>
>    That would give a latency guarantee.

Flushing things if N packets have come though sounds like goodness, and 
it reminds me a bit about what happens with IP fragment reassembly - 
another area where the stack is trying to guess just how long to 
hang-onto a packet before doing something else with it.  But the value 
of N to get a "decent" per-flow GRO aggregation rate will depend on the 
number of concurrent flows right?  If I want to have a good shot at 
getting 2 segments combined for 1000 active, concurrent flows entering 
my system via that interface, won't N have to approach 2000?

GRO (and HW LRO) has a fundamental limitation/disadvantage here.  GRO 
does provide a very nice "boost" on various situations (especially 
numbers of concurrent netperfs that don't blow-out the tracking limits) 
but since it won't really know anything about the flow(s) involved (*) 
or even their number (?), it will always be guessing.  That is why it is 
really only "poor man's JumboFrames" (or larger MTU - Sadly, the IEEE 
keeps us all beggars here).

A goodly portion of the benefit of GRO comes from the "incidental" ACK 
avoidance it causes yes?  That being the case, might that be a 
worthwhile avenue to explore?   It would then naturally scale as TCP et 
al do today.

When we go to 40 GbE will we have 4x as many flows, or the same number 
of 4x faster flows?

rick jones

* for example - does this TCP segment contain the last byte(s) of a 
pipelined http request/response and the first byte(s) of the next one 
and so should "flush" now?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html