Message-ID: <506F23F6.1060704@hp.com>
Date: Fri, 05 Oct 2012 11:16:22 -0700
From: Rick Jones <rick.jones2@...com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: Herbert Xu <herbert@...dor.apana.org.au>,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>, Jesse Gross <jesse@...ira.com>
Subject: Re: [RFC] GRO scalability
On 10/05/2012 07:52 AM, Eric Dumazet wrote:
> What we could do :
>
> 1) Use a hash to avoid expensive gro_list management and allow
> much more concurrent flows.
>
> Use skb_get_rxhash(skb) to compute rxhash
>
> If l4_rxhash not set -> not a GRO candidate.
>
> If l4_rxhash set, use a hash lookup to immediately find 'same flow'
> candidates.
>
> (tcp stack could eventually use rxhash instead of its custom hash
> computation ...)
>
> 2) Use an LRU list to eventually be able to 'flush' too old packets,
> even if the napi never completes. Each time we process a new packet,
> being a GRO candidate or not, we increment a napi->sequence, and we
> flush the oldest packet in gro_lru_list if its own sequence is too
> old.
>
> That would give a latency guarantee.
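
To make the two quoted steps concrete, here is a minimal userspace sketch in C. It is a toy under stated assumptions, not kernel code: every name (gro_toy, gro_toy_receive, GRO_TOY_BUCKETS, GRO_TOY_MAX_AGE) is hypothetical, the table holds one entry per bucket, and "merging" only counts segments. Only the hash lookup and the sequence-based flush follow the quoted description.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GRO_TOY_BUCKETS 64u   /* hypothetical table size (power of two) */
#define GRO_TOY_MAX_AGE 8u    /* hypothetical flush threshold, in packets */

struct gro_toy_entry {
    uint32_t hash;      /* flow hash, standing in for the skb rxhash */
    uint32_t seq;       /* sequence counter value when the entry was held */
    uint32_t segs;      /* segments merged so far; 0 means slot is empty */
    int prev, next;     /* neighbours on the age-ordered list, -1 = none */
};

struct gro_toy {
    struct gro_toy_entry slot[GRO_TOY_BUCKETS];
    int oldest, newest; /* ends of the age-ordered list, -1 = empty */
    uint32_t sequence;  /* counts every packet seen, like napi->sequence */
    uint32_t flushed;   /* entries handed up the stack so far */
};

static void gro_toy_init(struct gro_toy *g)
{
    memset(g, 0, sizeof(*g));
    g->oldest = g->newest = -1;
}

/* Unlink slot i from the age-ordered list and count it as delivered. */
static void gro_toy_flush(struct gro_toy *g, int i)
{
    struct gro_toy_entry *e = &g->slot[i];

    assert(e->segs > 0);
    if (e->prev >= 0) g->slot[e->prev].next = e->next; else g->oldest = e->next;
    if (e->next >= 0) g->slot[e->next].prev = e->prev; else g->newest = e->prev;
    e->segs = 0;
    g->flushed++;
}

/* Process one packet. Returns 1 if it merged into a held entry, else 0. */
static int gro_toy_receive(struct gro_toy *g, uint32_t hash, int l4_hash_valid)
{
    struct gro_toy_entry *e;
    int i;

    g->sequence++;      /* incremented for candidates and non-candidates */

    /* Step 2: flush the oldest held entry once its sequence is too old. */
    if (g->oldest >= 0 &&
        g->sequence - g->slot[g->oldest].seq > GRO_TOY_MAX_AGE)
        gro_toy_flush(g, g->oldest);

    /* Step 1: without an L4 hash the packet is not a GRO candidate. */
    if (!l4_hash_valid)
        return 0;

    i = (int)(hash & (GRO_TOY_BUCKETS - 1));
    e = &g->slot[i];
    if (e->segs && e->hash == hash) {
        e->segs++;      /* same flow found by hash lookup: merge */
        return 1;
    }
    if (e->segs)
        gro_toy_flush(g, i);    /* collision: deliver the occupant first */

    /* Hold the packet as the newest entry on the age-ordered list. */
    e->hash = hash;
    e->seq = g->sequence;
    e->segs = 1;
    e->prev = g->newest;
    e->next = -1;
    if (g->newest >= 0) g->slot[g->newest].next = i; else g->oldest = i;
    g->newest = i;
    return 0;
}
```

Because an entry keeps the sequence at which it was first held, the list stays ordered by age and the flush check only ever looks at its head. A real implementation would also have to compare headers on a hash match rather than trust the hash alone.
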

Flushing things if N packets have come through sounds like goodness, and
it reminds me a bit of what happens with IP fragment reassembly -
another area where the stack is trying to guess just how long to
hang on to a packet before doing something else with it. But the value
of N to get a "decent" per-flow GRO aggregation rate will depend on the
number of concurrent flows, right? If I want to have a good shot at
getting 2 segments combined for 1000 active, concurrent flows entering
my system via that interface, won't N have to approach 2000?
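
The dependence of N on the flow count can be checked with a toy model. The sketch below assumes something the discussion does not state: flows deliver one segment each in strict round-robin order, which is the friendliest possible interleaving. The function name gro_toy_merges and its parameters are hypothetical, and "window" plays the role of N.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy model, not kernel code. nflows flows send one segment each in
 * strict round-robin order. A held segment is flushed once more than
 * `window` packets have arrived since it was held. Returns how many
 * arriving segments found an earlier segment of their own flow still
 * held, i.e. could have been merged.
 */
unsigned long gro_toy_merges(unsigned int nflows, unsigned int window,
                             unsigned long npackets)
{
    unsigned long *held_seq;    /* per flow: sequence when held, 0 = none */
    unsigned long seq, merges = 0;

    assert(nflows > 0);
    held_seq = calloc(nflows, sizeof(*held_seq));
    assert(held_seq != NULL);

    for (seq = 1; seq <= npackets; seq++) {
        unsigned int flow = (unsigned int)((seq - 1) % nflows);

        if (held_seq[flow] != 0 && seq - held_seq[flow] <= window)
            merges++;               /* earlier segment still held: merge */
        else
            held_seq[flow] = seq;   /* it was flushed (or absent): hold anew */
    }

    free(held_seq);
    return merges;
}
```

In this model, gro_toy_merges(1000, 999, 10000) finds no merges at all, while a window of 1000 lets every second segment merge. So even with perfectly regular interleaving N has to reach the number of flows before any aggregation happens, and less regular arrivals would presumably need more headroom than that.
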

GRO (and HW LRO) has a fundamental limitation/disadvantage here. GRO
does provide a very nice "boost" in various situations (especially
numbers of concurrent netperfs that don't blow out the tracking limits)
but since it won't really know anything about the flow(s) involved (*)
or even their number (?), it will always be guessing. That is why it is
really only "poor man's JumboFrames" (or larger MTU - sadly, the IEEE
keeps us all beggars here).

A goodly portion of the benefit of GRO comes from the "incidental" ACK
avoidance it causes, yes? That being the case, might ACK avoidance
itself be a worthwhile avenue to explore? It would then naturally scale
as TCP et al do today.

When we go to 40 GbE will we have 4x as many flows, or the same number
of 4x faster flows?

rick jones

* for example - does this TCP segment contain the last byte(s) of a
pipelined HTTP request/response and the first byte(s) of the next one
and so should "flush" now?