netdev - Re: [RFC] GRO scalability

Message-ID: <1349463634.21172.152.camel@edumazet-glaptop>
Date:	Fri, 05 Oct 2012 21:00:34 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Rick Jones <rick.jones2@...com>
Cc:	Herbert Xu <herbert@...dor.apana.org.au>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>, Jesse Gross <jesse@...ira.com>
Subject: Re: [RFC] GRO scalability

On Fri, 2012-10-05 at 11:16 -0700, Rick Jones wrote:
> Flushing things if N packets have come through sounds like goodness, and 
> it reminds me a bit about what happens with IP fragment reassembly - 
> another area where the stack is trying to guess just how long to 
> hang-onto a packet before doing something else with it.  But the value 
> of N to get a "decent" per-flow GRO aggregation rate will depend on the 
> number of concurrent flows right?  If I want to have a good shot at 
> getting 2 segments combined for 1000 active, concurrent flows entering 
> my system via that interface, won't N have to approach 2000?
> 

It all depends on the max latency you can afford.

> GRO (and HW LRO) has a fundamental limitation/disadvantage here.  GRO 
> does provide a very nice "boost" on various situations (especially 
> numbers of concurrent netperfs that don't blow-out the tracking limits) 
> but since it won't really know anything about the flow(s) involved (*) 
> or even their number (?), it will always be guessing.  That is why it is 
> really only "poor man's JumboFrames" (or larger MTU - Sadly, the IEEE 
> keeps us all beggars here).
> 
> A goodly portion of the benefit of GRO comes from the "incidental" ACK 
> avoidance it causes yes?  That being the case, might that be a 
> worthwhile avenue to explore?   It would then naturally scale as TCP et 
> al do today.
> 
> When we go to 40 GbE will we have 4x as many flows, or the same number 
> of 4x faster flows?
> 
> rick jones
> 
> * for example - does this TCP segment contain the last byte(s) of a 
> pipelined http request/response and the first byte(s) of the next one 
> and so should "flush" now?

Some remarks :

1) I use some 40GbE links, that's probably why I try to improve things ;)

2) The benefit of GRO can be huge, and not only for the ACK avoidance
   (other tricks could be done for ACK avoidance in the stack)

3) High speeds probably need a multiqueue device, and each queue has its
own GRO unit.

  For example on a 40GbE link with 8 queues -> 5Gbps per queue (about 400k
packets/sec)

Let's say we allow no more than 1ms of delay in GRO, this means we could
have about 400 packets in the GRO queue (assuming 1500-byte packets)
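
The arithmetic above can be checked with a short sketch. It assumes the
load is spread evenly over the queues and ignores Ethernet framing
overhead (preamble, inter-frame gap); the function names are made up for
illustration and are not kernel APIs. It also estimates the delay implied
by holding N = 2000 packets on one queue, the figure from the quoted
question.

```python
def packets_per_second(link_gbps, num_queues, packet_bytes):
    """Packets/sec seen by one queue, assuming an even spread over queues."""
    per_queue_bps = link_gbps * 1e9 / num_queues
    return per_queue_bps / (packet_bytes * 8)


def gro_queue_depth(pps, max_delay_s):
    """How many packets arrive on one queue within the allowed delay."""
    return int(pps * max_delay_s)


def delay_for_depth(pps, depth):
    """Time in seconds needed for `depth` packets to arrive on one queue."""
    return depth / pps


# Figures from the message: 40GbE, 8 queues, 1500-byte packets, 1ms budget.
pps = packets_per_second(40, 8, 1500)
depth = gro_queue_depth(pps, 1e-3)

# Hypothetical: delay implied by holding N = 2000 packets on one queue.
delay_ms = delay_for_depth(pps, 2000) * 1e3

print(f"{pps:.0f} packets/sec per queue")
print(f"{depth} packets in 1ms")
print(f"{delay_ms:.1f} ms to see 2000 packets")
```

Under these assumptions the per-queue rate is a little over 400k
packets/sec, so a 1ms budget holds roughly 400 packets, and waiting for
2000 packets on one queue would cost several milliseconds.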

Another idea to play with would be to extend GRO to allow packet
reordering.


