Date:	Fri, 05 Oct 2012 12:35:43 -0700
From:	Rick Jones <rick.jones2@...com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	Herbert Xu <herbert@...dor.apana.org.au>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>, Jesse Gross <jesse@...ira.com>
Subject: Re: [RFC] GRO scalability

On 10/05/2012 12:00 PM, Eric Dumazet wrote:
> On Fri, 2012-10-05 at 11:16 -0700, Rick Jones wrote:
>
> Some remarks :
>
> 1) I use some 40GbE links, that's probably why I try to improve things ;)

Path length before workarounds :)

> 2) benefit of GRO can be huge, and not only for the ACK avoidance
>     (other tricks could be done for ACK avoidance in the stack)

Just how much code path is there between NAPI and the socket?  (And I 
guess just how much combining are you hoping for?)

> 3) High speeds probably need a multiqueue device, and each queue has
> its own GRO unit.
>
>    For example on 40GbE, 8 queues -> 5Gbps per queue (about 400k
> packets/sec)

> Let's say we allow no more than 1ms of delay in GRO,

OK.  That means we can ignore HPC and FSI because they wouldn't tolerate 
that kind of added delay anyway.  I'm not sure if that also then 
eliminates the networked storage types.

> this means we could have about 400 packets in the GRO queue (assuming
> 1500 bytes packets)
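Eric's quoted numbers check out on the back of an envelope; a quick sketch (my own, with the link speed, queue count, and packet size taken from the thread):

```python
# Back-of-envelope check of the quoted numbers (assumptions from the
# thread: 40 GbE NIC, 8 RX queues, 1500-byte packets, 1 ms GRO budget).
LINK_BPS = 40e9          # 40 GbE
QUEUES = 8
PKT_BITS = 1500 * 8      # 1500-byte packets

per_queue_bps = LINK_BPS / QUEUES          # 5 Gbit/s per queue
pkts_per_sec = per_queue_bps / PKT_BITS    # ~417k packets/sec ("about 400k")
pkts_per_ms = pkts_per_sec * 1e-3          # ~417 packets in a 1 ms budget

print(per_queue_bps / 1e9, round(pkts_per_sec), round(pkts_per_ms))
```

So the "about 400 packets in the GRO queue" figure is just 400k packets/sec times the 1 ms delay allowance.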

How many flows are you going to have entering via that queue?  And just 
how well "shuffled" will the segments of those flows be?  That is what 
it all comes down to, right?  How many (active) flows there are and how 
well shuffled they are.  If the flows aren't well shuffled, you can get 
away with a smallish coalescing context.  If they are perfectly shuffled 
and greater in number than your delay allowance, you get right back to 
square one, with all the overhead of the GRO attempts and none of the 
benefit.
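A toy illustration of that point (my own sketch, not from the thread): with a coalescing context that flushes every K packets, a flow only coalesces if two of its segments land in the same window, so perfectly interleaved flows merge nothing while back-to-back ones merge everything:

```python
# Toy model (hypothetical): flush the coalescing context every `window`
# packets; a flow merges only when >= 2 of its segments share a window.
def merges(arrivals, window):
    """Count pairwise merges when flushing every `window` packets."""
    total = 0
    for i in range(0, len(arrivals), window):
        batch = arrivals[i:i + window]
        # each flow contributes (segments - 1) merges within a batch
        total += sum(batch.count(f) - 1 for f in set(batch))
    return total

flows = ["A", "B", "C", "D"]
shuffled = flows * 4           # A B C D A B C D ... perfectly shuffled
bursty = sorted(shuffled)      # A A A A B B B B ... back-to-back

print(merges(shuffled, 2), merges(bursty, 2))   # 0 vs 8
```

Same packets, same flows; only the shuffling differs, and the merge rate goes from zero to maximal.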

If the flow count is < 400 to allow a decent shot at a non-zero 
combining rate on well shuffled flows with the 400 packet limit, then 
that means each flow is >= 12.5 Mbit/s on average at 5 Gbit/s 
aggregated.  And I think you then get two segments per flow aggregated 
at a time.  Is that consistent with what you expect to be the 
characteristics of the flows entering via that queue?
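The division behind that estimate, sketched out (my numbers, assuming perfectly shuffled flows and the 400-packet / 5 Gbit/s figures from above):

```python
# Rick's flow-count arithmetic (assumed values from the thread).
AGG_BPS = 5e9        # 5 Gbit/s per RX queue
FLOWS = 400          # flow count right at the 400-packet GRO limit
PKT_BITS = 1500 * 8  # 1500-byte packets

per_flow_bps = AGG_BPS / FLOWS                          # 12.5 Mbit/s per flow
pkts_per_flow_per_ms = per_flow_bps / PKT_BITS * 1e-3   # ~1 packet per 1 ms

print(per_flow_bps / 1e6, pkts_per_flow_per_ms)
```

At ~1 packet per flow per millisecond window, roughly two segments per flow is the most you'd expect to coalesce per flush, which is where the "two segments per flow aggregated at a time" figure comes from.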

rick jones
