lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <52299EDD.1030208@openvpn.net>
Date:	Fri, 06 Sep 2013 03:22:37 -0600
From:	James Yonan <james@...nvpn.net>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	netdev <netdev@...r.kernel.org>
Subject: Re: GSO/GRO and UDP performance

On 04/09/2013 05:53, Eric Dumazet wrote:
> On Wed, 2013-09-04 at 04:07 -0600, James Yonan wrote:
>
>> The bundle of UDP packets would traverse the stack as a unit until it
>> reaches the socket layer, where recvmmsg could pass the whole bundle up
>> to userspace in a single transaction (or recvmsg could disaggregate the
>> bundle and pass each datagram individually).
>
> That would require a lot of work, say in netfilter, but also in core
> network stack in forwarding, and all UDP users (L2TP, vxlan).
>
> Very unlikely to happen IMHO.

I agree that aggregating packets by chaining multiple packets into a 
single skb would be too disruptive.

However I believe GSO/GRO provides a potential solution here that would 
be transparent to the core network stack and existing in-kernel UDP users.

GSO/GRO already allows any L4 protocol or lower to define their own 
segmentation and aggregation algorithms, as long as the algorithms are 
lossless.

There's no reason why GSO/GRO couldn't operate on L5 or higher protocols 
if segmentation and aggregation algorithms are provided by a kernel 
module that understands the specific app protocol.

It looks like this could be done with minimal changes to the GSO/GRO 
core.  There would need to be a hook where a kernel module could 
register itself as a GSO/GRO provider for UDP.  It could then perform 
segmentation/aggregation on UDP packets that belong to it.  The dispatch 
to the UDP GSO/GRO providers would be done by the existing offload code 
for UDP, so there would be zero added overhead for non-UDP protocols.

>
> I suspect the performance is coming from aggregation done in user space,
> then re-injected into the kernel ?
>
> You could use a kernel module, using udp_encap_enable() and friends.
>
> Check vxlan_socket_create() for an example

I actually put together a test kernel module using udp_encap_enable to 
see if I could accelerate UDP performance that way.  But even with the 
boost of running in kernel space, the packet processing overhead of 
dealing with 1500 byte packets negates most of the gain, while TCP gets 
a 43x performance boost by being able to aggregate up to 64KB per 
superpacket with GSO/GRO.

So I think that playing well with GSO/GRO is essential to get speedup in 
UDP apps because of this 43x multiplier.

James
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ