netdev - [RFC PATCH 0/5] IPsec parallelization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20081201071614.GP476@secunet.com>
Date:	Mon, 1 Dec 2008 08:16:14 +0100
From:	Steffen Klassert <steffen.klassert@...unet.com>
To:	netdev@...r.kernel.org
Cc:	davem@...emloft.net, herbert@...dor.apana.org.au,
	klassert@...hematik.tu-chemnitz.de
Subject: [RFC PATCH 0/5] IPsec parallelization

This is a first throw to try to parallelize the expensive part of xfrm by
using a generic parallelization/serialization method. This method uses the
remote softirq invocation infrastructure for parallelization and serialization.
With this method data objects can be processed in parallel, starting 
at some given point. After doing some expensive operations in parallel, 
it is possible to serialize again. The parallelized data objects return after
serialization in the order as they were before the parallelization. 
In the case of xfrm, this makes it possible to run the expensive part in
parallel without getting packet reordering.

To use this parallelization method for xfrm, some changes in the crypto system
were necessary. First of all, we need to force disabling async crypto transforms
in the parallelization case, because we can't guarantee the packet order if
the packets are put to a queue during the parallel processing.
A second thing was a very high contended lock in crypto_authenc_hash() if
the crypto system runs in parallel. To get rid of this, the struct aead is
moved to percpu data, what in turn means that we have percpu IV chains now.
However, I'm not that familiar with the crypto system. So I'm not sure whether
this is acceptable as I did it, this needs review.

I did forwarding tests with two quad core machines (Intel Core 2 Quad Q6600) 
used as IPsec routers (xfrm tunnel between the two quad core machines) and two
notebooks T61 used as traffic generators.
With this testing environment I'm geting a throughput up to 910 Mbit/s (ipv4)
and 880 Mbit/s (ipv6) with aes192-sha1 encryption (measured with iperf,
_one_ tcp stream). Without the parallelization I'm getting with the same
environment about 340 Mbit/s (ipv4) and 320 Mbit/s (ipv6).

If somebody wants to test it, the parallelization is switched off by default.
To enable it, do 'echo 1 > /proc/sys/net/core/xfrm_padata'.

Steffen
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html