lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 12 Aug 2013 16:01:42 +0300
From:	Timo Teras <timo.teras@....fi>
To:	netdev@...r.kernel.org
Subject: ipsec smp scalability and cpu use fairness (softirqs)

Hi,

I've been recently doing some ipsec benchmarking, and analysis on
system running out of cpu power. The setup is dmvpn gateway
(gre+xfrm+opennhrp) with traffic in forward path.

The system I have been using are VIA Nano (Padlock aes/sha accel) and
Intel Xeon (aes-ni and ssse3 sha1) based. In both setups the crypto
happens synchronously using special opcodes, or assembly implementation
of the algorithm.

It seems that the combination of softirq, napi and synchronous crypto
causes two problems.

1. Single core systems that are going out of cpu power, are
overwhelmed in uncontrollable manner. As softirq is doing the heavy
lifting, the user land processes are starved first. This can cause
userland IKE daemon to starve and lose tunnels when it is unable to
answer liveliness checks. The quick workaround is to setup traffic
shaping for the encrypted traffic.

2. On multicore (6-12 cores) systems, it would appear that it is not
easy to distribute the ipsec to multiple cores. as softirq is sticky to
the cpu where it was raised. The ipsec decryption/encryption is done
synchronously in the napi poll loop, and the throughput is limited by
one cpu. If the NIC supports multiple queues and balancing with ESP
SPI, we can use that to get some parallelism.

Fundamentally, both problems arise because synchronous crypto happens in
the softirq context. I'm wondering if it would make sense to execute
the synchronous crypto in low-priority per-xfrm_state workqueue or
similar.

Any suggestions or comments?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ