lists.openwall.net - Open Source and information security mailing list archives
Date: Fri, 16 Oct 2015 10:14:47 +0200
From: Cristian RUIZ <cristian.ruiz@...ia.fr>
To: Neal Cardwell <ncardwell@...gle.com>, Eric Dumazet <edumazet@...gle.com>,
	"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org
CC: lucas Nussbaum <lucas.nussbaum@...ia.fr>,
	Emmanuel Jeanvoine <emmanuel.jeanvoine@...ia.fr>
Subject: Performance regression with netns+linux-bridge+veth since Linux 4.0

Hello,

While evaluating the performance degradation of running HPC applications
inside network namespaces, I found that all Linux kernel versions since 4.0
introduce a high overhead for some applications. I compared the execution
time of a benchmark (details below) in a "native" execution (without netns)
against an execution inside netns. For Linux 3.16 the measured overhead is
42%; for Linux 3.19 it is 8%. For Linux 4.0 and later, the measured overhead
jumps to 300%.

Using git bisect, I tracked this down to commit 9949afa4 ("tcp: fix
tcp_cong_avoid_ai() credit accumulation bug with decreases in w"). Reverting
commit 9949afa4 on top of Linux 4.2 restores the correct (Linux 3.19)
behavior. However, I don't understand why commit 9949afa4 causes such
issues. Any ideas?

The network setup I used is the following:

         M1                                       M2
  ------------------------             -------------------------
 |                        |           |                         |
 |  ---------             |           |             ---------   |
 | | veth0   | <-> veth1  |           |  veth1 <-> | veth0   |  |
 |  ---------       |     |           |     |       ---------   |
 |   netns         br0    |           |    br0       netns      |
 |                  |     |           |     |                   |
  -----------------eth0---             ----eth0-----------------
                    |                       |
                   ========== switch ==========

It was created using the following commands:

brctl addbr br0
ip addr add dev br0 192.168.60.101/24   # <- machine IP address
ip link set dev br0 up
brctl addif br0 eth1
ifconfig eth1 0.0.0.0 up
ip link add name veth1 type veth peer name veth0
ip link set veth1 up
brctl addif br0 veth1
ip netns add vnode
ip link set dev int0 netns vnode
# I used a different network to communicate among network namespaces.
ip netns exec vnode ip addr add 10.144.0.3/24 dev int0
ip netns exec vnode ip link set dev veth0 up
# I assigned an IP address to the bridge inside the network attributed to
# the network namespaces.
ifconfig br0:1 10.144.0.252/24

All machines used in my experiments have this configuration.

The performance issue appears when I run a parallel application inside
network namespaces over 8 physical machines using the configuration above.
The parallel applications used during the tests belong to the NAS
benchmarks [1]; the most affected application is CG class B. The application
runs 16 CPU-intensive processes per machine (1 process per core) that
exchange around 1 GByte of data per execution. (The mapping of processes to
physical machines is the same for both native and netns executions, of
course.)

Below is the output of some network counters inside netns, from
'netstat -s':

    98 resets received for embryonic SYN_RECV sockets
    2344 TCP sockets finished time wait in fast timer
    986 delayed acks sent
    2 delayed acks further delayed because of locked socket
    Quick ack mode was activated 34279 times
    75 packets directly queued to recvmsg prequeue.
    2313551 packet headers predicted
    781506 acknowledgments not containing data payload received
    1739746 predicted acknowledgments
    22228 times recovered from packet loss by selective acknowledgements
    Detected reordering 130 times using FACK
    Detected reordering 498 times using SACK
    Detected reordering 68 times using time stamp
    2157 congestion windows fully recovered without slow start
    1465 congestion windows partially recovered using Hoe heuristic
    2273 congestion windows recovered without slow start by DSACK
    140 congestion windows recovered without slow start after partial ack
    59007 fast retransmits
    53276 forward retransmits
    140 other TCP timeouts
    TCPLossProbes: 140

The things to note here are the high number of congestion window recoveries
and the number of TCP timeouts.
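As context on what commit 9949afa4 actually changes, here is a rough shell model (not the kernel source) of the snd_cwnd_cnt credit handling that the commit's changelog describes: before the fix, credit accumulated while the additive-increase step w was large could convert into a burst of several cwnd increases once w shrank (e.g. after a loss); after the fix, such stale credit yields at most one increase before new credit is added. The function names and example numbers below are mine, for illustration only.

```shell
#!/bin/sh
# Rough model of tcp_cong_avoid_ai() credit accumulation, based on the
# changelog of commit 9949afa4.  Not kernel code; illustration only.

# Pre-fix behavior: all accumulated credit bursts at the new, smaller w.
old_ai() { # args: cwnd cwnd_cnt w acked
    cwnd=$1; cnt=$(($2 + $4)); w=$3
    if [ "$cnt" -ge "$w" ]; then
        delta=$((cnt / w))            # several increases in one step
        cnt=$((cnt - delta * w))
        cwnd=$((cwnd + delta))
    fi
    echo "$cwnd $cnt"
}

# Post-fix behavior: stale credit is applied "gently" (at most one
# increase) before new credit from acked is added.
fixed_ai() { # args: cwnd cwnd_cnt w acked
    cwnd=$1; cnt=$2; w=$3
    if [ "$cnt" -ge "$w" ]; then
        cnt=0
        cwnd=$((cwnd + 1))
    fi
    cnt=$((cnt + $4))
    if [ "$cnt" -ge "$w" ]; then
        delta=$((cnt / w))
        cnt=$((cnt - delta * w))
        cwnd=$((cwnd + delta))
    fi
    echo "$cwnd $cnt"
}

# Example: 95 credits accumulated while w=100, then w drops to 10.
old_ai 10 95 10 1     # prints "19 6"  (burst of +9 to cwnd)
fixed_ai 10 95 10 1   # prints "11 1"  (gentle +1 to cwnd)
```

The two functions differ only in the "gentle" prologue, yet produce very different cwnd growth right after w shrinks, which is the code path the bisect points at.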
With Linux kernel 3.19, the number of congestion window recoveries is 3
times lower and there are far fewer TCP timeouts (around 3) than on 4.0.
All of this is reflected in the maximum network speed achieved by the
application: 239 MB/s on 4.0 against 552 MB/s on 3.19.

More details about the configuration used
=========================================

Hardware used
-------------

All machines are equipped with two CPUs of 8 cores each; I attached the
output of '/proc/cpuinfo'.

Memory:  128 GB
Network: 10 Gigabit Ethernet (eth0), driver: ixgbe
Storage: 2*600GB HDD / SAS, driver: ahci

User-space software
-------------------

Debian jessie, OpenMPI version 1.8.5 [2], NAS benchmark version 3.3

Linux kernel
------------

The .config files used are attached to this mail.

[1] https://www.nas.nasa.gov/publications/npb.html
[2] http://www.open-mpi.org/software/ompi/v1.8/

View attachment "config-4.0.0" of type "text/plain" (163657 bytes)
View attachment "config-3.19.0" of type "text/plain" (161886 bytes)
View attachment "cpuinfo" of type "text/plain" (16460 bytes)