Message-ID: <56B3A2D4.1010309@hpe.com>
Date:	Thu, 4 Feb 2016 11:13:24 -0800
From:	Rick Jones <rick.jones2@....com>
To:	netdev@...r.kernel.org
Subject: Disabling XPS for 4.4.0-1+ixgbe+OpenStack VM over a VLAN means 65%
 increase in netperf TCP_STREAM

Folks -

I was doing some performance work with OpenStack Liberty on systems with 
2x E5-2650L v3 @ 1.80GHz processors and 560FLR (Intel 82599ES) NICs, onto 
which I'd placed a 4.4.0-1 kernel.  I was actually interested in the 
effect of removing the linux bridge from all the plumbing OpenStack 
creates (it is there for the iptables-based implementation of security 
group rules, because OpenStack Liberty doesn't enable them on the OVS 
bridge(s) it creates).  I'd noticed that when I removed the linux bridge 
from the "stack", instance-to-instance (vm-to-vm) performance across a 
VLAN-based Neutron private network dropped.  Quite unexpected.

On a lark, I tried explicitly binding the NIC's IRQs and Boom! the 
single-stream performance shot up to near link-rate.  I couldn't recall 
explicit binding of IRQs doing that much for single-stream netperf 
TCP_STREAM before.
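
(For reference, the explicit binding was nothing exotic - a rough sketch 
of the idea, with a hypothetical interface name eth2 and IRQ number 123 
standing in for the real ones, and with irqbalance stopped first so it 
doesn't undo the pinning:

   $ sudo systemctl stop irqbalance
   $ grep eth2 /proc/interrupts                          # find the per-queue IRQ numbers
   $ echo 2 | sudo tee /proc/irq/123/smp_affinity_list   # pin IRQ 123 to CPU 2
)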

I asked the Intel folks about that, and they suggested I try disabling 
XPS.  With XPS disabled, I see the following on single-stream tests 
between the VMs on that VLAN-based private network as created by 
OpenStack Liberty:


	   99% confident within +/- 2.5% of the "real" average
	   TCP_STREAM in Mbit/s, TCP_RR in Trans/s

                    XPS Enabled   XPS Disabled    Delta
TCP_STREAM              5353         8841 (*)     65.2%
TCP_RR                  8562         9666         12.9%
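
(Disabling XPS itself is just a matter of clearing the per-queue CPU 
masks under sysfs; sketching it here with a hypothetical interface name 
eth2:

   $ for q in /sys/class/net/eth2/queues/tx-*/xps_cpus; do echo 0 | sudo tee $q; done
)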

The Intel folks suggested the process scheduler may be moving the 
sender around and ultimately causing some packet re-ordering.  That 
could, I suppose, explain the TCP_STREAM difference, but not the TCP_RR 
one, since TCP_RR has just a single segment in flight at any one time.

I can try to get perf/whatnot installed on the systems - suggestions as 
to what metrics to look at are welcome.
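
(If the scheduler-migration theory holds, one crude first look would be 
to count task migrations on the sending host while a netperf run is in 
flight, using the stock sched:sched_migrate_task tracepoint; the 
30-second window is arbitrary:

   $ sudo perf stat -e sched:sched_migrate_task -a sleep 30
)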

happy benchmarking,

rick jones
* If I disable XPS on the sending side only, it is more like 7700 Mbit/s

netstats from the receiver over a netperf TCP_STREAM test's duration 
with XPS enabled:

$ netperf -H 10.240.50.191 -- -o throughput,local_transport_retrans
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
10.240.50.191 () port 0 AF_INET : demo
Throughput,Local Transport Retransmissions
5292.74,4555
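
(beforeafter is just a little script that subtracts one netstat -s 
snapshot from another; much the same numbers can be had from iproute2's 
nstat, which keeps its own history of the counters, e.g.:

   receiver$ nstat -n        # refresh nstat's history without printing anything
   sender$   netperf -H 10.240.50.191 -- -o throughput,local_transport_retrans
   receiver$ nstat           # non-zero counter deltas since the previous nstat call
)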


$ ./beforeafter before after
Ip:
     327837 total packets received
     0 with invalid addresses
     0 forwarded
     0 incoming packets discarded
     327837 incoming packets delivered
     293438 requests sent out
Icmp:
     0 ICMP messages received
     0 input ICMP message failed.
     ICMP input histogram:
         destination unreachable: 0
     0 ICMP messages sent
     0 ICMP messages failed
     ICMP output histogram:
         destination unreachable: 0
IcmpMsg:
         InType3: 0
         OutType3: 0
Tcp:
     0 active connections openings
     2 passive connection openings
     0 failed connection attempts
     0 connection resets received
     0 connections established
     327837 segments received
     293438 segments send out
     0 segments retransmited
     0 bad segments received.
     0 resets sent
Udp:
     0 packets received
     0 packets to unknown port received.
     0 packet receive errors
     0 packets sent
     IgnoredMulti: 0
UdpLite:
TcpExt:
     0 TCP sockets finished time wait in fast timer
     0 delayed acks sent
     Quick ack mode was activated 1016 times
     50386 packets directly queued to recvmsg prequeue.
     309545872 bytes directly in process context from backlog
     2874395424 bytes directly received in process context from prequeue
     86591 packet headers predicted
     84934 packets header predicted and directly queued to user
     6 acknowledgments not containing data payload received
     20 predicted acknowledgments
     1017 DSACKs sent for old packets
     TCPRcvCoalesce: 157097
     TCPOFOQueue: 78206
     TCPOrigDataSent: 24
IpExt:
     InBcastPkts: 0
     InOctets: 6643231012
     OutOctets: 17203936
     InBcastOctets: 0
     InNoECTPkts: 327837

And now with it disabled on both sides:
$ netperf -H 10.240.50.191 -- -o throughput,local_transport_retrans
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
10.240.50.191 () port 0 AF_INET : demo
Throughput,Local Transport Retransmissions
8656.84,1903
$ ./beforeafter noxps_before noxps_avter
Ip:
     251831 total packets received
     0 with invalid addresses
     0 forwarded
     0 incoming packets discarded
     251831 incoming packets delivered
     218415 requests sent out
Icmp:
     0 ICMP messages received
     0 input ICMP message failed.
     ICMP input histogram:
         destination unreachable: 0
     0 ICMP messages sent
     0 ICMP messages failed
     ICMP output histogram:
         destination unreachable: 0
IcmpMsg:
         InType3: 0
         OutType3: 0
Tcp:
     0 active connections openings
     2 passive connection openings
     0 failed connection attempts
     0 connection resets received
     0 connections established
     251831 segments received
     218415 segments send out
     0 segments retransmited
     0 bad segments received.
     0 resets sent
Udp:
     0 packets received
     0 packets to unknown port received.
     0 packet receive errors
     0 packets sent
     IgnoredMulti: 0
UdpLite:
TcpExt:
     0 TCP sockets finished time wait in fast timer
     0 delayed acks sent
     Quick ack mode was activated 48 times
     91752 packets directly queued to recvmsg prequeue.
     846851580 bytes directly in process context from backlog
     5442436572 bytes directly received in process context from prequeue
     102517 packet headers predicted
     146102 packets header predicted and directly queued to user
     6 acknowledgments not containing data payload received
     26 predicted acknowledgments
     TCPLossProbes: 0
     TCPLossProbeRecovery: 0
     48 DSACKs sent for old packets
     0 DSACKs received
     TCPRcvCoalesce: 45658
     TCPOFOQueue: 967
     TCPOrigDataSent: 30
IpExt:
     InBcastPkts: 0
     InOctets: 10837972268
     OutOctets: 11413100
     InBcastOctets: 0
     InNoECTPkts: 251831
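
(For a quick side-by-side of the reordering-related counters, the same 
diffs can be filtered down, filenames as in the transcripts above:

   $ ./beforeafter before after | grep -E 'TCPOFOQueue|DSACK|retransmit'
   $ ./beforeafter noxps_before noxps_avter | grep -E 'TCPOFOQueue|DSACK|retransmit'
)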
