Date:	Wed, 19 Jan 2011 19:13:59 +0300
From:	"Oleg V. Ukhno" <olegu@...dex-team.ru>
To:	Jay Vosburgh <fubar@...ibm.com>
CC:	Nicolas de Pesloüan 
	<nicolas.2p.debian@...il.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	"David S. Miller" <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Sébastien Barré <sebastien.barre@...ouvain.be>,
	Christophe Paasch <christoph.paasch@...ouvain.be>
Subject: Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for
 single TCP session balancing

On 01/18/2011 11:24 PM, Jay Vosburgh wrote:
> 	I haven't done much testing with this lately, but I suspect this
> behavior hasn't really changed.  Raising the tcp_reordering sysctl value
> can mitigate this somewhat (by making TCP more tolerant of this), but
> that doesn't help non-TCP protocols.
>
> 	Barring evidence to the contrary, I presume that Oleg's system
> delivers out of order at the receiver.  That's not automatically a
> reason to reject it, but this entire proposal is sufficiently complex to
> configure that very explicit documentation will be necessary.
>
> 	-J
>
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
>

Jay,
I have run some tests with the patched 802.3ad bonding.
Test system configuration:
2 identical servers, each with an 82576 Gigabit ET2 Quad Port Server
Adapter (low profile, PCI-E, igb), connected to one switch (Cisco 2960)
with all 4 ports; all ports on each host are aggregated into a single
EtherChannel using 802.3ad (with the patch).
Kernel version: vanilla 2.6.32 (tcp_reordering at its default setting)
igb version: 2.3.4, parameters: default
I ran two tests:
1) Unidirectional test using iperf
2) Bidirectional test, with the iperf client running 8 threads
One remark:
Decreasing the number of slaves results in higher utilization of each
active slave. For example, with 2 slaves the iperf test consumes almost
the full bandwidth available in both directions (test parameters are the
same, test time reduced to 150 sec):
[SUM]  0.0-150.3 sec  34640 MBytes  1933 Mbits/sec
[SUM]  0.0-150.5 sec  34875 MBytes  1944 Mbits/sec
For my use case the risk of some bandwidth loss with 4 slaves is
acceptable, but I would suggest that building an aggregate link with more
than 4 slaves is not advisable. For 2 slaves this solution should work
with minimal overhead of any kind. The TCP reordering and retransmit
numbers are, in my opinion, acceptable for most use cases I can imagine
for such a bonding mode.
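
As a rough illustration of the utilization figures discussed above, the sketch below divides each reported iperf [SUM] rate by the nominal aggregate capacity. It assumes 1000 Mbit/s per slave and ignores framing and protocol overhead; the function name `utilization` is introduced here for illustration only. The 4-slave rates are the Test 1 and Test 2 figures quoted further down in this message.

```python
# Sketch: utilization = measured rate / (number of slaves * nominal port rate).
# Assumption: every slave is a 1000 Mbit/s port; framing overhead is ignored.
PORT_MBITS = 1000.0


def utilization(measured_mbits, slaves, port_mbits=PORT_MBITS):
    """Return measured throughput as a fraction of nominal aggregate capacity."""
    return measured_mbits / (slaves * port_mbits)


# (label, slaves, Mbits/sec) copied from the iperf output in this message.
results = [
    ("2 slaves, bidirectional, first [SUM] line", 2, 1933),
    ("2 slaves, bidirectional, second [SUM] line", 2, 1944),
    ("4 slaves, unidirectional (Test 1)", 4, 3961),
    ("4 slaves, bidirectional (Test 2), first [SUM] line", 4, 3117),
    ("4 slaves, bidirectional (Test 2), second [SUM] line", 4, 3086),
]

for label, slaves, rate in results:
    print(f"{label}: {utilization(rate, slaves):.1%}")
```

Under these assumptions the shortfall against nominal capacity shows up mainly in the bidirectional 4-slave case, which is the bandwidth loss referred to above.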

What is your opinion of the idea and the patch?

I will come back with results for the VLAN tunneling case if necessary.
(Nicolas, shall I run that test? I think it will show similar
performance results.)

Below are the test results (sorry for the huge amount of text):

Iperf results:
Test 1:
Receiver:
[root@...get2 ~]# iperf -f m -c 192.168.111.128 -B 192.168.111.129 -p 9999 -t 300
------------------------------------------------------------
Client connecting to 192.168.111.128, TCP port 9999
Binding to local address 192.168.111.129
TCP window size: 32.0 MByte (default)
------------------------------------------------------------
[  3] local 192.168.111.129 port 9999 connected with 192.168.111.128 port 9999
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-300.0 sec  141643 MBytes  3961 Mbits/sec
Sender:
[root@...get1 ~]# iperf -f m -s -B 192.168.111.128 -p 9999 -t 300
------------------------------------------------------------
Server listening on TCP port 9999
Binding to local address 192.168.111.128
TCP window size: 32.0 MByte (default)
------------------------------------------------------------
[  4] local 192.168.111.128 port 9999 connected with 192.168.111.129 port 9999
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-300.1 sec  141643 MBytes  3959 Mbits/sec
^C[root@...get1 ~]#

Test 2:
former "sender" side:
[SUM]  0.0-300.2 sec  111541 MBytes  3117 Mbits/sec
[SUM]  0.0-300.4 sec  110515 MBytes  3086 Mbits/sec
former "receiver" side:
[SUM]  0.0-300.1 sec  110515 MBytes  3089 Mbits/sec
[SUM]  0.0-300.3 sec  111541 MBytes  3116 Mbits/sec
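
For anyone cross-checking the iperf figures, this sketch recomputes the bandwidth column from the transfer and interval columns. It assumes that with `-f m` iperf counts MBytes in units of 2^20 bytes and Mbits/sec in units of 10^6 bits; that unit convention is an assumption, not something stated in this message, and `mbits_per_sec` is a name chosen for illustration.

```python
# Sketch: recompute iperf's "Bandwidth" column from "Transfer" and "Interval".
# Assumed unit convention: 1 MByte = 2**20 bytes, 1 Mbit = 10**6 bits.
MBYTE = 2 ** 20
MBIT = 10 ** 6


def mbits_per_sec(mbytes, seconds):
    """Convert a transfer in MBytes over `seconds` into Mbits/sec."""
    return mbytes * MBYTE * 8 / MBIT / seconds


# (MBytes, interval in seconds, Mbits/sec as reported) from Test 1 and Test 2.
rows = [
    (141643, 300.0, 3961),
    (141643, 300.1, 3959),
    (111541, 300.2, 3117),
    (110515, 300.4, 3086),
]

for mbytes, seconds, reported in rows:
    computed = mbits_per_sec(mbytes, seconds)
    print(f"{mbytes} MBytes in {seconds} s: "
          f"computed {computed:.0f}, reported {reported} Mbits/sec")
```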




Netstat output:

netstat -st (sender, before 1st test)
[root@...get1 ~]# netstat -st
IcmpMsg:
     InType3: 5
     InType8: 3
     OutType0: 3
     OutType3: 4
Tcp:
     26 active connections openings
     7 passive connection openings
     5 failed connection attempts
     1 connection resets received
     4 connections established
     349 segments received
     330 segments send out
     7 segments retransmited
     0 bad segments received.
     5 resets sent
UdpLite:
TcpExt:
     10 TCP sockets finished time wait in slow timer
     8 delayed acks sent
     56 packets directly queued to recvmsg prequeue.
     40 packets directly received from backlog
     317 packets directly received from prequeue
     78 packets header predicted
     36 packets header predicted and directly queued to user
     41 acknowledgments not containing data received
     134 predicted acknowledgments
     0 TCP data loss events
     4 other TCP timeouts
     2 connections reset due to unexpected data
     TCPSackShiftFallback: 1
IpExt:
     InMcastPkts: 74
     OutMcastPkts: 62
     InOctets: 76001
     OutOctets: 82234
     InMcastOctets: 13074
     OutMcastOctets: 10428

netstat -st (sender, after 1st test)
[root@...get1 ~]# netstat -st
IcmpMsg:
     InType3: 5
     InType8: 7
     OutType0: 7
     OutType3: 4
Tcp:
     71 active connections openings
     15 passive connection openings
     5 failed connection attempts
     4 connection resets received
     4 connections established
     16674161 segments received
     16674113 segments send out
     7 segments retransmited
     0 bad segments received.
     5 resets sent
UdpLite:
TcpExt:
     31 TCP sockets finished time wait in slow timer
     13 delayed acks sent
     42 delayed acks further delayed because of locked socket
     Quick ack mode was activated 297 times
     239 packets directly queued to recvmsg prequeue.
     2388220516 packets directly received from backlog
     595165 packets directly received from prequeue
     16954 packets header predicted
     445 packets header predicted and directly queued to user
     129 acknowledgments not containing data received
     322 predicted acknowledgments
     0 TCP data loss events
     4 other TCP timeouts
     297 DSACKs sent for old packets
     2 connections reset due to unexpected data
     TCPSackShiftFallback: 1
IpExt:
     InMcastPkts: 86
     OutMcastPkts: 68
     InBcastPkts: 2
     InOctets: -930738047
     OutOctets: 1321936884
     InMcastOctets: 13434
     OutMcastOctets: 10620
     InBcastOctets: 483

netstat -st (receiver, before 1st test)
[root@...get2 ~]# netstat -st
IcmpMsg:
     InType3: 5
     InType8: 3
     OutType0: 3
     OutType3: 4
Tcp:
     23 active connections openings
     6 passive connection openings
     3 failed connection attempts
     1 connection resets received
     3 connections established
     309 segments received
     264 segments send out
     7 segments retransmited
     0 bad segments received.
     6 resets sent
UdpLite:
TcpExt:
     10 TCP sockets finished time wait in slow timer
     5 delayed acks sent
     74 packets directly queued to recvmsg prequeue.
     16 packets directly received from backlog
     377 packets directly received from prequeue
     62 packets header predicted
     35 packets header predicted and directly queued to user
     32 acknowledgments not containing data received
     106 predicted acknowledgments
     0 TCP data loss events
     4 other TCP timeouts
     1 connections reset due to early user close
IpExt:
     InMcastPkts: 75
     OutMcastPkts: 62
     InOctets: 64952
     OutOctets: 66396
     InMcastOctets: 13428
     OutMcastOctets: 10403

netstat -st (receiver, after 1st test)
[root@...get2 ~]# netstat -st
IcmpMsg:
     InType3: 5
     InType8: 8
     OutType0: 8
     OutType3: 4
Tcp:
     70 active connections openings
     14 passive connection openings
     3 failed connection attempts
     4 connection resets received
     4 connections established
     16674253 segments received
     16673801 segments send out
     487 segments retransmited
     0 bad segments received.
     6 resets sent
UdpLite:
TcpExt:
     32 TCP sockets finished time wait in slow timer
     15 delayed acks sent
     228 packets directly queued to recvmsg prequeue.
     24 packets directly received from backlog
     1081 packets directly received from prequeue
     146 packets header predicted
     124 packets header predicted and directly queued to user
     10913589 acknowledgments not containing data received
     573 predicted acknowledgments
     185 times recovered from packet loss due to SACK data
     Detected reordering 1 times using FACK
     Detected reordering 8 times using SACK
     Detected reordering 2 times using time stamp
     1 congestion windows fully recovered
     23 congestion windows partially recovered using Hoe heuristic
     TCPDSACKUndo: 1
     0 TCP data loss events
     471 fast retransmits
     9 forward retransmits
     4 other TCP timeouts
     297 DSACKs received
     1 connections reset due to early user close
     TCPDSACKIgnoredOld: 258
     TCPDSACKIgnoredNoUndo: 39
     TCPSackShiftFallback: 35790574
IpExt:
     InMcastPkts: 89
     OutMcastPkts: 69
     InBcastPkts: 2
     InOctets: 1321825004
     OutOctets: -928982419
     InMcastOctets: 13848
     OutMcastOctets: 10627
     InBcastOctets: 483

Second test:

former "sender" side:
[root@...get1 ~]# netstat -st
IcmpMsg:
     InType3: 5
     InType8: 13
     OutType0: 13
     OutType3: 4
Tcp:
     556 active connections openings
     65 passive connection openings
     391 failed connection attempts
     15 connection resets received
     4 connections established
     52164640 segments received
     52117884 segments send out
     62522 segments retransmited
     0 bad segments received.
     33 resets sent
UdpLite:
TcpExt:
     27 invalid SYN cookies received
     74 TCP sockets finished time wait in slow timer
     698540 packets rejects in established connections because of timestamp
     51 delayed acks sent
     487 delayed acks further delayed because of locked socket
     Quick ack mode was activated 18838 times
     7 times the listen queue of a socket overflowed
     7 SYNs to LISTEN sockets ignored
     1632 packets directly queued to recvmsg prequeue.
     4137769996 packets directly received from backlog
     5723253 packets directly received from prequeue
     1365131 packets header predicted
     136330 packets header predicted and directly queued to user
     10241415 acknowledgments not containing data received
     156502 predicted acknowledgments
     10983 times recovered from packet loss due to SACK data
     Detected reordering 4 times using FACK
     Detected reordering 10095 times using SACK
     Detected reordering 138 times using time stamp
     2107 congestion windows fully recovered
     18612 congestion windows partially recovered using Hoe heuristic
     TCPDSACKUndo: 80
     5 congestion windows recovered after partial ack
     0 TCP data loss events
     52 timeouts after SACK recovery
     2 timeouts in loss state
     61206 fast retransmits
     7 forward retransmits
     984 retransmits in slow start
     8 other TCP timeouts
     258 sack retransmits failed
     18838 DSACKs sent for old packets
     274 DSACKs sent for out of order packets
     14169 DSACKs received
     34 DSACKs for out of order packets received
     2 connections reset due to unexpected data
     TCPDSACKIgnoredOld: 8694
     TCPDSACKIgnoredNoUndo: 5482
     TCPSackShiftFallback: 18352494
IpExt:
     InMcastPkts: 104
     OutMcastPkts: 77
     InBcastPkts: 6
     InOctets: -474718903
     OutOctets: 1280495238
     InMcastOctets: 13974
     OutMcastOctets: 10908
     InBcastOctets: 1449



former "receiver" side:
[root@...get2 ~]# netstat -st
IcmpMsg:
     InType3: 5
     InType8: 14
     OutType0: 14
     OutType3: 4
Tcp:
     182 active connections openings
     39 passive connection openings
     4 failed connection attempts
     12 connection resets received
     4 connections established
     52098089 segments received
     52180386 segments send out
     68994 segments retransmited
     0 bad segments received.
     1070 resets sent
UdpLite:
TcpExt:
     12 TCP sockets finished time wait in fast timer
     102 TCP sockets finished time wait in slow timer
     770084 packets rejects in established connections because of timestamp
     37 delayed acks sent
     261 delayed acks further delayed because of locked socket
     Quick ack mode was activated 14276 times
     1466 packets directly queued to recvmsg prequeue.
     1190723332 packets directly received from backlog
     4781569 packets directly received from prequeue
     776470 packets header predicted
     97281 packets header predicted and directly queued to user
     24979561 acknowledgments not containing data received
     484206 predicted acknowledgments
     11461 times recovered from packet loss due to SACK data
     Detected reordering 15 times using FACK
     Detected reordering 15520 times using SACK
     Detected reordering 208 times using time stamp
     2046 congestion windows fully recovered
     18402 congestion windows partially recovered using Hoe heuristic
     TCPDSACKUndo: 82
     13 congestion windows recovered after partial ack
     0 TCP data loss events
     49 timeouts after SACK recovery
     1 timeouts in loss state
     62078 fast retransmits
     5340 forward retransmits
     1181 retransmits in slow start
     20 other TCP timeouts
     322 sack retransmits failed
     14276 DSACKs sent for old packets
     36 DSACKs sent for out of order packets
     17940 DSACKs received
     254 DSACKs for out of order packets received
     4 connections reset due to early user close
     TCPDSACKIgnoredOld: 12703
     TCPDSACKIgnoredNoUndo: 5251
     TCPSackShiftFallback: 57141117
IpExt:
     InMcastPkts: 104
     OutMcastPkts: 76
     InBcastPkts: 6
     InOctets: 902997645
     OutOctets: -82887048
     InMcastOctets: 14296
     OutMcastOctets: 10851
     InBcastOctets: 1449
[root@...get2 ~]#
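
To put the retransmit counters above in proportion, the following sketch computes retransmitted segments as a share of segments sent during each test, using the difference between successive `netstat -st` snapshots. It assumes the counters were not reset between snapshots and that the hosts carried no significant other TCP traffic; for the second test it further assumes that the snapshot taken after the first test is the baseline. The helper name `retrans_share` is introduced here for illustration.

```python
# Sketch: share of retransmitted segments between two netstat -st snapshots.
# Assumptions: counters are cumulative, were not reset, and no other
# significant TCP traffic ran on the hosts during the tests.


def retrans_share(before, after):
    """Return retransmitted / sent segments for the interval before -> after."""
    sent = after["sent"] - before["sent"]
    retrans = after["retrans"] - before["retrans"]
    return retrans / sent if sent else 0.0


# "segments send out" and "segments retransmited" per host, in order:
# before 1st test, after 1st test, after 2nd test (hostnames as masked above).
snapshots = {
    "...get1": [
        {"sent": 330, "retrans": 7},
        {"sent": 16674113, "retrans": 7},
        {"sent": 52117884, "retrans": 62522},
    ],
    "...get2": [
        {"sent": 264, "retrans": 7},
        {"sent": 16673801, "retrans": 487},
        {"sent": 52180386, "retrans": 68994},
    ],
}

for host, snaps in snapshots.items():
    for test_no, (before, after) in enumerate(zip(snaps, snaps[1:]), start=1):
        share = retrans_share(before, after)
        print(f"{host} test {test_no}: {share:.4%} of sent segments retransmitted")
```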








-- 
Best regards,
Oleg Ukhno.
ITO Team Lead,
Yandex LLC.
