Date:	Wed, 19 Jan 2011 01:22:12 +0300
From:	"Oleg V. Ukhno" <olegu@...dex-team.ru>
To:	Jay Vosburgh <fubar@...ibm.com>
CC:	Nicolas de Pesloüan 
	<nicolas.2p.debian@...il.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	"David S. Miller" <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Sébastien Barré <sebastien.barre@...ouvain.be>,
	Christophe Paasch <christoph.paasch@...ouvain.be>
Subject: Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for
 single TCP session balancing



Jay Vosburgh wrote:
> 
> 	One item I'd like to see some more data on is the level of
> reordering at the receiver in Oleg's system.
> 
> 	One of the reasons round robin isn't as useful as it once was is
> due to the rise of NAPI and interrupt coalescing, both of which will
> tend to increase the reordering of packets at the receiver when the
> packets are evenly striped.  In the old days, it was one interrupt, one
> packet.  Now, it's one interrupt or NAPI poll, many packets.  With the
> packets striped across interfaces, this will tend to increase
> reordering.  E.g.,
> 
> 	slave 1		slave 2		slave 3
> 	Packet 1	P2		P3
> 	P4		P5		P6
> 	P7		P8		P9
> 
> 	and so on.  A poll of slave 1 will get packets 1, 4 and 7 (and
> probably several more), then a poll of slave 2 will get 2, 5 and 8, etc.
> 
> 	I haven't done much testing with this lately, but I suspect this
> behavior hasn't really changed.  Raising the tcp_reordering sysctl value
> can mitigate this somewhat (by making TCP more tolerant of this), but
> that doesn't help non-TCP protocols.
> 
> 	Barring evidence to the contrary, I presume that Oleg's system
> delivers out of order at the receiver.  That's not automatically a
> reason to reject it, but this entire proposal is sufficiently complex to
> configure that very explicit documentation will be necessary.
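
(To restate Jay's table in runnable form: a toy shell loop, assuming
each NAPI poll drains one slave's queue completely before moving to the
next slave, reproduces the delivery order he describes:)

[root@<somehost> ~]# # 9 packets striped round-robin over 3 slaves;
[root@<somehost> ~]# # packet number on a slave = slave + (position-1)*3
[root@<somehost> ~]# for slave in 1 2 3; do for pos in 1 2 3; do \
> echo -n "P$((slave + (pos-1)*3)) "; done; done; echo
P1 P4 P7 P2 P5 P8 P3 P6 P9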

Jay, here are some network stats from one of my iSCSI targets with an
average load of 1.5-2.5 Gbit/sec (4 slaves in the etherchannel). Not
perfect and not very "clean" (there are more interfaces on the host
than these 4):
[root@<somehost> ~]# netstat -st
IcmpMsg:
     InType0: 6
     InType3: 1872
     InType8: 60557
     InType11: 23
     OutType0: 60528
     OutType3: 1755
     OutType8: 6
Tcp:
     1298909 active connections openings
     61090 passive connection openings
     2374 failed connection attempts
     62781 connection resets received
     3 connections established
     1268233942 segments received
     1198020318 segments send out
     18939618 segments retransmited
     0 bad segments received.
     23643 resets sent
TcpExt:
     294935 TCP sockets finished time wait in fast timer
     472 time wait sockets recycled by time stamp
     819481 delayed acks sent
     295332 delayed acks further delayed because of locked socket
     Quick ack mode was activated 30616377 times
     3516920 packets directly queued to recvmsg prequeue.
     4353 packets directly received from backlog
     44873453 packets directly received from prequeue
     1442812750 packets header predicted
     1077442 packets header predicted and directly queued to user
     2123453975 acknowledgments not containing data received
     2375328274 predicted acknowledgments
     8462439 times recovered from packet loss due to fast retransmit
     Detected reordering 19203 times using reno fast retransmit
     Detected reordering 100 times using time stamp
     3429 congestion windows fully recovered
     11760 congestion windows partially recovered using Hoe heuristic
     398 congestion windows recovered after partial ack
     0 TCP data loss events
     3671 timeouts after reno fast retransmit
     6 timeouts in loss state
     18919118 fast retransmits
     11637 retransmits in slow start
     1756 other TCP timeouts
     TCPRenoRecoveryFail: 3187
     62779 connections reset due to early user close
IpExt:
     InBcastPkts: 512616
[root@<somehost> ~]# uptime
  00:35:49 up 42 days,  8:27,  1 user,  load average: 3.70, 3.80, 4.07
[root@<somehost> ~]# sysctl -a|grep tcp_reo
net.ipv4.tcp_reordering = 3
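
(For reference, the mitigation Jay mentioned is a one-line sysctl; the
value below is only illustrative, not something I am running:)

[root@<somehost> ~]# # let TCP tolerate more out-of-order segments
[root@<somehost> ~]# # before dupACKs trigger fast retransmit (default 3)
[root@<somehost> ~]# sysctl -w net.ipv4.tcp_reordering=10
net.ipv4.tcp_reordering = 10

(Adding the same line to /etc/sysctl.conf would make it persistent.)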

I will get back with "clean" results after I set up the test system
tomorrow.
TcpExt stats from other hosts are similar.
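
(Some arithmetic on the numbers above: 19203 reno-detected reordering
events against 1268233942 segments received is on the order of 0.0015%,
which is why I read the reordering here as negligible. The raw counters
behind the two "Detected reordering" lines can also be watched directly
in /proc/net/netstat; a rough one-liner, assuming this kernel
generation's TcpExt field names:)

[root@<somehost> ~]# # pair the TcpExt header line with the value line,
[root@<somehost> ~]# # then print only the reordering counters
[root@<somehost> ~]# awk '/^TcpExt:/ { if (!n++) { for (i=2;i<=NF;i++) h[i]=$i } else { for (i=2;i<=NF;i++) if (h[i] ~ /Reorder/) print h[i]"="$i } }' /proc/net/netstat
TCPFACKReorder=0
TCPSACKReorder=0
TCPRenoReorder=19203
TCPTSReorder=100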

> 
> 	-J
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
> 

-- 
Best regards,
Oleg Ukhno
