Date:	Fri, 31 Jul 2009 10:29:43 -0700
From:	Jay Vosburgh <fubar@...ibm.com>
To:	Sumedha Gupta <2sumedha@...il.com>
cc:	netdev@...r.kernel.org
Subject: Re: Bandwidth of NIC bonding/trunking

Sumedha Gupta <2sumedha@...il.com> wrote:

>I configured bonding/trunking on a Netgear GS724TR switch. I am using
>a 4-port NIC card using Intel Corporation 82571EB ethernet controller
>on machine X with all four ports connected to the switch. There are
>four client machines (A, B, C, D) also connected to the switch.
>iperf is running on machine X as a server and machines A, B, C, D are
>running as iperf clients:

	A lot of this is explained in the bonding.txt documentation that
comes with the kernel source; there's probably more detail there on some
of this.
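
	(For reference, I'm assuming the test amounts to roughly the
following, in iperf 2.x syntax; the address is just a placeholder:

	# on machine X (the bonded server)
	iperf -s

	# on each of A, B, C and D
	iperf -c 192.168.1.10 -t 10

The exact invocation doesn't change anything below.)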

>With round robin, I consistently get:
> Interval            Transfer         Bandwidth
> 0.0-10.0 sec    992 MBytes    831 Mbits/sec
> 0.0-10.0 sec    989 MBytes    828 Mbits/sec
> 0.0-10.0 sec  1019 MBytes    854 Mbits/sec
> 0.0-10.0 sec    965 MBytes    808 Mbits/sec

	The danger of round-robin (balance-rr mode) is that it generally
delivers packets out of order.  This will irritate TCP's congestion
control (which can be moderated via the net.ipv4.tcp_reordering sysctl),
causing reduced throughput and inefficient use of the media.  Also, UDP
or other protocol users must be able to tolerate out of order delivery.
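
	If you do want to experiment with balance-rr, something along
these lines (an untested sketch; interface names and addresses are only
examples) raises the reordering tolerance and brings up the bond:

	# default tcp_reordering is 3; allow more reordering
	sysctl -w net.ipv4.tcp_reordering=10
	modprobe bonding mode=balance-rr miimon=100
	ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
	ifenslave bond0 eth0 eth1 eth2 eth3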

>However with XOR, I am not getting enough bandwidth:
> 0.0-10.0 sec    619 MBytes    519 Mbits/sec
> 0.0-10.0 sec    398 MBytes    333 Mbits/sec
> 0.0-10.0 sec    338 MBytes    283 Mbits/sec
> 0.0-10.0 sec    612 MBytes    513 Mbits/sec

	The xor (balance-xor mode) selects the slave to use according to
a hash.  There are several hash algorithms available; layer3+4 is
generally the best, but it's still just math, and with a small number
of destinations relative to the number of slaves, you're fairly likely
to get traffic doubled up on some of the slaves.  The available hashes
are described in detail in the bonding.txt file.
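
	For example, to select that policy when loading the module
(layer2 is the default; this is only a sketch, not necessarily your
exact configuration):

	modprobe bonding mode=balance-xor xmit_hash_policy=layer3+4 miimon=100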

	On the positive side, the balance-xor mode will not deliver
packets out of order.

>Similar results with transmit load balancing (tlb), 802.3ad, active
>backup and broadcast:
> 0.0-10.5 sec    421 MBytes    336 Mbits/sec
> 0.0-10.1 sec    323 MBytes    269 Mbits/sec
> 0.0-10.0 sec    301 MBytes    252 Mbits/sec
> 0.0-10.0 sec    405 MBytes    339 Mbits/sec

	The balance-tlb mode does some semi-intelligent selection of the
transmitting slave according to the load.  However, all reply traffic
comes in on a single slave.  Nothing is reordered.

	The 802.3ad mode uses the same selection algorithms as
balance-xor, so again, you can select the best transmit hash policy.
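
	E.g. (again just a sketch; the switch ports also have to be set
up for LACP, more on that below):

	modprobe bonding mode=802.3ad xmit_hash_policy=layer3+4 miimon=100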

	Don't use broadcast.  It just sends everything to every slave,
and I've never really figured out a rational use for it.  Somebody,
somewhere, probably has some obscure use for it.

>Although, with adaptive load balancing I was constantly getting
>perfect bandwidth:
> 0.0-10.0 sec  1.09 GBytes    937 Mbits/sec
> 0.0-10.0 sec  1.09 GBytes    937 Mbits/sec
> 0.0-10.0 sec  1.09 GBytes    937 Mbits/sec
> 0.0-10.0 sec  1.09 GBytes    937 Mbits/sec

	The balance-alb mode does the same thing as balance-tlb for
transmit, but it also uses tailored ARP replies to steer each peer's
traffic to a particular slave.  In your case, each peer (there are four)
is effectively assigned its own slave interface, so there's no
contention for the bandwidth.

	Note that for -alb (and -tlb) the balancing is done according to
MAC address, so any hosts beyond a router will all be balanced together.
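
	You can see the -alb assignment from the peers, if you're
curious: run

	arp -n 192.168.1.10

(substituting X's actual address) on each of A, B, C and D; each client
should report a different slave's hardware address for the bond.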

>I wanted to know if so much change with mode change in bonding is
>expected or did I configure something wrong in the switch which is
>causing xor, tlb, 802.3ad etc. to not work properly?

	Variation is expected depending upon the workload; if one mode
was perfect for every configuration there wouldn't be so many.

	Another factor for all modes is the switch's balancing of
traffic when sending back to the bond.  For the balance-tlb and -alb
modes, the switch isn't involved.  However, for the balance-rr,
balance-xor or 802.3ad modes, the switch has its own algorithm, usually
a hash of some kind, to direct packets to a particular port of the
etherchannel group (for -rr and -xor modes) or aggregator (for 802.3ad
mode, perhaps called LACP on the switch).

	If the switch isn't configured for Etherchannel or 802.3ad /
LACP when running the equivalent bonding modes, then the return traffic
from the switch will not be balanced correctly.
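
	Whichever mode you settle on, the bonding driver's view of
things is in:

	cat /proc/net/bonding/bond0

For 802.3ad that includes the aggregator information, so you can tell
whether the switch is actually participating in LACP.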

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
