Message-ID: <5643.1249061383@death.nxdomain.ibm.com>
Date: Fri, 31 Jul 2009 10:29:43 -0700
From: Jay Vosburgh <fubar@...ibm.com>
To: Sumedha Gupta <2sumedha@...il.com>
cc: netdev@...r.kernel.org
Subject: Re: Bandwidth of NIC bonding/trunking
Sumedha Gupta <2sumedha@...il.com> wrote:
>I configured bonding/trunking on a Netgear GS724TR switch. I am using
>a 4-port NIC card using Intel Corporation 82571EB ethernet controller
>on machine X with all four ports connected to the switch. There are
>four client machines (A, B, C, D) also connected to the switch.
>iperf is running on machine X as a server and machines A, B, C, D are
>running as iperf clients:
A lot of this is explained in the bonding.txt documentation that
comes with the kernel source; there's probably more detail there on some
of this.
>With round robin, I consistently get:
> Interval Transfer Bandwidth
> 0.0-10.0 sec 992 MBytes 831 Mbits/sec
> 0.0-10.0 sec 989 MBytes 828 Mbits/sec
> 0.0-10.0 sec 1019 MBytes 854 Mbits/sec
> 0.0-10.0 sec 965 MBytes 808 Mbits/sec
The danger of round-robin (balance-rr mode) is that it generally
delivers packets out of order. This will irritate TCP's congestion
control (which can be moderated via the net.ipv4.tcp_reordering sysctl),
causing reduced throughput and inefficient use of the media. Also, UDP
or other protocol users must be able to tolerate out of order delivery.
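	If balance-rr is used anyway, TCP's reordering tolerance can be
raised. A minimal sketch (the value 127 is only illustrative, not a
recommendation from this thread):

```shell
# Raise TCP's tolerance for out-of-order segments before it treats
# reordering as loss.  The kernel default is 3; 127 is illustrative.
sysctl -w net.ipv4.tcp_reordering=127

# To persist across reboots (path assumed):
echo "net.ipv4.tcp_reordering=127" >> /etc/sysctl.conf
```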
>However with XOR, I am not getting enough bandwidth:
> 0.0-10.0 sec 619 MBytes 519 Mbits/sec
> 0.0-10.0 sec 398 MBytes 333 Mbits/sec
> 0.0-10.0 sec 338 MBytes 283 Mbits/sec
> 0.0-10.0 sec 612 MBytes 513 Mbits/sec
The xor (balance-xor mode) selects the slave to use according to
a hash. There are several hash algorithms available, the best is
generally the layer3+4, but, again, it's just math, and with a small
number of destinations relative to the number of slaves, you're fairly
likely to get traffic doubled up on the slaves. The available hashes
are described in detail in the bonding.txt file.
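	To make the doubling-up concrete, here is a back-of-the-envelope
model of the layer3+4 hash described in bonding.txt. The port and
address values below are invented for illustration, and the real kernel
computation differs in detail:

```shell
# Rough model of the layer3+4 transmit hash from bonding.txt:
#   ((src_port XOR dst_port) XOR (src_ip XOR dst_ip, low 16 bits)) mod slaves
slave_for_flow() {  # args: src_port dst_port ip_xor_low16 slave_count
    echo $(( ( ($1 ^ $2) ^ $3 ) % $4 ))
}

# Four invented iperf flows to server port 5001.  Ephemeral source ports
# are chosen by the clients, so nothing forces four distinct results:
slave_for_flow 40000 5001 11 4   # -> 2
slave_for_flow 40005 5001 10 4   # -> 2  (doubled up with the first flow)
slave_for_flow 40008 5001 13 4   # -> 0
slave_for_flow 40012 5001 12 4   # -> 1  (slave 3 carries nothing)
```

With only four flows across four slaves, chance collisions like the one
above leave some slaves idle while others carry two flows' worth of
traffic, which matches the uneven numbers reported.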
On the positive side, the balance-xor mode will not deliver
packets out of order.
>Similar results with transmit load balancing (tlb), 802.3ad, active
>backup and broadcast:
> 0.0-10.5 sec 421 MBytes 336 Mbits/sec
> 0.0-10.1 sec 323 MBytes 269 Mbits/sec
> 0.0-10.0 sec 301 MBytes 252 Mbits/sec
> 0.0-10.0 sec 405 MBytes 339 Mbits/sec
The balance-tlb mode does some semi-intelligent selection of the
transmitting slave according to the load. However, all reply traffic
comes in to one slave. Nothing is reordered.
The 802.3ad mode uses the same selection algorithms as
balance-xor, so again, you can select the best transmit hash policy.
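	For reference, a minimal sketch of bringing up 802.3ad with the
layer3+4 policy. The interface names (bond0, eth0-eth3) are assumptions,
and the corresponding switch ports must be placed in an LACP group:

```shell
# Sketch only: load bonding in 802.3ad mode with layer3+4 hashing.
# bond0/eth0-eth3 are assumed names; miimon enables link monitoring.
modprobe bonding mode=802.3ad miimon=100 xmit_hash_policy=layer3+4
ip link set bond0 up
ifenslave bond0 eth0 eth1 eth2 eth3
```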
Don't use broadcast. It just sends everything to every slave,
and I've never really figured out a rational use for it. Somebody,
somewhere, probably has some obscure use for it.
>Although, with adaptive load balancing I was constantly getting
>perfect bandwidth:
> 0.0-10.0 sec 1.09 GBytes 937 Mbits/sec
> 0.0-10.0 sec 1.09 GBytes 937 Mbits/sec
> 0.0-10.0 sec 1.09 GBytes 937 Mbits/sec
> 0.0-10.0 sec 1.09 GBytes 937 Mbits/sec
The balance-alb mode does the same thing as balance-tlb for
transmit, but it also uses tailored ARP messages to direct each peer to
send its traffic to a particular slave. In your case, each peer (there are
is effectively assigned its own slave interface, so there's no
contention for the bandwidth.
Note that for -alb (and -tlb) the balancing is done according to
MAC address, so any hosts beyond a router will all be balanced together.
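	If you want to switch an existing bond to balance-alb, one way is
via sysfs. bond0 is an assumed name, and note that (at least on kernels
of this vintage, if I recall correctly) the mode can only be changed
while the bond is down:

```shell
# Sketch: change an existing bond's mode to balance-alb via sysfs.
# The bond must be down for the mode write to be accepted.
ip link set bond0 down
echo balance-alb > /sys/class/net/bond0/bonding/mode
ip link set bond0 up
```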
>I wanted to know if so much change with mode change in bonding is
>expected or did I configure something wrong in the switch which is
>causing xor, tlb, 802.3ad etc. to not work properly?
Variation is expected depending upon the workload; if one mode
was perfect for every configuration there wouldn't be so many.
Another factor for all modes is the switch's balancing of
traffic when sending back to the bond. For the balance-tlb and -alb
modes, the switch isn't involved. However, for the balance-rr,
balance-xor or 802.3ad modes, the switch has its own algorithm, usually
a hash of some kind, to direct packets to a particular port of the
etherchannel group (for -rr and -xor modes) or aggregator (for 802.3ad
mode, perhaps called LACP on the switch).
If the switch isn't configured for Etherchannel or 802.3ad /
LACP when running the equivalent bonding modes, then the return traffic
from the switch will not be balanced correctly.
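	One way to confirm what the bond is actually doing (bond0 is an
assumed name):

```shell
# Shows the active mode, the transmit hash policy, per-slave link status
# and, in 802.3ad mode, the aggregator details negotiated with the switch.
cat /proc/net/bonding/bond0
```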
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com