Message-ID: <9a2da8520905052019t5a2c07f2i2799722825c43f86@mail.gmail.com>
Date: Wed, 6 May 2009 08:49:20 +0530
From: Deepjyoti Kakati <dkakati73@...il.com>
To: netdev@...r.kernel.org
Subject: Re: duplicate arp request problem with bonding driver
thanks for your detailed reply, Jay.
the idea behind my topology was to have 2 Gbps of bandwidth
(2 x 1 Gbps eth ports) available at all times to the redundant switches.
            bond3 (primary-backup)
            /                 \
        bond1                 bond2   (aggregated)
        !    !                !    !
        e1   e2               e3   e4
but as you pointed out, apart from the issue with nesting of
bonds, the primary->backup failover won't occur until all slaves
of the active bond have failed.
no particular reason why I selected balance-rr mode; it just was
the default. my switches being Cisco Catalysts, I will try out your
suggestion about single-level bonding and 802.3ad mode.
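something like this is what I plan to try first (just my own sketch from
the bonding docs -- eth1-eth4 stand in for the e1-e4 ports above, and the
modprobe.conf syntax is my assumption, not something you gave):

    # /etc/modprobe.conf -- one flat 802.3ad bond, no nesting (untested sketch)
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100

    # bring the bond up and enslave all four ports directly
    ifconfig bond0 up
    ifenslave bond0 eth1 eth2 eth3 eth4
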
-Deepjyoti
On Tue, May 5, 2009 at 8:23 PM, Jay Vosburgh <fubar@...ibm.com> wrote:
>
> Your problem is ultimately because nesting of bonds does not
> quite work. Bonding does not really process incoming packets in the
> usual sense; there's just a minimal check to assign the packet to the
> proper device, and a check to see if it should be dropped. This check
> is done one time only, for the bottommost level of bond (the one to
> which the receiving device is enslaved). For the VLAN case, the VLAN
> input processing will immediately thereafter assign the packet to the
> VLAN device.
>
> My question for you is: what do you want to achieve with this
> type of topology? There are going to be a few issues with it, other
> than what you've found already. For one, the active-backup won't fail
> over until all slaves of the active bond (bond1 or bond2) have failed.
>
> I'm not sure what your workload is, but I would rarely recommend
> using balance-rr for anything (it often leads to out of order packet
> delivery). I would hazard to guess that you're trying to optimize a
> single TCP/IP stream's throughput, and balance-rr will do that to a
> degree, at the expense of overall throughput.
>
> In your case, if the interconnects between switch1 and switch2
> and the final destinations are not all faster than the eth1 - eth4
> devices, you ultimately won't have any increase in throughput, because
> the switch uplinks will be limiting the available bandwidth. If the
> interconnects are all faster, and the final destinations are running
> etherchannel or the like, you'll see out of order delivery. Your single
> stream may see roughly a 50% increase in throughput for TCP/IP (for two
> slaves of the same speed, i.e., 1.5 Gb/sec for two 1 Gb/sec slaves) but
> at a lower efficiency.
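
(noted -- if I do end up stuck with balance-rr for some flows, my reading
of the bonding documentation is that TCP's tolerance for the resulting
reordering can be raised via sysctl; a sketch, and the value below is just
a guess, not tuned:)

    # let TCP tolerate more out-of-order segments before treating them as loss
    sysctl -w net.ipv4.tcp_reordering=10
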
>
> My usual recommendation for this type of configuration is to use
> 802.3ad mode, particularly with a recent kernel with changes to 802.3ad
> to support the recent ad_select option:
>
> ad_select
>
> Specifies the 802.3ad aggregation selection logic to use. The
> possible values and their effects are:
>
> stable or 0
>
> The active aggregator is chosen by largest aggregate
> bandwidth.
>
> Reselection of the active aggregator occurs only when all
> slaves of the active aggregator are down or the active
> aggregator has no slaves.
>
> This is the default value.
>
> bandwidth or 1
>
> The active aggregator is chosen by largest aggregate
> bandwidth. Reselection occurs if:
>
> - A slave is added to or removed from the bond
>
> - Any slave's link state changes
>
> - Any slave's 802.3ad association state changes
>
> - The bond's administrative state changes to up
>
> count or 2
>
> The active aggregator is chosen by the largest number of
> ports (slaves). Reselection occurs as described under the
> "bandwidth" setting, above.
>
> The bandwidth and count selection policies permit failover of
> 802.3ad aggregations when partial failure of the active aggregator
> occurs. This keeps the aggregator with the highest availability
> (either in bandwidth or in number of ports) active at all times.
>
> This option was added in bonding version 3.4.0.
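
(for my own notes: assuming my kernel's bonding is >= 3.4.0, I believe
ad_select can be given either as a module option alongside the mode, or
through sysfs while the bond is down and has no slaves -- a sketch, not
yet verified here:)

    # module option form
    options bond0 mode=802.3ad miimon=100 ad_select=bandwidth

    # sysfs form, before any slaves are attached
    echo bandwidth > /sys/class/net/bond0/bonding/ad_select
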
>
>
> Setting this option to "bandwidth" or "count" permits 802.3ad to
> be configured for "gang failover," e.g.:
>
>                  bond0
>           /     /     \     \
>        eth0  eth1    eth2  eth3
>          !     !       !     !
>        [ switch 1 ]  [ switch 2 ]
>
>
> In this case, (eth0, eth1) and (eth2, eth3) will be put together
> into aggregators, and one will be selected automatically to be the
> active aggregator. Should either slave of the active aggregator fail,
> the ad_select policy will cause a reselection of the active aggregator.
> This, in the end, should keep the "best" aggregator active at all times.
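
(putting your gang-failover example together for my own reference -- a
sketch of how I'd bring it up, assuming the classic ifenslave tooling;
addresses left out:)

    # 802.3ad with aggregator reselection on partial failure
    modprobe bonding mode=802.3ad miimon=100 ad_select=bandwidth

    # the switches' LACP config groups (eth0,eth1) and (eth2,eth3)
    ifconfig bond0 up
    ifenslave bond0 eth0 eth1 eth2 eth3
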
>
> This does require the switches to support 802.3ad, but that's
> fairly common these days. 802.3ad does not support a round-robin type
> of transmit policy, so if you are heavily dependent on that, this may
> not work for you.
>
> I don't follow Fedora development, so I don't know if their
> current kernels support this option or not.
>
> -J
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
>