Message-ID: <9a2da8520905052019t5a2c07f2i2799722825c43f86@mail.gmail.com>
Date: Wed, 6 May 2009 08:49:20 +0530
From: Deepjyoti Kakati <dkakati73@...il.com>
To: netdev@...r.kernel.org
Subject: Re: duplicate arp request problem with bonding driver
thanks for your detailed reply, Jay.
the idea behind my topology was to have 2 Gbps of bandwidth
(2 x 1 Gbps eth ports) available at all times to the redundant switches.
            bond3 (primary-backup)
            /                 \
        bond1                 bond2   (aggregated)
        !    !                !    !
        e1   e2               e3   e4
but as you pointed out, apart from the issue with nesting of
bonds, the primary->backup failover won't occur until all slaves
of the active bond have failed.
no particular reason why I selected balance-rr mode; it just was
the default. my switches being Cisco Catalysts, I will try out your
suggestion about single-level bonding and 802.3ad mode.
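something like this is what I plan to try first (just my own sketch from
the bonding docs -- eth1-eth4 stand in for the e1-e4 ports above, and the
modprobe.conf syntax is my assumption, not something you gave):

    # /etc/modprobe.conf -- one flat 802.3ad bond, no nesting (untested sketch)
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100

    # bring the bond up and enslave all four ports directly
    ifconfig bond0 up
    ifenslave bond0 eth1 eth2 eth3 eth4
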
-Deepjyoti
On Tue, May 5, 2009 at 8:23 PM, Jay Vosburgh <fubar@...ibm.com> wrote:
>
> Your problem is ultimately because nesting of bonds does not
> quite work. Bonding does not really process incoming packets in the
> usual sense; there's just a minimal check to assign the packet to the
> proper device, and a check to see if it should be dropped. This check
> is done one time only, for the bottommost level of bond (the one to
> which the receiving device is enslaved). For the VLAN case, the VLAN
> input processing will immediately thereafter assign the packet to the
> VLAN device.
>
> My question for you is: what do you want to achieve with this
> type of topology? There are going to be a few issues with it, other
> than what you've found already. For one, the active-backup won't fail
> over until all slaves of the active bond (bond1 or bond2) have failed.
>
> I'm not sure what your workload is, but I would rarely recommend
> using balance-rr for anything (it often leads to out of order packet
> delivery). I would hazard to guess that you're trying to optimize a
> single TCP/IP stream's throughput, and balance-rr will do that to a
> degree, at the expense of overall throughput.
>
> In your case, if the interconnects between switch1 and switch2
> and the final destinations are not all faster than the eth1 - eth4
> devices, you ultimately won't have any increase in throughput, because
> the switch uplinks will be limiting the available bandwidth. If the
> interconnects are all faster, and the final destinations are running
> etherchannel or the like, you'll see out of order delivery. Your single
> stream may see roughly a 50% increase in throughput for TCP/IP (for two
> slaves of the same speed, i.e., 1.5 Gb/sec for two 1 Gb/sec slaves) but
> at a lower efficiency.
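
(noted -- if I do end up stuck with balance-rr for some flows, my reading
of the bonding documentation is that TCP's tolerance for the resulting
reordering can be raised via sysctl; a sketch, and the value below is just
a guess, not tuned:)

    # let TCP tolerate more out-of-order segments before treating them as loss
    sysctl -w net.ipv4.tcp_reordering=10
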
>
> My usual recommendation for this type of configuration is to use
> 802.3ad mode, particularly with a recent kernel with changes to 802.3ad
> to support the recent ad_select option:
>
> ad_select
>
> Specifies the 802.3ad aggregation selection logic to use. The
> possible values and their effects are:
>
> stable or 0
>
> The active aggregator is chosen by largest aggregate
> bandwidth.
>
> Reselection of the active aggregator occurs only when all
> slaves of the active aggregator are down or the active
> aggregator has no slaves.
>
> This is the default value.
>
> bandwidth or 1
>
> The active aggregator is chosen by largest aggregate
> bandwidth. Reselection occurs if:
>
> - A slave is added to or removed from the bond
>
> - Any slave's link state changes
>
> - Any slave's 802.3ad association state changes
>
> - The bond's administrative state changes to up
>
> count or 2
>
> The active aggregator is chosen by the largest number of
> ports (slaves). Reselection occurs as described under the
> "bandwidth" setting, above.
>
> The bandwidth and count selection policies permit failover of
> 802.3ad aggregations when partial failure of the active aggregator
> occurs. This keeps the aggregator with the highest availability
> (either in bandwidth or in number of ports) active at all times.
>
> This option was added in bonding version 3.4.0.
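
(for my own notes: assuming my kernel's bonding is >= 3.4.0, I believe
ad_select can be given either as a module option alongside the mode, or
through sysfs while the bond is down and has no slaves -- a sketch, not
yet verified here:)

    # module option form
    options bond0 mode=802.3ad miimon=100 ad_select=bandwidth

    # sysfs form, before any slaves are attached
    echo bandwidth > /sys/class/net/bond0/bonding/ad_select
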
>
>
> Setting this option to "bandwidth" or "count" permits 802.3ad to
> be configured for "gang failover," e.g.:
>
>                  bond0
>           /     /     \     \
>        eth0  eth1    eth2  eth3
>          !     !       !     !
>        [ switch 1 ]  [ switch 2 ]
>
>
> In this case, (eth0, eth1) and (eth2, eth3) will be put together
> into aggregators, and one will be selected automatically to be the
> active aggregator. Should either slave of the active aggregator fail,
> the ad_select policy will cause a reselection of the active aggregator.
> This, in the end, should keep the "best" aggregator active at all times.
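
(putting your gang-failover example together for my own reference -- a
sketch of how I'd bring it up, assuming the classic ifenslave tooling;
addresses left out:)

    # 802.3ad with aggregator reselection on partial failure
    modprobe bonding mode=802.3ad miimon=100 ad_select=bandwidth

    # the switches' LACP config groups (eth0,eth1) and (eth2,eth3)
    ifconfig bond0 up
    ifenslave bond0 eth0 eth1 eth2 eth3
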
>
> This does require the switches to support 802.3ad, but that's
> fairly common these days. 802.3ad does not support a round-robin type
> of transmit policy, so if you are heavily dependent on that, this may
> not work for you.
>
> I don't follow Fedora development, so I don't know if their
> current kernels support this option or not.
>
> -J
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
>