[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4DDC23D8.80905@gmail.com>
Date: Tue, 24 May 2011 23:32:08 +0200
From: Nicolas de Pesloüan
<nicolas.2p.debian@...il.com>
To: Jay Vosburgh <fubar@...ibm.com>
CC: Neil Horman <nhorman@...driver.com>,
Andy Gospodarek <andy@...yhouse.net>, netdev@...r.kernel.org,
"David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
Le 24/05/2011 23:12, Jay Vosburgh a écrit :
> Nicolas de Pesloüan<nicolas.2p.debian@...il.com> wrote:
>
>> Le 24/05/2011 22:37, Neil Horman a écrit :
>>
>>>>>> + return -EINVAL;
>>>>
>>>> This will turn a warning into an error.
>>>>
>>> Yes, because it should have been an error all along.
>>>
>>>> This warning existed for long, but never caused the bonding setup to
>>>> fail. This patch cause some regression for user space. For example,
>>>> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
>>>> before enslaving, because this was never required.
>>>>
>>> Thats not a regression, thats the kernel returning an error where it should have
>>> done so all along. Just because a utility got away with it for awhile and it
>>> didn't always cause a lockup, doesn't grandfather that application in to a
>>> situation where the kernel has to support its broken behavior in perpituity.
>>>
>>> Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
>>> patch doesn't touch, so ifenslave is currently unaffected (although I should
>>> look in the ioctl path to see if we have already added such a check, lest you be
>>> able to deadlock your system as previously indicated using that tool).
>>
>> Unfortunately, no. Recent versions of ifenslave-2.6 on Debian don't use
>> ioctl (ifenslave binary) anymore, but only sysfs.
>>
>> Documentation/bonding.txt should be updated to reflect this change.
>> pr_warning should be changed to pr_ err.
>> Bonding version should be bumped.
>>
>> Anyway, I will fix this package, but I suspect there exist many user
>> scripts that don't ensure bond is up before enslaving.
>
> I looked at sysconfig (as supplied with opensuse) and it uses
> sysfs, and does set the master device up first. The other potential
> user that comes to mind is that OFED at one point had a script to set up
> bonding for Infiniband devices. I don't know if this is still the case,
> nor do I know if it set the bond device up before enslaving.
>
> Generally speaking, though, in the long run I think it should be
> permissible to change any bonding option when the bond is down (even to
> values that make no sense in context, e.g., setting the primary to a
> device not currently enslaved). My rationale here is that some options
> are very difficult to modify when the bond is up (e.g., changing the
> mode), and now some other set is precluded when the bond is down. The
> init scripts already have repeat logic in them; this just makes things
> more complicated.
>
> There should be a state wherein any option can be changed (well,
> maybe not max_bonds), and that should be the down state. A subset can
> also be changed while up. I'd be happy to be able to change all options
> while the bond is up, too, but that seems pretty hard to do.
>
> How much harder is it to fix the locking and permit the action
> in question here?
To add to the comment by Jay, here is an extract from the more up to date version of the script that
setup bonding on Debian (/etc/network/if-pre-up.d/ifenslave, from ifenslave-2.6 package):
# early_setup_master is the place where we do master setup that need to be done before enslavement.
early_setup_master()
{
# Warning: the order in wich we write into the sysfs files is important.
# Double check in drivers/net/bonding/bond_sysfs.c in linux kernel source tree
# before changing anything here.
# fail_over_mac must be set before enslavement of any slaves.
sysfs fail_over_mac "$IF_BOND_FAIL_OVER_MAC"
}
setup_master()
{
# Warning: the order in wich we write into the sysfs files is important.
# Double check in drivers/net/bonding/bond_sysfs.c in linux kernel source tree
# before changing anything here.
# use_carrier can be set anytime.
sysfs use_carrier "$IF_BOND_USE_CARRIER"
# num_grat_arp can be set anytime.
sysfs num_grat_arp "$IF_BOND_NUM_GRAT_ARP"
# num_unsol_na can be set anytime.
sysfs num_unsol_na "$IF_BOND_NUM_UNSOL_NA"
# xmit_hash_policy can be set anytime.
# Changing xmit_hash_policy requires $BOND_MASTER to be down.
sysfs_change_down xmit_hash_policy "$IF_BOND_XMIT_HASH_POLICY"
# arp_ip_target must be set before arp_interval.
sysfs_add arp_ip_target "$IF_BOND_ARP_IP_TARGET"
sysfs arp_interval "$IF_BOND_ARP_INTERVAL"
# miimon must be set before updelay and downdelay.
sysfs miimon "$IF_BOND_MIIMON"
sysfs downdelay "$IF_BOND_DOWNDELAY"
sysfs updelay "$IF_BOND_UPDELAY"
# Changing ad_select requires $BOND_MASTER to be down.
sysfs_change_down ad_select "$IF_BOND_AD_SELECT"
# Changing mode requires $BOND_MASTER to be down.
# Mode should be set after miimon or arp_interval, to avoid a warning in syslog.
sysfs_change_down mode "$IF_BOND_MODE"
# arp_validate must be after mode (because mode must be active-backup).
sysfs arp_validate "$IF_BOND_ARP_VALIDATE"
# lacp_rate must be set after mode (because mode must be 802.3ad).
# Changing lacp_rate requires $BOND_MASTER to be down.
sysfs_change_down lacp_rate "$IF_BOND_LACP_RATE"
# primary must be set after mode (because only supported in some modes) and after enslavement.
# The first slave in bond-primary found in current slaves becomes the primary.
# If no slave in bond-primary is found, then primary does not change.
for slave in $IF_BOND_PRIMARY ; do
if grep -sq "\\<$slave\\>" "/sys/class/net/$BOND_MASTER/bonding/slaves" ; then
sysfs primary "$slave"
break
fi
done
# primary_reselect should be set after mode (because only supported in some modes), after
enslavement
# and after primary. This is currently (2.6.35-rc1) not enforced by the bonding driver, but
it is
# probably safer to do it in that order.
sysfs primary_reselect "$IF_BOND_PRIMARY_RESELECT"
# queue_id must be set after enslavement.
for iface_queue_id in $IF_BOND_QUEUE_ID
do
sysfs iface_queue_id $iface_queue_id
done
# active_slave must be set after mode and after enslavement.
# The slave must be up and the underlying link must be up too.
# FIXME: We should have a way to write an empty string to active_slave, to set the
active_slave to none.
if [ "$IF_BOND_ACTIVE_SLAVE" ] ; then
# Need to force interface up before. Bonding will refuse to activate a down interface.
ip link set "$IF_BOND_ACTIVE_SLAVE" up
sysfs active_slave "$IF_BOND_ACTIVE_SLAVE"
fi
# Force $BOND_MASTER to be up, if we are called from a slave stanza.
[ "$IFACE" != "$BOND_MASTER" ] && ip link set dev "$BOND_MASTER" up
}
In order to fix some of the reported problems with some previous versions of this script, I had to
conduct a detailed code review of bond_sysfs.c and eventually end up with this current version,
where I documented all the constraints I discovered.
bonding sysfs currently enforce many constraints on the exact order of the setup. As noted by Jay,
some setup need to be done while the master if down, some while the master is up, some need to be
done before the first enslavement, some must be done after the referenced slave is enslaved, some
are only allowed if mode is properly set before, and so on...
I though for long about adding this into Documentation/networking/bonding.txt, but fail to find the
time to do so.
Arguably, it would be better to spend time to remove those constraints than to document them.
Nicolas.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists