[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20071214222700.GB8515@gospo.usersys.redhat.com>
Date: Fri, 14 Dec 2007 17:27:01 -0500
From: Andy Gospodarek <andy@...yhouse.net>
To: Krzysztof Oledzki <olel@....pl>
Cc: Andy Gospodarek <andy@...yhouse.net>,
Jay Vosburgh <fubar@...ibm.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
Andrew Morton <akpm@...ux-foundation.org>,
bugme-daemon@...zilla.kernel.org, shemminger@...ux-foundation.org,
davem@...emloft.net, netdev@...r.kernel.org
Subject: Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)
On Fri, Dec 14, 2007 at 11:11:15PM +0100, Krzysztof Oledzki wrote:
>
>
> On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>
> >On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:
> >>
> >>
> >>On Fri, 14 Dec 2007, Andy Gospodarek wrote:
> >>
> >>>On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:
> >>>>
> >>>>
> >>>>On Wed, 12 Dec 2007, Jay Vosburgh wrote:
> >>>>
> >>>>>Herbert Xu <herbert@...dor.apana.org.au> wrote:
> >>>>>
> >>>>>>>diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
> >>>>>>>drivers/net/bonding/bond_sysfs.c
> >>>>>>>--- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
> >>>>>>>+++ a/drivers/net/bonding/bond_sysfs.c
> >>>>>>>@@ -1111,8 +1111,6 @@ static ssize_t bonding_store_primary(str
> >>>>>>>out:
> >>>>>>> write_unlock_bh(&bond->lock);
> >>>>>>>
> >>>>>>>- rtnl_unlock();
> >>>>>>>-
> >>>>>>
> >>>>>>Looking at the changeset that added this perhaps the intention
> >>>>>>is to hold the lock? If so we should add an rtnl_lock to the start
> >>>>>>of the function.
> >>>>>
> >>>>> Yes, this function needs to hold locks, and more than just
> >>>>>what's there now. I believe the following should be correct; I haven't
> >>>>>tested it, though (I'm supposedly on vacation right now).
> >>>>>
> >>>>> The following change should be correct for the
> >>>>>bonding_store_primary case discussed in this thread, and also corrects
> >>>>>the bonding_store_active case which performs similar functions.
> >>>>>
> >>>>> The bond_change_active_slave and bond_select_active_slave
> >>>>>functions both require rtnl, bond->lock for read and curr_slave_lock
> >>>>>for
> >>>>>write_bh, and no other locks. This is so that the lower level
> >>>>>mode-specific functions can release locks down to just rtnl in order to
> >>>>>call, e.g., dev_set_mac_address with the locks it expects (rtnl only).
> >>>>>
> >>>>>Signed-off-by: Jay Vosburgh <fubar@...ibm.com>
> >>>>>
> >>>>>diff --git a/drivers/net/bonding/bond_sysfs.c
> >>>>>b/drivers/net/bonding/bond_sysfs.c
> >>>>>index 11b76b3..28a2d80 100644
> >>>>>--- a/drivers/net/bonding/bond_sysfs.c
> >>>>>+++ b/drivers/net/bonding/bond_sysfs.c
> >>>>>@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct
> >>>>>device
> >>>>>*d,
> >>>>> struct slave *slave;
> >>>>> struct bonding *bond = to_bond(d);
> >>>>>
> >>>>>- write_lock_bh(&bond->lock);
> >>>>>+ rtnl_lock();
> >>>>>+ read_lock(&bond->lock);
> >>>>>+ write_lock_bh(&bond->curr_slave_lock);
> >F
> >>>>>+
> >>>>> if (!USES_PRIMARY(bond->params.mode)) {
> >>>>> printk(KERN_INFO DRV_NAME
> >>>>> ": %s: Unable to set primary slave; %s is in mode
> >>>>> %d\n",
> >>>>>@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct
> >>>>>device
> >>>>>*d,
> >>>>> }
> >>>>> }
> >>>>>out:
> >>>>>- write_unlock_bh(&bond->lock);
> >>>>>-
> >>>>>+ write_unlock_bh(&bond->curr_slave_lock);
> >>>>>+ read_unlock(&bond->lock);
> >>>>> rtnl_unlock();
> >>>>>
> >>>>> return count;
> >>>>>@@ -1190,7 +1193,8 @@ static ssize_t bonding_store_active_slave(struct
> >>>>>device *d,
> >>>>> struct bonding *bond = to_bond(d);
> >>>>>
> >>>>> rtnl_lock();
> >>>>>- write_lock_bh(&bond->lock);
> >>>>>+ read_lock(&bond->lock);
> >>>>>+ write_lock_bh(&bond->curr_slave_lock);
> >>>>>
> >>>>> if (!USES_PRIMARY(bond->params.mode)) {
> >>>>> printk(KERN_INFO DRV_NAME
> >>>>>@@ -1247,7 +1251,8 @@ static ssize_t bonding_store_active_slave(struct
> >>>>>device *d,
> >>>>> }
> >>>>> }
> >>>>>out:
> >>>>>- write_unlock_bh(&bond->lock);
> >>>>>+ write_unlock_bh(&bond->curr_slave_lock);
> >>>>>+ read_unlock(&bond->lock);
> >>>>> rtnl_unlock();
> >>>>>
> >>>>> return count;
> >>>>
> >>>>Vanilla 2.6.24-rc5 plus this patch:
> >>>>
> >>>>=========================================================
> >>>>[ INFO: possible irq lock inversion dependency detected ]
> >>>>2.6.24-rc5 #1
> >>>>---------------------------------------------------------
> >>>>events/0/9 just changed the state of lock:
> >>>>(&mc->mca_lock){-+..}, at: [<c0411c7a>] mld_ifc_timer_expire+0x130/0x1fb
> >>>>but this lock took another, soft-read-irq-unsafe lock in the past:
> >>>>(&bond->lock){-.--}
> >>>>
> >>>>and interrupts could create inverse lock ordering between them.
> >>>>
> >>>>
> >>>
> >>>Grrr, I should have seen that -- sorry. Try your luck with this instead:
> >><CUT>
> >>
> >>No luck.
> >>
> >>bonding: bond0: setting mode to active-backup (1).
> >>bonding: bond0: Setting MII monitoring interval to 100.
> >>ADDRCONF(NETDEV_UP): bond0: link is not ready
> >>bonding: bond0: Adding slave eth0.
> >>e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow
> >>Control: RX/TX
> >>bonding: bond0: making interface eth0 the new active one.
> >>bonding: bond0: first active interface up!
> >>bonding: bond0: enslaving eth0 as an active interface with an up link.
> >>bonding: bond0: Adding slave eth1.
> >>ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> >
> ><SNIP>
> >
> >>bonding: bond0: enslaving eth1 as a backup interface with a down link.
> >>bonding: bond0: Setting eth0 as primary slave.
> >>bond0: no IPv6 routers present
> >>
> >
> >Based on the console log, I'm guessing your initialization scripts use
> >sysfs to set eth0 as the primary interface for bond0? Can you confirm?
>
> Yep, that's correct:
>
> postup() {
> if [[ ${IFACE} == "bond0" ]] ; then
> echo -n +eth0 > /sys/class/net/${IFACE}/bonding/slaves
> echo -n +eth1 > /sys/class/net/${IFACE}/bonding/slaves
> echo -n eth0 > /sys/class/net/${IFACE}/bonding/primary
> fi
> }
>
Good. Thanks for the confirmation.
> >If you did somehow use sysfs to set the primary device as eth0, I'm
> >guessing you never see this issue without that line or without this
> >patch. Please confirm this as well.
>
> Without this patch I get another error, of course. Unfortunately checking
:-/
> what happens without using sysfs to set the primary device as eth0, have
> to wait, as I am not able to test it before Monday.
>
No problem, thanks for the feedback and quick response.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists