lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0712142304470.23504@bizon.gios.gov.pl>
Date:	Fri, 14 Dec 2007 23:11:15 +0100 (CET)
From:	Krzysztof Oledzki <olel@....pl>
To:	Andy Gospodarek <andy@...yhouse.net>
cc:	Jay Vosburgh <fubar@...ibm.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Andrew Morton <akpm@...ux-foundation.org>,
	bugme-daemon@...zilla.kernel.org, shemminger@...ux-foundation.org,
	davem@...emloft.net, netdev@...r.kernel.org
Subject: Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at
 net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c
 (1055)



On Fri, 14 Dec 2007, Andy Gospodarek wrote:

> On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:
>>
>>
>> On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>>
>>> On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:
>>>>
>>>>
>>>> On Wed, 12 Dec 2007, Jay Vosburgh wrote:
>>>>
>>>>> Herbert Xu <herbert@...dor.apana.org.au> wrote:
>>>>>
>>>>>>> diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>>>>>>> drivers/net/bonding/bond_sysfs.c
>>>>>>> --- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>>>>>>> +++ a/drivers/net/bonding/bond_sysfs.c
>>>>>>> @@ -1111,8 +1111,6 @@ static ssize_t bonding_store_primary(str
>>>>>>> out:
>>>>>>>      write_unlock_bh(&bond->lock);
>>>>>>>
>>>>>>> -       rtnl_unlock();
>>>>>>> -
>>>>>>
>>>>>> Looking at the changeset that added this perhaps the intention
>>>>>> is to hold the lock? If so we should add an rtnl_lock to the start
>>>>>> of the function.
>>>>>
>>>>> 	Yes, this function needs to hold locks, and more than just
>>>>> what's there now.  I believe the following should be correct; I haven't
>>>>> tested it, though (I'm supposedly on vacation right now).
>>>>>
>>>>> 	The following change should be correct for the
>>>>> bonding_store_primary case discussed in this thread, and also corrects
>>>>> the bonding_store_active case which performs similar functions.
>>>>>
>>>>> 	The bond_change_active_slave and bond_select_active_slave
>>>>> functions both require rtnl, bond->lock for read and curr_slave_lock for
>>>>> write_bh, and no other locks.  This is so that the lower level
>>>>> mode-specific functions can release locks down to just rtnl in order to
>>>>> call, e.g., dev_set_mac_address with the locks it expects (rtnl only).
>>>>>
>>>>> Signed-off-by: Jay Vosburgh <fubar@...ibm.com>
>>>>>
>>>>> diff --git a/drivers/net/bonding/bond_sysfs.c
>>>>> b/drivers/net/bonding/bond_sysfs.c
>>>>> index 11b76b3..28a2d80 100644
>>>>> --- a/drivers/net/bonding/bond_sysfs.c
>>>>> +++ b/drivers/net/bonding/bond_sysfs.c
>>>>> @@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device
>>>>> *d,
>>>>> 	struct slave *slave;
>>>>> 	struct bonding *bond = to_bond(d);
>>>>>
>>>>> -	write_lock_bh(&bond->lock);
>>>>> +	rtnl_lock();
>>>>> +	read_lock(&bond->lock);
>>>>> +	write_lock_bh(&bond->curr_slave_lock);
> F
>>>>> +
>>>>> 	if (!USES_PRIMARY(bond->params.mode)) {
>>>>> 		printk(KERN_INFO DRV_NAME
>>>>> 		       ": %s: Unable to set primary slave; %s is in mode
>>>>> 		       %d\n",
>>>>> @@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device
>>>>> *d,
>>>>> 		}
>>>>> 	}
>>>>> out:
>>>>> -	write_unlock_bh(&bond->lock);
>>>>> -
>>>>> +	write_unlock_bh(&bond->curr_slave_lock);
>>>>> +	read_unlock(&bond->lock);
>>>>> 	rtnl_unlock();
>>>>>
>>>>> 	return count;
>>>>> @@ -1190,7 +1193,8 @@ static ssize_t bonding_store_active_slave(struct
>>>>> device *d,
>>>>> 	struct bonding *bond = to_bond(d);
>>>>>
>>>>> 	rtnl_lock();
>>>>> -	write_lock_bh(&bond->lock);
>>>>> +	read_lock(&bond->lock);
>>>>> +	write_lock_bh(&bond->curr_slave_lock);
>>>>>
>>>>> 	if (!USES_PRIMARY(bond->params.mode)) {
>>>>> 		printk(KERN_INFO DRV_NAME
>>>>> @@ -1247,7 +1251,8 @@ static ssize_t bonding_store_active_slave(struct
>>>>> device *d,
>>>>> 		}
>>>>> 	}
>>>>> out:
>>>>> -	write_unlock_bh(&bond->lock);
>>>>> +	write_unlock_bh(&bond->curr_slave_lock);
>>>>> +	read_unlock(&bond->lock);
>>>>> 	rtnl_unlock();
>>>>>
>>>>> 	return count;
>>>>
>>>> Vanilla 2.6.24-rc5 plus this patch:
>>>>
>>>> =========================================================
>>>> [ INFO: possible irq lock inversion dependency detected ]
>>>> 2.6.24-rc5 #1
>>>> ---------------------------------------------------------
>>>> events/0/9 just changed the state of lock:
>>>> (&mc->mca_lock){-+..}, at: [<c0411c7a>] mld_ifc_timer_expire+0x130/0x1fb
>>>> but this lock took another, soft-read-irq-unsafe lock in the past:
>>>> (&bond->lock){-.--}
>>>>
>>>> and interrupts could create inverse lock ordering between them.
>>>>
>>>>
>>>
>>> Grrr, I should have seen that -- sorry.  Try your luck with this instead:
>> <CUT>
>>
>> No luck.
>>
>> bonding: bond0: setting mode to active-backup (1).
>> bonding: bond0: Setting MII monitoring interval to 100.
>> ADDRCONF(NETDEV_UP): bond0: link is not ready
>> bonding: bond0: Adding slave eth0.
>> e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow
>> Control: RX/TX
>> bonding: bond0: making interface eth0 the new active one.
>> bonding: bond0: first active interface up!
>> bonding: bond0: enslaving eth0 as an active interface with an up link.
>> bonding: bond0: Adding slave eth1.
>> ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
>
> <SNIP>
>
>> bonding: bond0: enslaving eth1 as a backup interface with a down link.
>> bonding: bond0: Setting eth0 as primary slave.
>> bond0: no IPv6 routers present
>>
>
> Based on the console log, I'm guessing your initialization scripts use
> sysfs to set eth0 as the primary interface for bond0?  Can you confirm?

Yep, that's correct:

postup() {
         if [[ ${IFACE} == "bond0" ]] ; then
                 echo -n +eth0 > /sys/class/net/${IFACE}/bonding/slaves
                 echo -n +eth1 > /sys/class/net/${IFACE}/bonding/slaves
                 echo -n  eth0 > /sys/class/net/${IFACE}/bonding/primary
         fi
}

> If you did somehow use sysfs to set the primary device as eth0, I'm
> guessing you never see this issue without that line or without this
> patch.  Please confirm this as well.

Without this patch I get another error, of course. Unfortunately checking 
what happens without using sysfs to set the primary device as eth0, have 
to wait, as I am not able to test it before Monday.

Best regards,

 				Krzysztof Olędzki

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ