lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0801071854140.6696@bizon.gios.gov.pl>
Date:	Mon, 7 Jan 2008 18:57:25 +0100 (CET)
From:	Krzysztof Oledzki <olel@....pl>
To:	Andy Gospodarek <andy@...yhouse.net>
cc:	Jay Vosburgh <fubar@...ibm.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Andrew Morton <akpm@...ux-foundation.org>,
	bugme-daemon@...zilla.kernel.org, shemminger@...ux-foundation.org,
	davem@...emloft.net, netdev@...r.kernel.org
Subject: Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at
 net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c
 (1055)



On Wed, 19 Dec 2007, Andy Gospodarek wrote:

> On Tue, Dec 18, 2007 at 08:53:39PM +0100, Krzysztof Oledzki wrote:
>>
>>
>> On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>>
>>> On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:
>>>>
>>>>
>>>> On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>>>>
>>>>> On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, 12 Dec 2007, Jay Vosburgh wrote:
>>>>>>
>>>>>>> Herbert Xu <herbert@...dor.apana.org.au> wrote:
>>>>>>>
>>>>>>>>> diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>>>>>>>>> drivers/net/bonding/bond_sysfs.c
>>>>>>>>> --- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>>>>>>>>> +++ a/drivers/net/bonding/bond_sysfs.c
>>>>>>>>> @@ -1111,8 +1111,6 @@ static ssize_t bonding_store_primary(str
>>>>>>>>> out:
>>>>>>>>>     write_unlock_bh(&bond->lock);
>>>>>>>>>
>>>>>>>>> -       rtnl_unlock();
>>>>>>>>> -
>>>>>>>>
>>>>>>>> Looking at the changeset that added this perhaps the intention
>>>>>>>> is to hold the lock? If so we should add an rtnl_lock to the start
>>>>>>>> of the function.
>>>>>>>
>>>>>>> 	Yes, this function needs to hold locks, and more than just
>>>>>>> what's there now.  I believe the following should be correct; I haven't
>>>>>>> tested it, though (I'm supposedly on vacation right now).
>>>>>>>
>>>>>>> 	The following change should be correct for the
>>>>>>> bonding_store_primary case discussed in this thread, and also corrects
>>>>>>> the bonding_store_active case which performs similar functions.
>>>>>>>
>>>>>>> 	The bond_change_active_slave and bond_select_active_slave
>>>>>>> functions both require rtnl, bond->lock for read and curr_slave_lock
>>>>>>> for
>>>>>>> write_bh, and no other locks.  This is so that the lower level
>>>>>>> mode-specific functions can release locks down to just rtnl in order to
>>>>>>> call, e.g., dev_set_mac_address with the locks it expects (rtnl only).
>>>>>>>
>>>>>>> Signed-off-by: Jay Vosburgh <fubar@...ibm.com>
>>>>>>>
>>>>>>> diff --git a/drivers/net/bonding/bond_sysfs.c
>>>>>>> b/drivers/net/bonding/bond_sysfs.c
>>>>>>> index 11b76b3..28a2d80 100644
>>>>>>> --- a/drivers/net/bonding/bond_sysfs.c
>>>>>>> +++ b/drivers/net/bonding/bond_sysfs.c
>>>>>>> @@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct
>>>>>>> device
>>>>>>> *d,
>>>>>>> 	struct slave *slave;
>>>>>>> 	struct bonding *bond = to_bond(d);
>>>>>>>
>>>>>>> -	write_lock_bh(&bond->lock);
>>>>>>> +	rtnl_lock();
>>>>>>> +	read_lock(&bond->lock);
>>>>>>> +	write_lock_bh(&bond->curr_slave_lock);
>>>>>>> +
>>>>>>> 	if (!USES_PRIMARY(bond->params.mode)) {
>>>>>>> 		printk(KERN_INFO DRV_NAME
>>>>>>> 		       ": %s: Unable to set primary slave; %s is in mode
>>>>>>> 		       %d\n",
>>>>>>> @@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct
>>>>>>> device
>>>>>>> *d,
>>>>>>> 		}
>>>>>>> 	}
>>>>>>> out:
>>>>>>> -	write_unlock_bh(&bond->lock);
>>>>>>> -
>>>>>>> +	write_unlock_bh(&bond->curr_slave_lock);
>>>>>>> +	read_unlock(&bond->lock);
>>>>>>> 	rtnl_unlock();
>>>>>>>
>>>>>>> 	return count;
>>>>>>> @@ -1190,7 +1193,8 @@ static ssize_t bonding_store_active_slave(struct
>>>>>>> device *d,
>>>>>>> 	struct bonding *bond = to_bond(d);
>>>>>>>
>>>>>>> 	rtnl_lock();
>>>>>>> -	write_lock_bh(&bond->lock);
>>>>>>> +	read_lock(&bond->lock);
>>>>>>> +	write_lock_bh(&bond->curr_slave_lock);
>>>>>>>
>>>>>>> 	if (!USES_PRIMARY(bond->params.mode)) {
>>>>>>> 		printk(KERN_INFO DRV_NAME
>>>>>>> @@ -1247,7 +1251,8 @@ static ssize_t bonding_store_active_slave(struct
>>>>>>> device *d,
>>>>>>> 		}
>>>>>>> 	}
>>>>>>> out:
>>>>>>> -	write_unlock_bh(&bond->lock);
>>>>>>> +	write_unlock_bh(&bond->curr_slave_lock);
>>>>>>> +	read_unlock(&bond->lock);
>>>>>>> 	rtnl_unlock();
>>>>>>>
>>>>>>> 	return count;
>>>>>>
>>>>>> Vanilla 2.6.24-rc5 plus this patch:
>>>>>>
>>>>>> =========================================================
>>>>>> [ INFO: possible irq lock inversion dependency detected ]
>>>>>> 2.6.24-rc5 #1
>>>>>> ---------------------------------------------------------
>>>>>> events/0/9 just changed the state of lock:
>>>>>> (&mc->mca_lock){-+..}, at: [<c0411c7a>] mld_ifc_timer_expire+0x130/0x1fb
>>>>>> but this lock took another, soft-read-irq-unsafe lock in the past:
>>>>>> (&bond->lock){-.--}
>>>>>>
>>>>>> and interrupts could create inverse lock ordering between them.
>>>>>>
>>>>>>
>>>>>
>>>>> Grrr, I should have seen that -- sorry.  Try your luck with this instead:
>>>> <CUT>
>>>>
>>>> No luck.
>>>>
>>>
>>>
>>> I'm guessing if we go back to using a write-lock for bond->lock this
>>> will go back to working again, but I'm not totally convinced since there
>>> are plenty of places where we used a read-lock with it.
>>
>> Should I check this patch or rather, based on a future discussion, wait
>> for another version?
>>
>>>
>>> diff --git a/drivers/net/bonding/bond_sysfs.c
>>> b/drivers/net/bonding/bond_sysfs.c
>>> index 11b76b3..635b857 100644
>>> --- a/drivers/net/bonding/bond_sysfs.c
>>> +++ b/drivers/net/bonding/bond_sysfs.c
>>> @@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device
>>> *d,
>>> 	struct slave *slave;
>>> 	struct bonding *bond = to_bond(d);
>>>
>>> +	rtnl_lock();
>>> 	write_lock_bh(&bond->lock);
>>> +	write_lock_bh(&bond->curr_slave_lock);
>>> +
>>> 	if (!USES_PRIMARY(bond->params.mode)) {
>>> 		printk(KERN_INFO DRV_NAME
>>> 		       ": %s: Unable to set primary slave; %s is in mode
>>> 		       %d\n",
>>> @@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device
>>> *d,
>>> 		}
>>> 	}
>>> out:
>>> +	write_unlock_bh(&bond->curr_slave_lock);
>>> 	write_unlock_bh(&bond->lock);
>>> -
>>> 	rtnl_unlock();
>>>
>>> 	return count;
>>> @@ -1191,6 +1194,7 @@ static ssize_t bonding_store_active_slave(struct
>>> device *d,
>>>
>>> 	rtnl_lock();
>>> 	write_lock_bh(&bond->lock);
>>> +	write_lock_bh(&bond->curr_slave_lock);
>>>
>>> 	if (!USES_PRIMARY(bond->params.mode)) {
>>> 		printk(KERN_INFO DRV_NAME
>>> @@ -1247,6 +1251,7 @@ static ssize_t bonding_store_active_slave(struct
>>> device *d,
>>> 		}
>>> 	}
>>> out:
>>> +	write_unlock_bh(&bond->curr_slave_lock);
>>> 	write_unlock_bh(&bond->lock);
>>> 	rtnl_unlock();
>>>
>>
>>
>> Best regards,
>>
>> 					Krzysztof Olędzki
>
> For now, I prefer Jay's original patch -- with the read_locks (rather
> than read/write_lock_bh) and the added rtnl_lock.  There is still a
> lockdep issue that we need to sort-out, but this patch is needed first.

This bug has not been fixed yet as it still exists in 2.6.24-rc7. Any 
chances to cure it before 2.6.24-final?

Best regards,

 				Krzysztof Olędzki

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ