[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4DDC116F.8020602@gmail.com>
Date: Tue, 24 May 2011 22:13:35 +0200
From: Nicolas de Pesloüan
<nicolas.2p.debian@...il.com>
To: Andy Gospodarek <andy@...yhouse.net>
CC: Neil Horman <nhorman@...driver.com>, netdev@...r.kernel.org,
Jay Vosburgh <fubar@...ibm.com>,
"David S. Miller" <daevm@...emloft.net>
Subject: Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
Le 24/05/2011 22:00, Andy Gospodarek a écrit :
> On Tue, May 24, 2011 at 03:36:05PM -0400, Neil Horman wrote:
>> This soft lockup was recently reported:
>>
>> [root@...l-per715-01 ~]# echo +bond5> /sys/class/net/bonding_masters
>> [root@...l-per715-01 ~]# echo +eth1> /sys/class/net/bond5/bonding/slaves
>> bonding: bond5: doing slave updates when interface is down.
>> bonding bond5: master_dev is not up in bond_enslave
>> [root@...l-per715-01 ~]# echo -eth1> /sys/class/net/bond5/bonding/slaves
>> bonding: bond5: doing slave updates when interface is down.
>>
>> BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
>> CPU 12:
>> Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
>> be2d
>> Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
>> RIP: 0010:[<ffffffff80064bf0>] [<ffffffff80064bf0>]
>> .text.lock.spinlock+0x26/00
>> RSP: 0018:ffff810113167da8 EFLAGS: 00000286
>> RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
>> RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
>> RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
>> R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
>> R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
>> FS: 00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
>>
>> Call Trace:
>> [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
>> [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
>> [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
>> [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
>> [<ffffffff8006457b>] __down_write_nested+0x12/0x92
>> [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
>> [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
>> [<ffffffff80016b87>] vfs_write+0xce/0x174
>> [<ffffffff80017450>] sys_write+0x45/0x6e
>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>>
>> It occurs because we are able to change the slave configuarion of a bond while
>> the bond interface is down. The bonding driver initializes some data structures
>> only after its ndo_open routine is called. Among them is the initalization of
>> the alb tx and rx hash locks. So if we add or remove a slave without first
>> opening the bond master device, we run the risk of trying to lock/unlock a
>> spinlock that has garbage for data in it, which results in our above softlock.
>>
>> We could fix it by moving the spin lock initalization to the device creation
>> path, but it seems that since we're warning people about not doing this, we
>> should probably just disallow them from doing it, so fix it by adding an EINVAL
>> return if we're not up yet. Tested by the reporter and confirmed to fix the
>> problem.
>>
>> Signed-off-by: Neil Horman<nhorman@...driver.com>
>
> Signed-off-by: Andy Gospodarek<andy@...yhouse.net>
>
>> Reported-by: jtluka@...hat.com
>> CC: Jay Vosburgh<fubar@...ibm.com>
>> CC: Andy Gospodarek<andy@...yhouse.net>
>> CC: "David S. Miller"<daevm@...emloft.net>
>> ---
>> drivers/net/bonding/bond_sysfs.c | 1 +
>> 1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>> index 4059bfc..206c543 100644
>> --- a/drivers/net/bonding/bond_sysfs.c
>> +++ b/drivers/net/bonding/bond_sysfs.c
>> @@ -231,6 +231,7 @@ static ssize_t bonding_store_slaves(struct device *d,
>> if (!(bond->dev->flags& IFF_UP)) {
>> pr_warning("%s: doing slave updates when interface is down.\n",
>> bond->dev->name);
>> + return -EINVAL;
This will turn a warning into an error.
This warning existed for long, but never caused the bonding setup to fail. This patch cause some
regression for user space. For example, current ifenslave-2.6 package in Debian doesn't ensure bond
is UP before enslaving, because this was never required.
NAK.
>> }
>>
>> if (!rtnl_trylock())
>> --
>> 1.7.5.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists