Message-Id: <4DE9F573-55AC-4BEB-AF66-F721B145C081@goldelico.com>
Date: Mon, 14 Apr 2014 15:10:04 +0200
From: "Dr. H. Nikolaus Schaller" <hns@...delico.com>
To: David Fries <david@...es.net>
Cc: Belisko Marek <marek.belisko@...il.com>,
Evgeniy Polyakov <zbr@...emap.net>,
LKML <linux-kernel@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: w1: 3.14-rc7 - possible recursive locking detected
Hi David,
I have tested it, and it appears to suppress the locking issue on our
GTA04 board.
BR,
Nikolaus
On 14.04.2014 at 00:24, David Fries wrote:
> Belisko Marek,
> Here is a possible solution, could you give it a try and report back?
>
> Greg Kroah-Hartman,
> Evgeniy asked me to look into this report. I don't have the
> reporter's hardware configuration, but I wouldn't think that would be
> needed, just some w1 bus master (even W1_MASTER_GPIO might work), then
> loading the slave device and manually adding a slave device with that
> family id. Even then I didn't reproduce the reported recursive
> locking error. I saw unrelated locking reports, but not this one. I
> wrote up the included patch, but that undoes the notify changes that
> you added earlier in commit 47eba33a0997fc7, and I wanted to ask about
> that commit. Specifically these two lines,
>
> err = device_register(&sl->dev);
> ...
> + dev_set_uevent_suppress(&sl->dev, false);
> + kobject_uevent(&sl->dev.kobj, KOBJ_ADD);
>
> Wouldn't the default be to not suppress? Nothing in the W1 system
> enables suppressing, so is that even needed? (I'm fine with saying
> it's a good idea.)
> device_register at some point must call device_add, and device_add
> calls kobject_uevent(&dev->kobj, KOBJ_ADD);
> so doesn't that KOBJ_ADD send the add event a second time? As in, it
> shouldn't be needed?
> Can the suppress be called before device_register to avoid the
> automatic notify, then after it returns setup the slave device as this
> patch does to avoid this problem report, and then call the KOBJ_ADD to
> make everything happy?
>
> This patch is based on commit ce7613db2d8d4d5af2587a.
>
> On Tue, Apr 08, 2014 at 09:47:50PM +0200, Belisko Marek wrote:
>> Hi,
>>
>> On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov <zbr@...emap.net> wrote:
>>> Hi everyone
>>>
>>> 24.03.2014, 01:50, "David Fries" <David@...es.net>:
>>>> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote:
>>>>> booting latest 3.14-rc7 on gta04 board gives following warning:
>>>>>
>>>>> [ 3.101409] Driver for 1-wire Dallas network protocol.
>>>>> [ 3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
>>>>> [ 3.400146] CPU0
>>>>> [ 3.402709] ----
>>>>> [ 3.405273] lock(&(&priv->bus_notifier)->rwsem);
>>>>> [ 3.410308] lock(&(&priv->bus_notifier)->rwsem);
>>>>> [ 3.415374]
>>>>> [ 3.415374] *** DEADLOCK ***
>>>>>
>>>>> AFAIK we didn't see that before (at least not on 3.14-rc2).
>>>>
>>>> The last drivers/w1 change went in before v3.14-rc1, so if it is
>>>> something in the w1 code, it has either been there or something else
>>>> in the rest of the kernel changed to expose it. I'm using the ds2490
>>>> USB w1 dongle, I didn't see that problem when enabling lock debugging
>>>> on 3.14.0-rc5+ which includes some changes I'm working on.
>>>>
>>>> https://github.com/dfries/linux.git w1_rework
>>>
>>> Can it be a bug in omap w1 driver?
>> We tested with the current Linus master and it's even worse (it hangs
>> completely). The trace looks a little different, and it only happens
>> when booting via device tree (not with a board file).
>>
>> [ 2.807434] Driver for 1-wire Dallas network protocol.
>> [ 2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
>> Interrupt mode
>> [ 3.034271] call_modprobe: w1-family-0x1
>> [ 3.051818]
>> [ 3.053405] =============================================
>> [ 3.059051] [ INFO: possible recursive locking detected ]
>> [ 3.064727] 3.14.0-gta04 #556 Not tainted
>> [ 3.068939] ---------------------------------------------
>> [ 3.074615] w1_bus_master1/795 is trying to acquire lock:
>> [ 3.080261] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
>> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
>> [ 3.090911]
>> [ 3.090911] but task is already holding lock:
>> [ 3.097045] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
>> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
>> [ 3.107666]
>> [ 3.107666] other info that might help us debug this:
>> [ 3.114501] Possible unsafe locking scenario:
>> [ 3.114501]
>> [ 3.120727] CPU0
>> [ 3.123291] ----
>> [ 3.125854] lock(&(&priv->bus_notifier)->rwsem);
>> [ 3.130889] lock(&(&priv->bus_notifier)->rwsem);
>> [ 3.135925]
>> [ 3.135925] *** DEADLOCK ***
>> [ 3.135925]
>> [ 3.142150] May be due to missing lock nesting notation
>> [ 3.142150]
>> [ 3.149291] 2 locks held by w1_bus_master1/795:
>> [ 3.154052] #0: (&dev->mutex#3){+.+.+.}, at: [<c03ba1c8>]
>> w1_attach_slave_device+0xc4/0x1c8
>> [ 3.163055] #1: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
>> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
>> [ 3.174163]
>> [ 3.174163] stack backtrace:
>> [ 3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted
>> 3.14.0-gta04 #556
>> [ 3.186370] [<c0013ca0>] (unwind_backtrace) from [<c0010f68>]
>> (show_stack+0x10/0x14)
>> [ 3.194488] [<c0010f68>] (show_stack) from [<c04f0900>]
>> (dump_stack+0x6c/0x88)
>> [ 3.202117] [<c04f0900>] (dump_stack) from [<c00746bc>]
>> (print_deadlock_bug+0xc0/0xf0)
>> [ 3.210449] [<c00746bc>] (print_deadlock_bug) from [<c0075f80>]
>> (validate_chain.isra.29+0x4dc/0x534)
>> [ 3.220031] [<c0075f80>] (validate_chain.isra.29) from [<c0076dac>]
>> (__lock_acquire+0x728/0x834)
>> [ 3.229248] [<c0076dac>] (__lock_acquire) from [<c00774b8>]
>> (lock_acquire+0xf4/0x118)
>> [ 3.237487] [<c00774b8>] (lock_acquire) from [<c04f8e04>]
>> (down_read+0x24/0x34)
>> [ 3.245178] [<c04f8e04>] (down_read) from [<c0058ab0>]
>> (__blocking_notifier_call_chain+0x28/0x58)
>> [ 3.254516] [<c0058ab0>] (__blocking_notifier_call_chain) from
>> [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18)
>> [ 3.265563] [<c0058af4>] (blocking_notifier_call_chain) from
>> [<c0301db8>] (device_add+0x1f4/0x368)
>> [ 3.274993] [<c0301db8>] (device_add) from [<c03055cc>]
>> (platform_device_add+0x138/0x1c8)
>> [ 3.283569] [<c03055cc>] (platform_device_add) from [<c03bd124>]
>> (w1_bq27000_add_slave+0x44/0x74)
>> [ 3.292907] [<c03bd124>] (w1_bq27000_add_slave) from [<c03b9ed8>]
>> (w1_bus_notify.part.0+0x48/0xc8)
>> [ 3.302337] [<c03b9ed8>] (w1_bus_notify.part.0) from [<c04fbe1c>]
>> (notifier_call_chain+0x38/0x68)
>> [ 3.311645] [<c04fbe1c>] (notifier_call_chain) from [<c0058acc>]
>> (__blocking_notifier_call_chain+0x44/0x58)
>> [ 3.321899] [<c0058acc>] (__blocking_notifier_call_chain) from
>> [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18)
>> [ 3.332946] [<c0058af4>] (blocking_notifier_call_chain) from
>> [<c0301db8>] (device_add+0x1f4/0x368)
>> [ 3.342346] [<c0301db8>] (device_add) from [<c03b9ad0>]
>> (__w1_attach_slave_device+0x9c/0x134)
>> [ 3.351318] [<c03b9ad0>] (__w1_attach_slave_device) from
>> [<c03ba238>] (w1_attach_slave_device+0x134/0x1c8)
>> [ 3.361480] [<c03ba238>] (w1_attach_slave_device) from [<c03ba348>]
>> (w1_slave_found+0x7c/0x98)
>> [ 3.370513] [<c03ba348>] (w1_slave_found) from [<c03bc818>]
>> (omap_w1_search_bus+0x54/0x5c)
>> [ 3.379211] [<c03bc818>] (omap_w1_search_bus) from [<c03bc758>]
>> (w1_search_devices+0x3c/0x48)
>> [ 3.388153] [<c03bc758>] (w1_search_devices) from [<c03baaec>]
>> (w1_search_process_cb+0x64/0x108)
>> [ 3.397399] [<c03baaec>] (w1_search_process_cb) from [<c03bac60>]
>> (w1_process+0x68/0x14c)
>> [ 3.405975] [<c03bac60>] (w1_process) from [<c0054bdc>] (kthread+0xdc/0xf0)
>> [ 3.413299] [<c0054bdc>] (kthread) from [<c000dca8>]
>> (ret_from_fork+0x14/0x2c)
>> [ 24.451049] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=2102 jiffies, g=4294967086, c=4294967085, q=19)
>> [ 24.463134] INFO: Stall ended before state dump start
>> [ 87.501037] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=8407 jiffies, g=4294967086, c=4294967085, q=19)
>> [ 87.513122] INFO: Stall ended before state dump start
>> [ 150.551055] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=14712 jiffies, g=4294967086, c=4294967085, q=19)
>> [ 150.563232] INFO: Stall ended before state dump start
>> [ 213.601043] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=21017 jiffies, g=4294967086, c=4294967085, q=19)
>> [ 213.613220] INFO: Stall ended before state dump start
>> [ 276.651062] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=27322 jiffies, g=4294967086, c=4294967085, q=19)
>> [ 276.663238] INFO: Stall ended before state dump start
>>
>> BR,
>>
>> marek
>>
>> --
>> as simple and primitive as possible
>> -------------------------------------------------
>> Marek Belisko - OPEN-NANDRA
>> Freelance Developer
>>
>> Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
>> Tel: +421 915 052 184
>> skype: marekwhite
>> twitter: #opennandra
>> web: http://open-nandra.com
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>
> --
> David Fries <david@...es.net> PGP pub CB1EE8F0
> http://fries.net/~david/
>
>
> commit b3afc24a3ca6c261a8fe2c08b5bbccb51233c09a
> Author: David Fries <David@...es.net>
> Date: Sun Apr 13 15:36:45 2014 -0500
>
> w1: avoid recursive device_add
>
> __w1_attach_slave_device calls device_add which calls w1_bus_notify
> which calls for the w1_bq27000 slave driver, which calls
> platform_device_add and device_add and deadlocks on getting
> &(&priv->bus_notifier)->rwsem as it is still held in the previous
> device_add. This avoids the problem by processing the family
> add/remove outside of the slave device_add call. This does mean that
> the slave device isn't setup when the notification that a slave is
> available goes out.
>
> Reported-by: Belisko Marek <marek.belisko@...il.com>
>
> diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
> index b96f61b..dde09c1 100644
> --- a/drivers/w1/w1.c
> +++ b/drivers/w1/w1.c
> @@ -614,27 +614,11 @@ end:
> return err;
> }
>
> -/*
> - * Handle sysfs file creation and removal here, before userspace is told that
> - * the device is added / removed from the system
> - */
> -static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
> - void *data)
> +static int w1_family_notify(unsigned long action, struct w1_slave *sl)
> {
> - struct device *dev = data;
> - struct w1_slave *sl;
> struct w1_family_ops *fops;
> int err;
>
> - /*
> - * Only care about slave devices at the moment. Yes, we should use a
> - * separate "type" for this, but for now, look at the release function
> - * to know which type it is...
> - */
> - if (dev->release != w1_slave_release)
> - return 0;
> -
> - sl = dev_to_w1_slave(dev);
> fops = sl->family->fops;
>
> if (!fops)
> @@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
> return 0;
> }
>
> -static struct notifier_block w1_bus_nb = {
> - .notifier_call = w1_bus_notify,
> -};
> -
> static int __w1_attach_slave_device(struct w1_slave *sl)
> {
> int err;
> @@ -705,6 +685,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
> dev_name(&sl->dev), err);
> return err;
> }
> + w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl);
>
>
> dev_set_uevent_suppress(&sl->dev, false);
> @@ -799,6 +780,7 @@ int w1_unref_slave(struct w1_slave *sl)
> msg.type = W1_SLAVE_REMOVE;
> w1_netlink_send(sl->master, &msg);
>
> + w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
> device_unregister(&sl->dev);
> #ifdef DEBUG
> memset(sl, 0, sizeof(*sl));
> @@ -1186,10 +1168,6 @@ static int __init w1_init(void)
> goto err_out_exit_init;
> }
>
> - retval = bus_register_notifier(&w1_bus_type, &w1_bus_nb);
> - if (retval)
> - goto err_out_bus_unregister;
> -
> retval = driver_register(&w1_master_driver);
> if (retval) {
> printk(KERN_ERR