[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140417151728.GK15326@htj.dyndns.org>
Date: Thu, 17 Apr 2014 11:17:28 -0400
From: Tejun Heo <tj@...nel.org>
To: Li Zhong <zhong@...ux.vnet.ibm.com>
Cc: LKML <linux-kernel@...r.kernel.org>, gregkh@...uxfoundation.org,
rafael.j.wysocki@...el.com, toshi.kani@...com
Subject: Re: [RFC PATCH v4] Use kernfs_break_active_protection() for device
online store callbacks
Hello,
On Thu, Apr 17, 2014 at 02:50:44PM +0800, Li Zhong wrote:
> This patch tries to solve the device hot remove locking issues in a
> different way from commit 5e33bc41, as kernfs already has a mechanism
> to break active protection.
>
> The problem here is the order of s_active, and series of hotplug related
> lock.
It prolly deservse more detailed explanation of the deadlock along
with how 5e33bc41 ("$SUBJ") tried to solve it. The active protetion
is there to keep the file alive by blocking deletion while operations
are on-going in the file. This blocking creates a dependency loop
when an operation running off a sysfs knob ends up grabbing a lock
which may be held while removing the said sysfs knob.
> + kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> + if (WARN_ON_ONCE(!kn))
> + return -ENODEV;
> +
> + /*
> + * Break active protection here to avoid deadlocks with device
> + * removing process, which tries to remove sysfs entries including this
> + * "online" attribute while holding some hotplug related locks.
> + *
> + * @dev needs to be protected here, or it could go away any time after
> + * dropping active protection. But it is still unreasonable/unsafe to
> + * online/offline a device after it being removed. Fortunately, there
I think this is something driver layer proper should provide
synchronization for. It shouldn't be difficult to synchronize this
function against device_del(), right? And, please note that @dev is
guaranteed to have not been removed (at least hasn't gone through attr
removal) upto this point.
> + * are some checks in online/offline knobs. Like cpu, it checks cpu
> + * present/online mask before doing the real work.
> + */
> +
> + get_device(dev);
> + kernfs_break_active_protection(kn);
> +
> + lock_device_hotplug();
> +
> + /*
> + * If we assume device_hotplug_lock must be acquired before removing
> + * device, we may try to find a way to check whether the device has
> + * been removed here, so we don't call device_{on|off}line against
> + * removed device.
> + */
Yeah, let's please fix this.
> ret = val ? device_online(dev) : device_offline(dev);
> unlock_device_hotplug();
> +
> + kernfs_unbreak_active_protection(kn);
> + put_device(dev);
> +
> + kernfs_put(kn);
> +
> return ret < 0 ? ret : count;
> }
> static DEVICE_ATTR_RW(online);
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index bece691..0d2f3a5 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -320,10 +320,17 @@ store_mem_state(struct device *dev,
> {
> struct memory_block *mem = to_memory_block(dev);
> int ret, online_type;
> + struct kernfs_node *kn;
>
> - ret = lock_device_hotplug_sysfs();
> - if (ret)
> - return ret;
> + kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> + if (WARN_ON_ONCE(!kn))
> + return -ENODEV;
> +
> + /* refer to comments in online_store() for more information */
> + get_device(dev);
> + kernfs_break_active_protection(kn);
> +
> + lock_device_hotplug();
>
> if (!strncmp(buf, "online_kernel", min_t(int, count, 13)))
> online_type = ONLINE_KERNEL;
> @@ -362,6 +369,11 @@ store_mem_state(struct device *dev,
> err:
> unlock_device_hotplug();
>
> + kernfs_unbreak_active_protection(kn);
> + put_device(dev);
> +
> + kernfs_put(kn);
There are other users of lock_device_hotplug_sysfs(). We probably
want to audit them and convert them too, preferably with helper
routines so that they don't end up duplicating the complexity?
Thanks.
--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists