lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20200120104909.13991-1-david@redhat.com>
Date:   Mon, 20 Jan 2020 11:49:09 +0100
From:   David Hildenbrand <david@...hat.com>
To:     linux-kernel@...r.kernel.org
Cc:     David Hildenbrand <david@...hat.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        Saravana Kannan <saravanak@...gle.com>,
        Heikki Krogerus <heikki.krogerus@...ux.intel.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Michal Hocko <mhocko@...nel.org>
Subject: [PATCH v1] driver core: check for dead devices before onlining/offlining

We can have rare cases where the removal of a device races with
somebody trying to online it (esp. via sysfs). We can simply check
if the device is already removed or getting removed under the dev->lock.

E.g., right now, if memory block devices are removed (remove_memory()),
we do a:

remove_memory() -> lock_device_hotplug() -> mem_hotplug_begin() ->
lock_device() -> dev->dead = true

Somebody coming via sysfs (/sys/devices/system/memory/memoryX/online)
triggers a:

lock_device_hotplug_sysfs() -> device_online() -> lock_device() ...

So if we made it just before the lock_device_hotplug_sysfs() but get
delayed until remove_memory() released all locks, we will continue
taking locks and trying to online the device - which is then a zombie
device.

Note that at least the memory onlining path seems to be protected by
checking if all memory sections are still present (something we can then
get rid of). We do have other sysfs attributes
(e.g., /sys/devices/system/memory/memoryX/valid_zones) that don't do any
such locking yet and might race with memory removal in a similar way. For
these users, we can then do a

device_lock(dev);
if (!device_is_dead(dev)) {
	/* magic /*
}
device_unlock(dev);

Introduce and use device_is_dead() right away.

Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Suzuki K Poulose <suzuki.poulose@....com>
Cc: Saravana Kannan <saravanak@...gle.com>
Cc: Heikki Krogerus <heikki.krogerus@...ux.intel.com>
Cc: Dan Williams <dan.j.williams@...el.com>
Cc: Michal Hocko <mhocko@...nel.org>
Signed-off-by: David Hildenbrand <david@...hat.com>
---

Am I missing any obvious mechanism in the device core that handles
something like this already? (especially also for other sysfs attributes?)

---
 drivers/base/core.c    | 13 +++++++++++--
 drivers/base/dd.c      |  6 +++---
 include/linux/device.h |  1 +
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 42a672456432..01cd06eeb513 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2952,7 +2952,9 @@ int device_offline(struct device *dev)
 		return ret;
 
 	device_lock(dev);
-	if (device_supports_offline(dev)) {
+	if (device_is_dead(dev))
+		ret = -ENXIO;
+	else if (device_supports_offline(dev)) {
 		if (dev->offline) {
 			ret = 1;
 		} else {
@@ -2983,7 +2985,9 @@ int device_online(struct device *dev)
 	int ret = 0;
 
 	device_lock(dev);
-	if (device_supports_offline(dev)) {
+	if (device_is_dead(dev))
+		ret = -ENXIO;
+	else if (device_supports_offline(dev)) {
 		if (dev->offline) {
 			ret = dev->bus->online(dev);
 			if (!ret) {
@@ -2999,6 +3003,11 @@ int device_online(struct device *dev)
 	return ret;
 }
 
+bool device_is_dead(struct device *dev)
+{
+	return dev->p->dead;
+}
+
 struct root_device {
 	struct device dev;
 	struct module *owner;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index b25bcab2a26b..b1d7361720af 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -848,7 +848,7 @@ static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
 	 * and deferred probe processing happens all at once with
 	 * multiple threads.
 	 */
-	if (dev->p->dead || dev->driver)
+	if (device_is_dead(dev) || dev->driver)
 		goto out_unlock;
 
 	if (dev->parent)
@@ -994,7 +994,7 @@ int device_driver_attach(struct device_driver *drv, struct device *dev)
 	 * If device has been removed or someone has already successfully
 	 * bound a driver before us just skip the driver probe call.
 	 */
-	if (!dev->p->dead && !dev->driver)
+	if (!device_is_dead(dev) && !dev->driver)
 		ret = driver_probe_device(drv, dev);
 
 	__device_driver_unlock(dev, dev->parent);
@@ -1016,7 +1016,7 @@ static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie)
 	 * If device has been removed or someone has already successfully
 	 * bound a driver before us just skip the driver probe call.
 	 */
-	if (!dev->p->dead && !dev->driver)
+	if (!device_is_dead(dev) && !dev->driver)
 		ret = driver_probe_device(drv, dev);
 
 	__device_driver_unlock(dev, dev->parent);
diff --git a/include/linux/device.h b/include/linux/device.h
index ce6db68c3f29..acea6a14daa1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -832,6 +832,7 @@ extern void unlock_device_hotplug(void);
 extern int lock_device_hotplug_sysfs(void);
 extern int device_offline(struct device *dev);
 extern int device_online(struct device *dev);
+extern bool device_is_dead(struct device *dev);
 extern void set_primary_fwnode(struct device *dev, struct fwnode_handle *fwnode);
 extern void set_secondary_fwnode(struct device *dev, struct fwnode_handle *fwnode);
 void device_set_of_node_from_dev(struct device *dev, const struct device *dev2);
-- 
2.24.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ