lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 24 Jan 2020 17:31:49 +0100 From: David Hildenbrand <david@...hat.com> To: Greg Kroah-Hartman <gregkh@...uxfoundation.org> Cc: linux-kernel@...r.kernel.org, "Rafael J. Wysocki" <rafael@...nel.org>, Suzuki K Poulose <suzuki.poulose@....com>, Saravana Kannan <saravanak@...gle.com>, Heikki Krogerus <heikki.krogerus@...ux.intel.com>, Dan Williams <dan.j.williams@...el.com>, Michal Hocko <mhocko@...nel.org> Subject: Re: [PATCH v1] driver core: check for dead devices before onlining/offlining On 24.01.20 14:48, David Hildenbrand wrote: > On 24.01.20 14:31, David Hildenbrand wrote: >> On 24.01.20 10:12, Greg Kroah-Hartman wrote: >>> On Fri, Jan 24, 2020 at 10:09:03AM +0100, David Hildenbrand wrote: >>>> On 24.01.20 10:00, Greg Kroah-Hartman wrote: >>>>> On Mon, Jan 20, 2020 at 11:49:09AM +0100, David Hildenbrand wrote: >>>>>> We can have rare cases where the removal of a device races with >>>>>> somebody trying to online it (esp. via sysfs). We can simply check >>>>>> if the device is already removed or getting removed under the dev->lock. >>>>>> >>>>>> E.g., right now, if memory block devices are removed (remove_memory()), >>>>>> we do a: >>>>>> >>>>>> remove_memory() -> lock_device_hotplug() -> mem_hotplug_begin() -> >>>>>> lock_device() -> dev->dead = true >>>>>> >>>>>> Somebody coming via sysfs (/sys/devices/system/memory/memoryX/online) >>>>>> triggers a: >>>>>> >>>>>> lock_device_hotplug_sysfs() -> device_online() -> lock_device() ... >>>>>> >>>>>> So if we made it just before the lock_device_hotplug_sysfs() but get >>>>>> delayed until remove_memory() released all locks, we will continue >>>>>> taking locks and trying to online the device - which is then a zombie >>>>>> device. >>>>>> >>>>>> Note that at least the memory onlining path seems to be protected by >>>>>> checking if all memory sections are still present (something we can then >>>>>> get rid of). We do have other sysfs attributes >>>>>> (e.g., /sys/devices/system/memory/memoryX/valid_zones) that don't do any >>>>>> such locking yet and might race with memory removal in a similar way. For >>>>>> these users, we can then do a >>>>>> >>>>>> device_lock(dev); >>>>>> if (!device_is_dead(dev)) { >>>>>> /* magic /* >>>>>> } >>>>>> device_unlock(dev); >>>>>> >>>>>> Introduce and use device_is_dead() right away. >>>>>> >>>>>> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org> >>>>>> Cc: "Rafael J. Wysocki" <rafael@...nel.org> >>>>>> Cc: Suzuki K Poulose <suzuki.poulose@....com> >>>>>> Cc: Saravana Kannan <saravanak@...gle.com> >>>>>> Cc: Heikki Krogerus <heikki.krogerus@...ux.intel.com> >>>>>> Cc: Dan Williams <dan.j.williams@...el.com> >>>>>> Cc: Michal Hocko <mhocko@...nel.org> >>>>>> Signed-off-by: David Hildenbrand <david@...hat.com> >>>>>> --- >>>>>> >>>>>> Am I missing any obvious mechanism in the device core that handles >>>>>> something like this already? (especially also for other sysfs attributes?) >>>>> >>>>> So is a sysfs attribute causing the device itself to go away? We have >>>> >>>> nope, removal is triggered via the driver, not via a sysfs attribute. >>> >>> But the idea is the same, it comes from the driver, not the driver core. >>> >>>> Regarding this patch: Is there anything prohibiting the possible >>>> scenario I document above (IOW, is this patch applicable, or is there >>>> another way to fence it properly (e.g., the "specific call" you mentioned))? >>> >>> I think it's the same thing, look at how scsi does it. >> >> I think you are talking about doing a "transport_remove_device(dev)" >> before doing the "device_del(dev)", combined with proper locking. >> >> Will look into that for the memory subsystem ... > > ... looking into transports, it most probably does not apply here, hmm ... > /me digging through random code I think you were referring to 1. sysfs_break_active_protection() 2. sysfs_unbreak_active_protection() Which can be used to implement a self-removal of devices via attributes (e.g., sdev_store_delete()) I don't see yet how that would solve the race between some driver removing an offline device and someone being stuck in an attribute, waiting for a lock - but maybe I am looking at the wrong function again :) -- Thanks, David / dhildenb
Powered by blists - more mailing lists