lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3166726.elbgrUIZ0L@vostro.rjw.lan>
Date:	Thu, 02 May 2013 14:26:39 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Toshi Kani <toshi.kani@...com>
Cc:	ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	isimatu.yasuaki@...fujitsu.com,
	vasilis.liaskovitis@...fitbricks.com, Len Brown <lenb@...nel.org>
Subject: [PATCH 0/4] Driver core / ACPI: Add offline/online for graceful hot-removal of devices

Hi,

The following introduction is still valid for patches [1-3/4] and patch [4/4]
reworks the ACPI processor driver to use the new code.  Some details below.

On Monday, April 29, 2013 02:23:59 PM Rafael J. Wysocki wrote:
> 
> It has been argued for a number of times that in some cases, if a device cannot
> be gracefully removed from the system, it shouldn't be removed from it at all,
> because that may lead to a kernel crash.  In particular, that will happen if a
> memory module holding kernel memory is removed, but also removing the last CPU
> in the system may not be a good idea.  [And I can imagine a few other cases
> like that.]
> 
> The kernel currently only supports "forced" hot-remove which cannot be stopped
> once started, so users have no choice but to try to hot-remove stuff and see
> whether or not that crashes the kernel which is kind of unpleasant.  That seems
> to be based on the "the user knows better" argument according to which users
> triggering device hot-removal should really know what they are doing, so the
> kernel doesn't have to worry about that.  However, for instance, this pretty
> much isn't the case for memory modules, because the users have no way to see
> whether or not any kernel memory has been allocated from a given module.
> 
> There have been a few attempts to address this issue, but none of them has
> gained broader acceptance.  The following 3 patches are the heart of a new
> proposal which is based on the idea to introduce device_offline() and
> device_online() operations along the lines of the existing CPU offline/online
> mechanism (or, rather, to extend the CPU offline/online so that analogous
> operations are available for other devices).  The way it is supposed to work is
> that device_offline() will fail if the given device cannot be gracefully
> removed from the system (in the kernel's view).  Once it succeeds, though, the
> device won't be used any more until either it is removed, or device_online() is
> run for it.  That will allow the ACPI device hot-remove code, for one example,
> to avoid triggering a non-reversible removal procedure for devices that cannot
> be removed gracefully.
> 
> Patch [1/3] introduces device_offline() and device_online() as outlined above.
> The .offline() and .online() callbacks are only added at the bus type level for
> now, because that should be sufficient to cover the memory and CPU use cases.

That's [1/4] now and the changes from the previous version are:
- strtobool() is used in store_online().
- device_offline_lock has been renamed to device_hotplug_lock (and the
  functions operating it accordingly) following the Toshi's advice.

> Patch [2/3] modifies the CPU hotplug support code to use device_offline() and
> device_online() to support the sysfs 'online' attribute for CPUs.

That is [2/4] now and it takes cpu_hotplug_driver_lock() around cpu_up() and
cpu_down().

> Patch [3/3] changes the ACPI device hot-remove code to use device_offline()
> for checking if graceful removal of devices is possible.  The way it does that
> is to walk the list of "physical" companion devices for each struct acpi_device
> involved in the operation and call device_offline() for each of them.  If any
> of the device_offline() calls fails (and the hot-removal is not "forced", which
> is an option), the removal procedure (which is not reversible) is simply not
> carried out.

That's current [3/4].  It's a bit simpler, because I decided that it would be
better to have a global 'force_remove' attribute (the semantics of the
per-profile 'force_remove' wasn't clear and it didn't really add any value over
a global one).  I also added lock/unlock_device_hotplug() around acpi_bus_scan()
in acpi_scan_bus_device_check() to allow scan handlers to update dev->offline
for "physical" companion devices safely (the processor's one added by the next
patch actually does that).

> Of some concern is that device_offline() (and possibly device_online()) is
> called under physical_node_lock of the corresponding struct acpi_device, which
> introduces ordering dependency between that lock and device locks for the
> "physical" devices, but I didn't see any cleaner way to do that (I guess it
> is avoidable at the expense of added complexity, but for now it's just better
> to make the code as clean as possible IMO).

Patch [4/4] reworks the ACPI processor driver to use the common hotplug code.
It basically splits the driver into two parts as described in the changelog,
where the first part is essentially a scan handler and the second part is
a driver, but it doesn't bind to struct acpi_device any more.  Instead, it
binds to processor devices under /sys/devices/system/cpu/ (the driver itself
has a sysfs directory under /sys/bus/cpu/drivers/ which IMHO makes more sense
than having it under /sys/bus/acpi/drivers/).

The patch at https://patchwork.kernel.org/patch/2506371/ is a prerequisite
for this series, but I'm going to push it for v3.10-rc2 if no one screams
bloody murder.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ