Date:	Thu, 29 Sep 2011 23:20:42 -0600
From:	Bjorn Helgaas <bhelgaas@...gle.com>
To:	"canquan.shen" <shencanquan@...wei.com>
Cc:	Chen Gong <gong.chen@...ux.intel.com>, len.brown@...el.com,
	"shemminger@...tta.com" <shemminger@...tta.com>,
	"yakui.zhao@...el.com" <yakui.zhao@...el.com>,
	"xiaowei.yang@...wei.com" <xiaowei.yang@...wei.com>,
	hanweidong <hanweidong@...wei.com>, linqiangmin@...wei.com,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
	khalid.aziz@...com
Subject: Re: [PATCH v5] acpi: Fix CPU hot removal problem

On Wed, Sep 28, 2011 at 9:41 PM, canquan.shen <shencanquan@...wei.com> wrote:
>
> On 2011/9/29 11:14, Chen Gong wrote:
>>
>> On 2011/9/24 14:10, canquan.shen wrote:
>>>
>>> We run Linux as a guest in a Xen environment. When we used the Xen tools
>>> (xm vcpu-set <n>) to hot-add and hot-remove vCPUs in the guest, we
>>> encountered a failure on vCPU removal. We found that the reason is that
>>> the CPU removal code path never actually removes the CPU.
>>>
>>> This patch adds a call to acpi_bus_hot_remove_device() in
>>> acpi_processor_hotplug_notify() to fix this issue. With this patch, it
>>> works fine for us.
>>>
>>> Signed-off-by: Canquan Shen <shencanquan@...wei.com>
>>> ---
>>>  drivers/acpi/processor_driver.c | 13 +------------
>>>  drivers/acpi/scan.c             |  4 ++--
>>>  include/acpi/acpi_bus.h         |  1 +
>>>  3 files changed, 4 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
>>> index a4e0f1b..8429688 100644
>>> --- a/drivers/acpi/processor_driver.c
>>> +++ b/drivers/acpi/processor_driver.c
>>> @@ -665,18 +665,7 @@ static void acpi_processor_hotplug_notify(acpi_handle handle,
>>>  	case ACPI_NOTIFY_EJECT_REQUEST:
>>>  		ACPI_DEBUG_PRINT((ACPI_DB_INFO,
>>>  				  "received ACPI_NOTIFY_EJECT_REQUEST\n"));
>>> -
>>> -		if (acpi_bus_get_device(handle, &device)) {
>>> -			printk(KERN_ERR PREFIX
>>> -				    "Device don't exist, dropping EJECT\n");
>>> -			break;
>>> -		}
>>> -		pr = acpi_driver_data(device);
>>> -		if (!pr) {
>>> -			printk(KERN_ERR PREFIX
>>> -				    "Driver data is NULL, dropping EJECT\n");
>>> -			return;
>>> -		}
>>> +		acpi_bus_hot_remove_device(handle);
>>
>> As the comment in __acpi_os_execute() (used by acpi_os_hotplug_execute()) says:
>> /*
>> * We can't run hotplug code in keventd_wq/kacpid_wq/kacpid_notify_wq
>> * because the hotplug code may call driver .remove() functions,
>> * which invoke flush_scheduled_work/acpi_os_wait_events_complete
>> * to flush these workqueues.
>> */
>> If so, why not use the following call:
>>
>> acpi_os_hotplug_execute(acpi_bus_hot_remove_device, acpi_device->handle);
>>
>>
>
> Using acpi_os_hotplug_execute() would work, but it is more complex and
> makes CPU removal take longer, because the work is added to a queue and
> only runs some time later.
> I think it is clearer to call acpi_bus_hot_remove_device() directly in
> acpi_processor_hotplug_notify().

Chen, you're right that the CPU hot-remove notifier runs in a
workqueue.  We are proposing that the notifier call
acpi_bus_hot_remove_device(), which calls the .remove() method, and
that can cause a deadlock if .remove() waits for the workqueue to be
flushed.

The usual way this deadlock happens is when the .remove() method uses
acpi_remove_notify_handler().  But the processor_driver .remove()
method doesn't do anything with notify handler registration; the
handler is registered/unregistered by the module init/exit functions.
So I don't think we need to use acpi_os_hotplug_execute() in this case;
I don't see a risk of deadlock here.
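The deadlock reasoning above can be illustrated with a small userspace model in C. It is a sketch under stated assumptions, not kernel code: `struct model_wq`, `model_flush()`, `simulate()` and the other names are hypothetical, each queue is single-threaded and drained by hand in FIFO order, and a flush issued from inside an item of the same queue is counted as a deadlock instead of actually hanging.

```c
#include <stdbool.h>

#define MAX_WORK 8

typedef void (*work_fn)(void);

/*
 * Hypothetical single-threaded workqueue model; NOT the kernel API.
 * Items run one at a time, in FIFO order, when model_run() is called.
 */
struct model_wq {
	work_fn items[MAX_WORK];
	int head, tail;
	bool busy;		/* true while one of this queue's items runs */
};

static struct model_wq notify_wq;	/* stands in for the notify queue */
static struct model_wq hotplug_wq;	/* stands in for a separate hotplug queue */
static int deadlocks;			/* flushes that could never complete */
static work_fn current_remove;		/* the .remove() the eject path calls */

static void model_queue_work(struct model_wq *wq, work_fn fn)
{
	if (wq->tail < MAX_WORK)
		wq->items[wq->tail++] = fn;
}

static void model_run(struct model_wq *wq)
{
	while (wq->head < wq->tail) {
		work_fn fn = wq->items[wq->head++];

		wq->busy = true;
		fn();
		wq->busy = false;
	}
}

/*
 * A flush waits for every queued item, including the one running now.
 * Called from inside an item of the same queue, it would wait for its
 * own caller forever, so the model counts a deadlock and returns.
 */
static void model_flush(struct model_wq *wq)
{
	if (wq->busy) {
		deadlocks++;
		return;
	}
	model_run(wq);
}

/* A .remove() that flushes the notify queue (the problematic kind). */
static void remove_flushing(void)
{
	model_flush(&notify_wq);
}

/* A .remove() that never touches the notify queue. */
static void remove_plain(void)
{
}

/* Eject handler calling the removal directly, as the patch does. */
static void eject_direct(void)
{
	current_remove();
}

/* Eject handler deferring the removal to a second queue. */
static void eject_deferred(void)
{
	model_queue_work(&hotplug_wq, current_remove);
}

/* Deliver one eject notification; return the number of deadlocks seen. */
static int simulate(work_fn eject, work_fn remove)
{
	notify_wq = (struct model_wq){ .head = 0 };
	hotplug_wq = (struct model_wq){ .head = 0 };
	deadlocks = 0;
	current_remove = remove;

	model_queue_work(&notify_wq, eject);
	model_run(&notify_wq);
	model_run(&hotplug_wq);
	return deadlocks;
}
```

In this model, `simulate(eject_direct, remove_flushing)` reports one deadlock, while a direct call into a `.remove()` that never flushes the notify queue (the processor_driver case as described above) and a removal deferred to a second queue (roughly what the acpi_os_hotplug_execute() suggestion is for) both report none.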

However, if new CPU devices appear in the namespace after module-init,
I don't think processor_driver will handle them correctly.  This looks
like just another artifact of our screwed-up ACPI hotplug handling.
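The registration problem described above can be sketched as a small userspace model. This is a hypothetical illustration, not processor_driver or ACPICA code: `struct model_cpu_obj`, `model_module_init()` and the other names are made up, and the model assumes, as stated above, that notify handlers are installed only for the CPU objects that exist when module init runs.

```c
#include <stdbool.h>

#define MAX_CPUS 4

/*
 * Hypothetical model of CPU objects in the namespace; NOT the ACPI API.
 * Each object is either present or not, and has a handler or not.
 */
struct model_cpu_obj {
	bool present;
	bool has_handler;
};

static struct model_cpu_obj ns[MAX_CPUS];

/* Module init: install a handler on every CPU object present right now. */
static void model_module_init(void)
{
	for (int i = 0; i < MAX_CPUS; i++)
		if (ns[i].present)
			ns[i].has_handler = true;
}

/* Firmware adds a CPU object after init; nothing installs a handler. */
static void model_hot_add_object(int i)
{
	ns[i].present = true;
}

/* Deliver a notification; true only if a handler will see it. */
static bool model_notify(int i)
{
	return ns[i].present && ns[i].has_handler;
}

/* Boot with CPUs 0 and 1 present, load the module, then hot-add CPU 2. */
static void model_boot_then_hot_add(void)
{
	ns[0].present = true;
	ns[1].present = true;
	model_module_init();
	model_hot_add_object(2);
}
```

After `model_boot_then_hot_add()`, notifications for CPUs 0 and 1 reach a handler, but `model_notify(2)` returns false: the object that appeared after init never had a handler installed.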

Bjorn