Message-ID: <494C5619.7020209@rtr.ca>
Date:	Fri, 19 Dec 2008 21:19:05 -0500
From:	Mark Lord <lkml@....ca>
To:	Pavel Machek <pavel@...e.cz>
Cc:	Linux Kernel <linux-kernel@...r.kernel.org>,
	Greg KH <gregkh@...e.de>, akpm@...ux-foundation.org,
	rjw@...k.pl, tglx@...utronix.de, lenb@...nel.org,
	linux-pm@...ts.linux-foundation.org, davej@...hat.com
Subject: Re: SMP poweroff hangs:  it's baaaack!  But on x86_64 this time.

Pavel Machek wrote:
> On Wed 2008-12-17 10:48:02, Mark Lord wrote:
>>> Subject: Fix SMP poweroff hangs
>>> From: Mark Lord <lkml@....ca>
>>>
>>> We need to disable all CPUs other than the boot CPU (usually 0) before
>>> attempting to power-off modern SMP machines.  This fixes the
>>> hang-on-poweroff issue on my MythTV SMP box, and also on Thomas Gleixner's
>>> new toybox.
>>>
>>> Signed-off-by: Mark Lord <mlord@...ox.com>
>>> Acked-by: Thomas Gleixner <tglx@...utronix.de>
>>> Cc: "Rafael J. Wysocki" <rjw@...k.pl>
>>> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
>>> ---
>>>
>>>  kernel/sys.c |    2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff -puN kernel/sys.c~fix-smp-poweroff-hangs kernel/sys.c
>>> --- a/kernel/sys.c~fix-smp-poweroff-hangs
>>> +++ a/kernel/sys.c
>>> @@ -32,6 +32,7 @@
>>>  #include <linux/getcpu.h>
>>>  #include <linux/task_io_accounting_ops.h>
>>>  #include <linux/seccomp.h>
>>> +#include <linux/cpu.h>
>>>  
>>>  #include <linux/compat.h>
>>>  #include <linux/syscalls.h>
>>> @@ -878,6 +879,7 @@ void kernel_power_off(void)
>>>  	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
>>>  	if (pm_power_off_prepare)
>>>  		pm_power_off_prepare();
>>> +	disable_nonboot_cpus();
>>>  	sysdev_shutdown();
>>>  	printk(KERN_EMERG "Power down.\n");
>>>  	machine_power_off();
>> ..
>>
>> This bug has returned here now, but on x86_64 this time around.
>> Same machine as before, just upgraded to a 64-bit kernel/user (2.6.27.9)
>> from the original 32-bit kernel/user that was originally fixed (above).
>>
>> One hang at poweroff over the past 10 days.  Not much, but enough
>> to destroy confidence in "unattended" operation.
>>
>> I lack opportunity to dig further into the code for now,
>> but just wanted to flag the problem, in case similar reports
>> from others might already be out there.
>>
>> In the meanwhile, I'm experimenting with this simple patch,
>> garnered from the 32-bit investigations last time around.
>> We should know in a few weeks whether it has any effect or not.
>>
>> --- old/kernel/sys.c	2008-10-18 13:57:22.000000000 -0400
>> +++ linux-2.6.27.9/kernel/sys.c	2008-12-17 09:42:17.000000000 -0500
>> @@ -303,6 +303,8 @@
>>
>> static void kernel_shutdown_prepare(enum system_states state)
>> {
>> +	set_cpus_allowed(current, cpumask_of_cpu(first_cpu(cpu_online_map)));
> 
> Is this line necessary?
..

One or the other is probably redundant, but it's not 100% clear *which*.  :)
The line above was suggested by others back when the original problem
was being worked on for 32-bit.
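
To make concrete what that affinity line asks for, here is a minimal
userspace sketch of the mask arithmetic, assuming a toy bitmask with one
bit per CPU. The toy_* names are hypothetical stand-ins for first_cpu()
and cpumask_of_cpu(); this is not the kernel implementation.

```c
/* Toy model only: one bit per CPU in an unsigned long.
 * The toy_* names are hypothetical, not the kernel cpumask API. */
typedef unsigned long toy_cpumask_t;

#define TOY_NR_CPUS ((int)(sizeof(toy_cpumask_t) * 8))

/* Lowest-numbered CPU set in the mask, or TOY_NR_CPUS if the mask is empty. */
static int toy_first_cpu(toy_cpumask_t mask)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return TOY_NR_CPUS;
}

/* Mask containing exactly one CPU. */
static toy_cpumask_t toy_cpumask_of_cpu(int cpu)
{
	return 1UL << cpu;
}

/* Affinity that restricts a task to the first online CPU.
 * Returns an empty mask if no CPU is online in the model. */
static toy_cpumask_t toy_boot_cpu_affinity(toy_cpumask_t online)
{
	int cpu = toy_first_cpu(online);

	if (cpu == TOY_NR_CPUS)
		return 0;
	return toy_cpumask_of_cpu(cpu);
}
```

With CPUs 0-3 online (mask 0xF) the result is 0x1, i.e. CPU 0 only; if
only CPUs 2 and 3 were online (mask 0xC) it would be 0x4.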

>> +	disable_nonboot_cpus();
>> 	blocking_notifier_call_chain(&reboot_notifier_list,
>> 		(state == SYSTEM_HALT)?SYS_HALT:SYS_POWER_OFF, NULL);
>> 	system_state = state;
>> @@ -333,7 +335,6 @@
>> 	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
>> 	if (pm_power_off_prepare)
>> 		pm_power_off_prepare();
>> -	disable_nonboot_cpus();
>> 	sysdev_shutdown();
>> 	printk(KERN_EMERG "Power down.\n");
>> 	machine_power_off();
> 
> Do you have any idea why it helps? BIOS will see us shutting down on
> cpu0 anyway, so if this helps there's a linux bug somewhere...
..

I don't know if it helps, yet.  Needs a lot more soak first.
The theory here is that all BIOS interactions from the kernel,
not just the "halt" sequence, probably should happen only on
the "boot CPU" (or "cpu 0").

By setting the CPU restriction earlier in the shutdown sequence,
we should be on the correct CPU for all of the various device-driver
BIOS interactions that might happen before actually halting.
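
The difference in ordering can be sketched with a small toy model. It
only covers the steps visible in the two quoted diffs, and it simply
records how many CPUs are still online when each step runs. The toy_*
names and the trace structure are hypothetical; nothing here is kernel
code.

```c
/* Toy model only: each shutdown step records how many CPUs were online
 * when it ran. All toy_* names are hypothetical. */
#define TOY_MAX_STEPS 8

struct toy_trace {
	const char *step[TOY_MAX_STEPS];
	int online[TOY_MAX_STEPS];
	int n;
};

static void toy_log(struct toy_trace *t, const char *step, int online)
{
	if (t->n < TOY_MAX_STEPS) {
		t->step[t->n] = step;
		t->online[t->n] = online;
		t->n++;
	}
}

/* early == 0: ordering of the original 32-bit fix.
 * early != 0: ordering of the experimental patch. */
static void toy_power_off(struct toy_trace *t, int ncpus, int early)
{
	int online = ncpus;

	t->n = 0;
	if (early)
		online = 1;	/* disable_nonboot_cpus() in kernel_shutdown_prepare() */
	toy_log(t, "reboot notifiers", online);
	toy_log(t, "pm_power_off_prepare", online);
	if (!early)
		online = 1;	/* disable_nonboot_cpus() in kernel_power_off() */
	toy_log(t, "sysdev_shutdown", online);
	toy_log(t, "machine_power_off", online);
}

/* Number of recorded steps that ran with only one CPU online. */
static int toy_steps_on_boot_cpu_only(const struct toy_trace *t)
{
	int i, count = 0;

	for (i = 0; i < t->n; i++)
		if (t->online[i] == 1)
			count++;
	return count;
}
```

In this model, on a 4-CPU box, the original ordering runs the reboot
notifiers and pm_power_off_prepare with all four CPUs online, while the
experimental ordering runs every recorded step with only one CPU online.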

Could be a bad theory, but I'm open to other possible suggestions.

Back on 32-bit it did seem rather unlikely that "disable_nonboot_cpus()"
would help, but it in fact cured the problem completely there.
On two different brands of motherboards with two different BIOSs.

Cheers
