[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <480F3CBC.60305@rtr.ca>
Date: Wed, 23 Apr 2008 09:42:20 -0400
From: Mark Lord <lkml@....ca>
To: Jens Axboe <jens.axboe@...cle.com>
Cc: linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
npiggin@...e.de, torvalds@...ux-foundation.org
Subject: Re: [PATCH 1/11] Add generic helpers for arch IPI function calls
>>> Date: Thu, 15 Nov 2007 12:07:48 -0500
>>> From: Mark Lord <lkml@....ca>
>>> To: Greg KH <gregkh@...e.de>
>>> Cc: Yasunori Goto <y-goto@...fujitsu.com>,
>>> Andrew Morton <akpm@...ux-foundation.org>,
>>> Alexey Dobriyan <adobriyan@...ru>, linux-kernel@...r.kernel.org
>>> Subject: Re: EIP is at device_shutdown+0x32/0x60
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>> Content-Transfer-Encoding: 7bit
>>> Sender: linux-kernel-owner@...r.kernel.org
>>>
>>> ... < snip > ...
>>>
>>> Greg, I don't know if this is relevant or not,
>>> but x86 has bugs in the halt/reboot code for SMP.
>>>
>>> Specifically, in native_smp_send_stop() the code now uses
>>> spin_trylock() to "lock" the shared call buffers,
>>> but then ignores the result.
>>>
>>> This means that multiple CPUs can/will clobber each other
>>> in that code.
>>>
>>> The second bug, is that this code does not wait for the
>>> target CPUs to actually stop before it continues.
>>>
>>> This was the real cause of the failure-to-poweroff problems
>>> I was having with 2.6.23, which we fixed by using CPU hotplug
>>> to disable_nonboot_cpus() before the above code ever got run.
Jens Axboe wrote:
> On Tue, Apr 22 2008, Mark Lord wrote:
>> Jens,
>>
>> While you're in there, :)
>>
>> Could you perhaps fix this bug (above) if it still exists?
>
> I don't understand the bug - what are the shared call buffers you are
> talking of?
>
> With the changes, there's not even an spin_trylock() in there anymore.
> But I don't see the original bug either, so...
..
arch/x86/kernel/smp.c:
static void native_smp_send_stop(void)
{
int nolock;
unsigned long flags;
if (reboot_force)
return;
/* Don't deadlock on the call lock in panic */
nolock = !spin_trylock(&call_lock); <<<<<<<<<< buggy
local_irq_save(flags);
__smp_call_function(stop_this_cpu, NULL, 0, 0);
if (!nolock)
spin_unlock(&call_lock);
disable_local_APIC();
local_irq_restore(flags);
}
The spinlock is trying to protect access to the global variable
"call_data" (higher up in the same file), which is used
in __smp_call_function() and friends.
But since the spinlock is ignored in this case,
the global "call_data will get clobbered if it was already in-use.
The second bug, is that for the halt case at least,
nobody waits for the other CPU to actually halt
before continuing.. so we sometimes enter the shutdown
code while other CPUs are still active.
This causes some machines to hang at shutdown,
unless CPU_HOTPLUG is configured and takes them offline
before we get here.
Cheers
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists