Message-Id: <1177505556.19745.16.camel@sebastian.intellilink.co.jp>
Date: Wed, 25 Apr 2007 21:52:36 +0900
From: Fernando Luis Vázquez Cao
<fernando@....ntt.co.jp>
To: Andi Kleen <ak@...e.de>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>, horms@...ge.net.au,
kexec@...ts.infradead.org, linux-kernel@...r.kernel.org,
vgoyal@...ibm.com, mbligh@...gle.com,
Keith Owens <kaos@....com.au>, akpm@...ux-foundation.org
Subject: Re: [PATCH 10/10] Use safe_apic_wait_icr_idle in
__send_IPI_dest_field - x86_64
On Wed, 2007-04-25 at 14:33 +0200, Andi Kleen wrote:
> On Wednesday 25 April 2007 13:51:12 Fernando Luis Vázquez Cao wrote:
> > Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
> > NMI_VECTOR to avoid potential hangups in the event of crash when kdump
> > tries to stop the other CPUs.
>
> But what happens then when this fails? Won't this give another hang?
> Have you tested this?
In kdump the crashing CPU (i.e. the CPU that called crash_kexec) is the
one in charge of rebooting into and executing the dump capture kernel.
But before doing this it attempts to stop the other CPUs by sending an IPI
with NMI_VECTOR as the vector. The problem is that delivery sometimes
seems to fail, and the crashing CPU gets stuck waiting for the ICR status
bit to be cleared, which will never happen.
With this patch, when safe_apic_wait_icr_idle times out the CPU will
continue executing and try to hand over control to the dump capture
kernel as usual. After applying this patch I have not seen hangs in the
reboot path to the second kernel showing the symptoms mentioned before, but
perhaps I am just being lucky and there is something else going on.
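
To make the difference concrete, here is a rough user-space sketch of the
bounded wait described above. It is not the patch itself: struct fake_icr,
fake_icr_read, bounded_wait_icr_idle, crash_path_sketch and the poll budget
ICR_MAX_POLLS are all hypothetical names and values chosen for illustration,
and the real code would read the APIC register and delay between polls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ICR_BUSY      (1u << 12) /* delivery-status bit in the ICR low word */
#define ICR_MAX_POLLS 1000       /* assumed poll budget, illustration only */

/*
 * Hypothetical stand-in for the ICR. It reports "busy" for
 * polls_until_idle reads and then goes idle; a negative value
 * models a delivery that never completes.
 */
struct fake_icr {
	uint32_t value;
	int polls_until_idle;
};

static uint32_t fake_icr_read(struct fake_icr *icr)
{
	if (icr->polls_until_idle == 0)
		icr->value &= ~ICR_BUSY;
	else if (icr->polls_until_idle > 0)
		icr->polls_until_idle--;
	return icr->value;
}

/*
 * Bounded wait: poll until the busy bit clears or the budget runs out.
 * Returns 0 if the ICR went idle, ICR_BUSY if the wait timed out.
 * An unbounded loop on the same condition would never return when the
 * bit stays set.
 */
static uint32_t bounded_wait_icr_idle(struct fake_icr *icr)
{
	uint32_t busy;
	int polls = 0;

	assert(icr != NULL);
	do {
		busy = fake_icr_read(icr) & ICR_BUSY;
		if (!busy)
			break;
		/* real code would delay here before polling again */
	} while (++polls < ICR_MAX_POLLS);

	return busy;
}

/*
 * Crash path in miniature: record whether the wait timed out, then
 * carry on toward the dump capture kernel in either case.
 * Always returns 1 to signal that the hand-over step was reached.
 */
static int crash_path_sketch(struct fake_icr *icr, int *timed_out)
{
	*timed_out = bounded_wait_icr_idle(icr) != 0;
	/* the NMI IPI would be sent here, then control handed over */
	return 1;
}
```

With a fake ICR that never goes idle (polls_until_idle = -1),
bounded_wait_icr_idle gives up after ICR_MAX_POLLS reads and
crash_path_sketch still reaches the hand-over step, which is the behaviour
I am after in the NMI_VECTOR case.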
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/