Message-ID: <CALCETrXsxJ6eotRJG1tjPD8dkpAd6P2roDm+fj+Yqi36uh3WHA@mail.gmail.com>
Date: Mon, 17 Nov 2014 16:55:27 -0800
From: Andy Lutomirski <luto@...capital.net>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Borislav Petkov <bp@...en8.de>, Andi Kleen <andi@...stfloor.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
X86 ML <x86@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

On Mon, Nov 17, 2014 at 4:22 PM, Luck, Tony <tony.luck@...el.com> wrote:
>> It could also be interesting to tweak mce_panic to not actually panic
>> the machine but to try to return and stop the test instead. Then real
>> debugging could be possible :)
>
> The lost cpu is *really* lost. Warm reset doesn't fix the machine, I usually
> have to do a full power cycle.

How is it even possible that I did that with a few lines of asm?

Could this be a hardware bug? Is there some condition that causes #MC
delivery to wedge hard enough that even INIT/RESET stops working? Or
possibly some CPU got stuck in SMM -- I have no idea what warm reset
does these days.

My initial attempts to test machine_check in KVM using IPIs are having
some issues, probably because I'm not acking the interrupt. I can do
it once, but then it stops working.

Here's the patch to improve the timeout messages, but given the degree
of wedgedness, I can guess what it'll say:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/paranoid&id=e5cbd9d141bde651ecb20f0b65ad13bcef2468d0

--Andy
>
> -Tony
--
Andy Lutomirski
AMA Capital Management, LLC