linux-kernel - Re: [RFC 0/9] mce recovery for Sandy Bridge server

Message-ID: <BANLkTi=Mz7SoXz5zk-p-+FYBKC6aSWtgtg@mail.gmail.com>
Date:	Tue, 24 May 2011 14:48:30 -0700
From:	Tony Luck <tony.luck@...il.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Borislav Petkov <bp@...64.org>, Ingo Molnar <mingo@...e.hu>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Huang, Ying" <ying.huang@...el.com>,
	Andi Kleen <andi@...stfloor.org>,
	Borislav Petkov <bp@...en8.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mauro Carvalho Chehab <mchehab@...hat.com>
Subject: Re: [RFC 0/9] mce recovery for Sandy Bridge server

On Tue, May 24, 2011 at 2:30 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Tue, May 24, 2011 at 2:24 PM, Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
>>
>> Right, so you can't do things like that from NMI context, but what perf
>> can do is raise a self-IPI and continue from IRQ context (question for
>> the HW folks, can there be cycles between the NMI iret and IRQ assert
>> from whatever context was before the NMI hit?)
>
> Of course there can be - the code where the NMI hit may have
> interrupts disabled.

But the case where I'd want to do the "stop this task" thing is when I
think that I can recover. For memory errors detected while in kernel
code, I expect this will only ever be a few special cases:
1) copy to/from user
2) copy page (for copy-on-write fault)
3) ...
and in these cases we don't have interrupts disabled.  In fact I have
difficulty imagining a scenario where the kernel trips over a memory
error in interrupt-disabled code that would ever be recoverable.

So my NMI handler can look at the saved pt_regs to see whether
it blasted its way into some interrupt-disabled code and call that
fatal. If it came in while interrupts were enabled, then it could use
Peter's self-IPI thingy.
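
A minimal sketch of that check, written as a userspace mock-up rather than
kernel code. The struct, enum, and function names are all hypothetical
stand-ins (the real handler would look at the kernel's struct pt_regs); the
only hard fact used is that the x86 interrupt-enable flag (IF) is bit 9 of
the saved flags register.

```c
#include <assert.h>
#include <stddef.h>

/* IF (interrupt enable) is bit 9 of the x86 flags register. */
#define SKETCH_EFLAGS_IF 0x200UL

/*
 * Hypothetical stand-in for the kernel's struct pt_regs.
 * Only the saved flags word matters for this decision.
 */
struct sketch_pt_regs {
	unsigned long flags;
};

/* Hypothetical outcome of the decision made in NMI context. */
enum sketch_mce_action {
	SKETCH_MCE_FATAL,        /* hit interrupt-disabled code: give up */
	SKETCH_MCE_DEFER_TO_IRQ, /* interrupts were on: raise a self-IPI */
};

/*
 * Decide what to do from the register state saved when the
 * machine check arrived. No recovery work happens here; the
 * caller either panics or defers the work to IRQ context.
 */
enum sketch_mce_action sketch_classify_mce(const struct sketch_pt_regs *regs)
{
	assert(regs != NULL);

	if (!(regs->flags & SKETCH_EFLAGS_IF))
		return SKETCH_MCE_FATAL;

	return SKETCH_MCE_DEFER_TO_IRQ;
}
```

With saved flags of 0x202 (IF set) this returns SKETCH_MCE_DEFER_TO_IRQ; with
0x002 (IF clear) it returns SKETCH_MCE_FATAL. How the deferred work actually
gets queued from NMI context is left out on purpose.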

-Tony