[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <AANLkTildediWa_vkY76mLWoAjgdaEgIbWP0DsKu3Ujap@mail.gmail.com>
Date: Sun, 18 Jul 2010 11:17:32 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Avi Kivity <avi@...hat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Steven Rostedt <rostedt@...tedt.homelinux.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
Christoph Hellwig <hch@....de>, Li Zefan <lizf@...fujitsu.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Johannes Berg <johannes.berg@...el.com>,
Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
Tom Zanussi <tzanussi@...il.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Andi Kleen <andi@...stfloor.org>,
"H. Peter Anvin" <hpa@...or.com>,
Jeremy Fitzhardinge <jeremy@...p.org>,
"Frank Ch. Eigler" <fche@...hat.com>, Tejun Heo <htejun@...il.com>
Subject: Re: [patch 1/2] x86_64 page fault NMI-safe
On Sun, Jul 18, 2010 at 10:36 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> Lookie here, the %rsp comparison really isn't that hard:
A few notes on that (still) untested code suggestion:
> nmi:
> pushq %rax
> pushq %rdx
> movq %rsp,%rdx # current stack top
> movq 40(%rsp),%rax # old stack top
> xor %rax,%rdx # same 8kB aligned area?
> shrq $13,%rdx # ignore low 13 bits
> je it_is_a_nested_nmi # looks nested..
> non_nested:
> ...
> ... ok, we're not nested, do normal NMI handling ...
> ...
The non_nested case still needs to start off with moving it's stack
frame to a safe area that won't be overwritten by any nesting NMI's
(note that they cannot nest at this point, since we've done nothing
that can fault). So we'd still need that
7* pushq 48(%rsp)
which copies the five words that got pushed by hardware, and the two
register-save locations that we used for the nesting check and special
return.
After we've done those 7 pushes, we can then run code that may take a
fault. Because when the fault returns with an "iret" and re-enables
NMI's, our nesting code is ready.
So all told, we need a maximum of about 216 bytes of stack for the
nested NMI case: 56 bytes for the seven copied words, and the 160
bytes that we build up _under_ the stack pointer for the nested case.
And we need the NMI stack itself to be aligned in order for that
"ignore low bits" check to work. Although we don't actually have to do
that "xor+shr", we could do the test equally well with a "sub+unsigned
compare against stack size".
Other than that, I think the extra test that we're really nested might
better be done differently:
> it_is_a_nested_nmi:
> cmpw $0,48(%rsp) # double-check that it really was a nested exception
> jne non_nested # from user space or something..
> # this is the nested case
It migth be safer to check the saved CS rather than the saved SS on
the stack to see that we really are in kernel mode. It's possible that
somebody could load a NULL SS in user mode and then just not use the
stack - and try to make it look like they are in kernel mode for when
the NMI happens. Now, I _think_ that loading a zero SS is supposed to
trap, but checking CS is still likely to be the better test for "were
we in kernel mode". That's where the CPL is really encoded, after all.
So that "cmpw $0,48(%rsp)" is probably ok, but it would likely be
better to do it as
testl $3,24(%rsp)
jne non_nested
instead. That's what entry_64.S does everywhere else.
Oh, and the non-nested case obviously needs all the regular "make the
kernel state look right" code. Like the swapgs stuff etc if required.
My example code was really meant to just document the nesting
handling, not the existing stuff we already need to do with
save_paranoid etc.
And I really think it should work, but I'd again like to stress that
it's just a RFD code sequence with no testing what-so-ever etc.
Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists