Message-ID: <1331558201.25686.629.camel@gandalf.stny.rr.com>
Date: Mon, 12 Mar 2012 09:16:41 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Jan Beulich <JBeulich@...e.com>
Cc: mingo@...e.hu, Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, hpa@...or.com
Subject: Re: recent x86-64 nested NMI adjustments
On Mon, 2012-03-12 at 12:10 +0000, Jan Beulich wrote:
> Hi Steven,
>
> the explanation of 45d5a1683c04be28abdf5c04c27b1417e0374486
> seems bogus to me: When arriving from user mode, %rsp won't point
> to the user stack anymore, as it gets switched away from during the
> processing of the exception (all the more so since the IDT entry
> specifies a separate stack anyway, which guarantees this even for
> kernel mode entries).
No, it is real, and I had a test program that exploited it. I'm not
worried about the current %rsp; I'm worried about the %rsp value saved
on the stack. Two things are used to check whether the incoming NMI is
nested or not:
1) if the on-stack "in-nmi" variable is set
2) if the saved %rsp is pointing to the NMI stack.
Note that #2 looks at the *saved* %rsp, which is the %rsp at the time
the NMI triggered. The second check is used to handle the case where a
nested NMI comes in after the previous NMI has cleared the on-stack
"in-nmi" variable, but before it executes the iret.
There are a few cases where the stack can change during the NMI, so
the variable is also used.
There's a really good article on LWN about this :-)
https://lwn.net/Articles/484932/
(subscription required, but you should have one)
That said, I added a printk to the boot-up code to show me where the NMI
stacks were located. Then I wrote a program that would pin itself to a
CPU and change its stack pointer to point into the NMI stack of that CPU
and then go into an infinite loop. I ran perf on this code and it became
"invisible" to perf. That is, every time the NMI came in while this code
was running, it incorrectly considered itself a nested NMI and returned,
never recording the presence of this program.
After adding this patch, perf shows the task spending 99.9% of the time
in this loop. Thus this is a real bug.
>
> Further, a38449ef596b345e13a8f9b7d5cd9fedb8fcf921 makes the
> (presumably superfluous) compare a 4-byte one, while the
> documentation doesn't really state that selectors get pushed
> zero-extended. Hence, if not reverting the first change altogether,
> I'd at a minimum recommend converting the compare to a 2-byte one.
I'll let H. Peter answer this one; he's the Intel representative here.
-- Steve