linux-kernel - Re: recent x86-64 nested NMI adjustments


Message-ID: <1331558201.25686.629.camel@gandalf.stny.rr.com>
Date:	Mon, 12 Mar 2012 09:16:41 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Jan Beulich <JBeulich@...e.com>
Cc:	mingo@...e.hu, Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, hpa@...or.com
Subject: Re: recent x86-64 nested NMI adjustments

On Mon, 2012-03-12 at 12:10 +0000, Jan Beulich wrote:
> Hi Steven,
> 
> the explanation of 45d5a1683c04be28abdf5c04c27b1417e0374486
> seems bogus to me: When arriving from user mode, %rsp won't point
> to the user stack anymore, as it gets switched away from during the
> processing of the exception (the more that the IDT entry specifies a
> separate stack anyway, which even guarantees this for kernel mode
> entries).

No, it is real, and I had a test program that exploited it. I'm not
worried about the current %rsp; I'm worried about the %rsp that is saved
on the stack. Two things are used to check whether the incoming NMI is
nested:

1) if the on-stack "in-nmi" variable is set

2) if the saved %rsp is pointing to the NMI stack.

Note, #2 looks at the *saved* %rsp, which is the %rsp at the time the
NMI triggered. The second check handles the case where a nested NMI
comes in after the previous NMI has cleared the on-stack "in-nmi"
variable but before it executes the iret.

There are a few cases where the stack can change in the NMI, so the
variable is also used.

There's a really good article on LWN about this :-)

https://lwn.net/Articles/484932/ 

 (subscription required, but you should have one)

That said, I added a printk to the boot-up code to show me where the NMI
stacks were located. Then I wrote a program that would pin itself to a
CPU and change its stack pointer to point into the NMI stack of that CPU
and then go into an infinite loop. I ran perf on this code and it became
"invisible" to perf. That is, every time the NMI came in while this code
was running, it incorrectly considered itself a nested NMI and returned,
never recording the presence of this program.

After adding this patch, perf shows the task spending 99.9% of the time
in this loop. Thus this is a real bug.
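
The false positive and the effect of the patch can be modelled in a few
lines of C. This is only a sketch: the real check lives in the x86-64
NMI entry assembly, and the struct names, function names, selector
values and stack addresses below are made up for illustration. It
assumes the patch works by looking at the saved %cs first, so that an
NMI arriving from user space is never treated as nested.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative selector values; assumptions, not taken from this mail. */
#define KERNEL_CS 0x10u
#define USER_CS   0x33u

/* Hypothetical model of the part of the hardware-saved frame we care about. */
struct nmi_frame {
    uint16_t saved_cs;   /* code segment at the time the NMI triggered */
    uint64_t saved_rsp;  /* stack pointer at the time the NMI triggered */
};

/* Hypothetical model of one CPU's NMI stack. */
struct nmi_stack {
    uint64_t low;    /* lowest address of the NMI stack */
    uint64_t high;   /* one past the highest address */
    bool     in_nmi; /* models the on-stack "in-nmi" variable */
};

bool rsp_on_nmi_stack(const struct nmi_stack *s, uint64_t rsp)
{
    return rsp >= s->low && rsp < s->high;
}

/* The two checks as described above, with no look at the saved %cs. */
bool nested_before(const struct nmi_stack *s, const struct nmi_frame *f)
{
    if (s->in_nmi)                            /* check 1 */
        return true;
    return rsp_on_nmi_stack(s, f->saved_rsp); /* check 2 */
}

/* Same checks, but only trusted when the NMI interrupted kernel code. */
bool nested_after(const struct nmi_stack *s, const struct nmi_frame *f)
{
    if (f->saved_cs != KERNEL_CS)  /* came from user space: not nested */
        return false;
    return nested_before(s, f);
}
```

With an NMI stack at [0x1000, 0x2000) and a user-space frame whose saved
%rsp is 0x1800, nested_before() reports a nested NMI (the "invisible"
task), while nested_after() does not.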

> 
> Further, a38449ef596b345e13a8f9b7d5cd9fedb8fcf921 makes the
> (presumably superfluous) compare a 4-byte one, while the
> documentation isn't really stating that selectors get pushed zero-
> extended. Hence, if not reverting the first change altogether, I'd
> minimally recommend converting the compare to a 2-byte one.

I'll let H. Peter answer this one, he's the Intel representative here.
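
To make the width question concrete, here is a small C sketch of the
two compares. It does not claim anything about what the hardware
actually pushes; it only shows what would happen if the bits above the
16-bit selector in the saved slot were ever non-zero. The selector value
and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define KERNEL_CS 0x10u /* illustrative value; an assumption */

/* 4-byte compare: also looks at bits 16..31 of the saved slot. */
bool cs_match_32(uint64_t slot)
{
    return (uint32_t)slot == KERNEL_CS;
}

/* 2-byte compare: looks only at the selector itself. */
bool cs_match_16(uint64_t slot)
{
    return (uint16_t)slot == KERNEL_CS;
}
```

For a zero-extended slot such as 0x10 both compares agree. For a slot
such as 0xabcd0010 only the 2-byte compare still matches.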

-- Steve
