[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrW1qP=vbHCSdgOLjjP+-i=io3o1w5bMdtH_UHSV3gvBXg@mail.gmail.com>
Date: Sun, 24 Jan 2021 19:00:46 -0800
From: Andy Lutomirski <luto@...nel.org>
To: Lai Jiangshan <jiangshanlai@...il.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Peter Zijlstra <peterz@...radead.org>,
Lai Jiangshan <laijs@...ux.alibaba.com>,
Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH] x86/entry/64: De-Xen-ify our NMI code further
On Sun, Jan 24, 2021 at 5:13 PM Lai Jiangshan <jiangshanlai@...il.com> wrote:
>
> From: Lai Jiangshan <laijs@...ux.alibaba.com>
>
> The commit 929bacec21478("x86/entry/64: De-Xen-ify our NMI code") simplified
> the NMI code by changing paravirt code into native code and left a comment
> about "inspecting RIP instead". But until now, "inspecting RIP instead"
> has not been made happened and this patch tries to complete it.
>
> Signed-off-by: Lai Jiangshan <laijs@...ux.alibaba.com>
> ---
> arch/x86/entry/entry_64.S | 46 +++++++++++----------------------------
> 1 file changed, 13 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index cad08703c4ad..cb6b8a6c6652 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1268,32 +1268,12 @@ SYM_CODE_START(asm_exc_nmi)
> je nested_nmi
>
> /*
> - * Now test if the previous stack was an NMI stack. This covers
> - * the case where we interrupt an outer NMI after it clears
> - * "NMI executing" but before IRET. We need to be careful, though:
> - * there is one case in which RSP could point to the NMI stack
> - * despite there being no NMI active: naughty userspace controls
> - * RSP at the very beginning of the SYSCALL targets. We can
> - * pull a fast one on naughty userspace, though: we program
> - * SYSCALL to mask DF, so userspace cannot cause DF to be set
> - * if it controls the kernel's RSP. We set DF before we clear
> - * "NMI executing".
> + * Now test if we interrupt an outer NMI after it clears
> + * "NMI executing" but before iret.
s/interrupt/interrupted
But let's make it a lot more clear:
Now test if we interrupted an outer NMI that just cleared "NMI
executing" and is about to IRET. This is a single-instruction window.
This check does not handle the case in which we get a nested interrupt
(#MC, #VE, #VC, etc.) after clearing "NMI executing" but before the
outer NMI executes IRET.
> + movq $nmi_executing_cleared, %rdx
> + cmpq 8(%rsp), %rdx
> + jne first_nmi
If we're okay with non-PIC code, then this is suboptimal -- you can
just compare directly. But using PIC is polite, so that movq should
be a RIP-relative leaq.
>
> /* This is a nested NMI. */
>
> @@ -1438,16 +1418,16 @@ nmi_restore:
> addq $6*8, %rsp
>
> /*
> - * Clear "NMI executing". Set DF first so that we can easily
> - * distinguish the remaining code between here and IRET from
> - * the SYSCALL entry and exit paths.
> - *
> - * We arguably should just inspect RIP instead, but I (Andy) wrote
> - * this code when I had the misapprehension that Xen PV supported
> - * NMIs, and Xen PV would break that approach.
> + * Clear "NMI executing". It also leaves a window after it before
> + * iret which should be also considered to be "NMI executing" albeit
> + * with "NMI executing" variable being zero. So we should also check
> + * the RIP after it when checking "NMI executing". See the code
> + * before nested_nmi. No code is allowed to be added to between
> + * clearing "NMI executing" and iret unless we check a larger window
> + * with a range of RIPs instead of currently a single-RIP window.
Let's simplify this comment:
Clear "NMI executing". This leaves a window in which a nested NMI
could observe "NMI executing" cleared, and a nested NMI will detect
this by inspecting RIP.
> */
> - std
> movq $0, 5*8(%rsp) /* clear "NMI executing" */
> +nmi_executing_cleared:
>
This should be local. Let's call it .Lnmi_iret. And add a comment:
.Lnmi_iret: /* must be immediately after clearing "NMI executing" */
> /*
> * iretq reads the "iret" frame and exits the NMI stack in a
Powered by blists - more mailing lists