lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 24 Jan 2021 19:00:46 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     Lai Jiangshan <jiangshanlai@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Lai Jiangshan <laijs@...ux.alibaba.com>,
        Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH] x86/entry/64: De-Xen-ify our NMI code further

On Sun, Jan 24, 2021 at 5:13 PM Lai Jiangshan <jiangshanlai@...il.com> wrote:
>
> From: Lai Jiangshan <laijs@...ux.alibaba.com>
>
> The commit 929bacec21478("x86/entry/64: De-Xen-ify our NMI code") simplified
> the NMI code by changing paravirt code into native code and left a comment
> about "inspecting RIP instead".  But until now, "inspecting RIP instead"
> has not been made happened and this patch tries to complete it.
>
> Signed-off-by: Lai Jiangshan <laijs@...ux.alibaba.com>
> ---
>  arch/x86/entry/entry_64.S | 46 +++++++++++----------------------------
>  1 file changed, 13 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index cad08703c4ad..cb6b8a6c6652 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1268,32 +1268,12 @@ SYM_CODE_START(asm_exc_nmi)
>         je      nested_nmi
>
>         /*
> -        * Now test if the previous stack was an NMI stack.  This covers
> -        * the case where we interrupt an outer NMI after it clears
> -        * "NMI executing" but before IRET.  We need to be careful, though:
> -        * there is one case in which RSP could point to the NMI stack
> -        * despite there being no NMI active: naughty userspace controls
> -        * RSP at the very beginning of the SYSCALL targets.  We can
> -        * pull a fast one on naughty userspace, though: we program
> -        * SYSCALL to mask DF, so userspace cannot cause DF to be set
> -        * if it controls the kernel's RSP.  We set DF before we clear
> -        * "NMI executing".
> +        * Now test if we interrupt an outer NMI after it clears
> +        * "NMI executing" but before iret.

s/interrupt/interrupted

But let's make it a lot more clear:


Now test if we interrupted an outer NMI that just cleared "NMI
executing" and is about to IRET.  This is a single-instruction window.
This check does not handle the case in which we get a nested interrupt
(#MC, #VE, #VC, etc.) after clearing "NMI executing" but before the
outer NMI executes IRET.

> +       movq    $nmi_executing_cleared, %rdx
> +       cmpq    8(%rsp), %rdx
> +       jne     first_nmi

If we're okay with non-PIC code, then this is suboptimal -- you can
just compare directly.  But using PIC is polite, so that movq should
be a RIP-relative leaq.

>
>         /* This is a nested NMI. */
>
> @@ -1438,16 +1418,16 @@ nmi_restore:
>         addq    $6*8, %rsp
>
>         /*
> -        * Clear "NMI executing".  Set DF first so that we can easily
> -        * distinguish the remaining code between here and IRET from
> -        * the SYSCALL entry and exit paths.
> -        *
> -        * We arguably should just inspect RIP instead, but I (Andy) wrote
> -        * this code when I had the misapprehension that Xen PV supported
> -        * NMIs, and Xen PV would break that approach.
> +        * Clear "NMI executing".  It also leaves a window after it before
> +        * iret which should be also considered to be "NMI executing" albeit
> +        * with "NMI executing" variable being zero.  So we should also check
> +        * the RIP after it when checking "NMI executing".  See the code
> +        * before nested_nmi.  No code is allowed to be added to between
> +        * clearing "NMI executing" and iret unless we check a larger window
> +        * with a range of RIPs instead of currently a single-RIP window.

Let's simplify this comment:

Clear "NMI executing".  This leaves a window in which a nested NMI
could observe "NMI executing" cleared, and a nested NMI will detect
this by inspecting RIP.

>          */
> -       std
>         movq    $0, 5*8(%rsp)           /* clear "NMI executing" */
> +nmi_executing_cleared:
>

This should be local.  Let's call it .Lnmi_iret.  And add a comment:

.Lnmi_iret: /* must be immediately after clearing "NMI executing" */

>         /*
>          * iretq reads the "iret" frame and exits the NMI stack in a

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ