linux-kernel - Re: [PATCH V2] x86/entry/64: De-Xen-ify our NMI code further

Message-ID: <20210125125236.14b295a5@gandalf.local.home>
Date:   Mon, 25 Jan 2021 12:52:36 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Lai Jiangshan <jiangshanlai@...il.com>
Cc:     linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Lai Jiangshan <laijs@...ux.alibaba.com>,
        Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH V2] x86/entry/64: De-Xen-ify our NMI code further

On Mon, 25 Jan 2021 12:38:59 -0500
Steven Rostedt <rostedt@...dmis.org> wrote:

> On triggering an NMI from user space, I see the switch to the thread stack
> is done, and "exc_nmi" is called.
> 
> The problem I see with this is that exc_nmi is called with the thread
> stack, if it were to take an exception, NMIs would be enabled allowing for
> a nested NMI to run. From what I can tell, I don't see anything stopping
> that NMI from executing over the currently running NMI. That is, this means
> that NMI handlers are now re-entrant.
> 
> Yes, the stack issue is not a problem here, but NMI handlers are not
> allowed to be re-entrant. For example, we have spin locks in NMI handlers
> that are considered fine if they are only used in NMI handlers. But because
> there's a possible way to make NMI handlers re-entrant then these spin
> locks can deadlock.
> 
> I'm guessing that we need to add some tricks to the user space path to
> set and clear the "NMI executing" variable, but the return may become a bit
> complex in clearing that without races.

I think this may work if we wrap the exc_nmi call with the following:

Overwrite the NMI hardware stack frame on the NMI stack so that it looks as
if an NMI came in at the user space return path of the NMI handler. Set the
stack pointer to the NMI stack just after that updated first frame. Then
jump to asm_exc_nmi.

Then the code would act like it came in from kernel mode, and execute the
NMI nesting code normally. When it finishes, and does the iretq, it will
return to the NMI handler for the user space return with the kernel thread
stack, and then the special code for returning to user space can be called.
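
To make the frame rewrite above concrete, here is a rough user-space model in
C. It is only a sketch, not kernel code: the struct mirrors the five words the
CPU pushes on a 64-bit interrupt (RIP, CS, RFLAGS, RSP, SS), while the names
fake_kernel_entry, frame_from_user, saved_user, return_path_ip and the MODEL_*
constants are hypothetical and exist only for this illustration. Clearing IF
in the faked frame is also an assumption about what the return path would want.

```c
#include <stdint.h>

/* Frame the CPU pushes on 64-bit interrupt entry, lowest address first. */
struct iret_frame {
	uint64_t ip;
	uint64_t cs;
	uint64_t flags;
	uint64_t sp;
	uint64_t ss;
};

/* Illustrative values for this model only (hypothetical names). */
#define MODEL_KERNEL_CS	0x10ULL
#define MODEL_KERNEL_SS	0x18ULL
#define MODEL_USER_CS	0x33ULL
#define MODEL_USER_SS	0x2bULL
#define MODEL_EFLAGS_IF	0x200ULL	/* interrupt-enable bit in RFLAGS */

/* The requested privilege level lives in the low two bits of CS; 3 is user. */
static int frame_from_user(const struct iret_frame *f)
{
	return (f->cs & 3) == 3;
}

/*
 * Save the original user frame, then overwrite the hardware frame so that it
 * looks as if the NMI interrupted kernel code at return_path_ip while running
 * on the thread stack at thread_sp.
 */
static void fake_kernel_entry(struct iret_frame *hw,
			      struct iret_frame *saved_user,
			      uint64_t return_path_ip, uint64_t thread_sp)
{
	*saved_user = *hw;		/* keep the real user state around */
	hw->ip = return_path_ip;	/* iretq "returns" to the user exit path */
	hw->cs = MODEL_KERNEL_CS;	/* so the entry code sees a kernel frame */
	hw->flags = saved_user->flags & ~MODEL_EFLAGS_IF; /* assumption: IRQs off */
	hw->sp = thread_sp;		/* resume on the kernel thread stack */
	hw->ss = MODEL_KERNEL_SS;
}
```

After fake_kernel_entry(), a check like frame_from_user() on the rewritten
frame reports kernel mode, which is what would let the existing nesting logic
run unchanged in this model.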

The exc_nmi C code will need to handle this case to update pt_regs to make
sure the registered NMI handlers still see the pt_regs from user space. But
I think something like this may be the easiest way to handle this without
dealing with more NMI stack nesting races.
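
One possible shape for that pt_regs handling, again as a user-space model
rather than real kernel code: pick which register snapshot the handlers get
to see. Everything here (model_regs, regs_for_handlers, saved_user_regs,
return_path_ip) is a hypothetical name for illustration, and the rule "a
non-NULL saved snapshot plus a matching ip means the frame was faked" is an
assumption about how the C side could recognize this case.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the tail of pt_regs (hypothetical, model only). */
struct model_regs {
	uint64_t ip;
	uint64_t cs;
	uint64_t flags;
	uint64_t sp;
	uint64_t ss;
};

/*
 * Return the registers the NMI handlers should see. If the entry frame is
 * the faked one (its ip is the user return path and a saved user snapshot
 * exists), hand out the saved user registers instead of the faked frame.
 */
static const struct model_regs *
regs_for_handlers(const struct model_regs *entry_regs,
		  const struct model_regs *saved_user_regs,
		  uint64_t return_path_ip)
{
	if (saved_user_regs != NULL && entry_regs->ip == return_path_ip)
		return saved_user_regs;
	return entry_regs;	/* ordinary kernel-mode NMI: use the frame as is */
}
```

The NULL check stands in for whatever marker would distinguish a faked frame
from a real NMI that happens to land on the same instruction.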

I could try to write something up to implement this.

-- Steve