Message-ID: <3435ccbf-8ca4-19ca-ca34-dbb1a551b103@citrix.com>
Date:   Sat, 12 Nov 2022 00:08:11 +0000
From:   Andrew Cooper <Andrew.Cooper3@...rix.com>
To:     "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>
CC:     "Li, Xin3" <xin3.li@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "Christopherson,, Sean" <seanjc@...gle.com>,
        Kevin Tian <kevin.tian@...el.com>,
        Andrew Cooper <Andrew.Cooper3@...rix.com>
Subject: Re: [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq()
 for NMI/IRQ reinjection

On 11/11/2022 22:22, H. Peter Anvin wrote:
> On November 11, 2022 8:35:30 AM PST, Andrew Cooper <Andrew.Cooper3@...rix.com> wrote:
>> On 11/11/2022 14:23, Peter Zijlstra wrote:
>>> On Fri, Nov 11, 2022 at 01:48:26PM +0100, Paolo Bonzini wrote:
>>>> On 11/11/22 13:19, Peter Zijlstra wrote:
>>>>> On Fri, Nov 11, 2022 at 01:04:27PM +0100, Paolo Bonzini wrote:
>>>>>> On Intel you can optionally make it hold onto IRQs, but NMIs are always
>>>>>> eaten by the VMEXIT and have to be reinjected manually.
>>>>> That 'optionally' thing worries me -- as in, KVM is currently
>>>>> opting-out?
>>>> Yes, because "If the “process posted interrupts” VM-execution control is 1,
>>>> the “acknowledge interrupt on exit” VM-exit control is 1" (SDM 26.2.1.1,
>>>> checks on VM-Execution Control Fields).  Ipse dixit.  Posted interrupts are
>>>> available and used on all processors since I think Ivy Bridge.
>> On server SKUs.  Client only got "virtual interrupt processing" fairly
>> recently IIRC, which is the CPU-side property which matters.
>>
>>> (imagine the non-coc compliant reaction here)
>>>
>>> So instead of fixing it, they made it worse :-(
>>>
>>> And now FRED is arguably making it worse again, and people wonder why I
>>> hate virt...
>> The only FRED-compatible fix is to send a self-NMI, because you
>> may need a CSL change too.
>>
>> VT-x *does* hold the NMI latch (for VMEXIT_REASON NMI), so it's self-NMI
>> and then enable_nmi()s.
>>
>> Except the IRET to self won't work - it will need to be ERETS-to-self. 
>> Which I think is fine.
>>
>> But what isn't fine is the fact that a self-NMI doesn't deliver
>> synchronously, so you need to wait until it is pending, before enabling
>> NMIs.  (Well, actually you need to ensure that it's definitely delivered
>> before re-entering the VM).
>>
>> And I'm totally out of ideas here...
>>
>> ~Andrew
>>
> There is no fundamental reason to do a CSL/IST change if you happen to know a priori that the stack is in a valid state to have the NMI frame on it; that is:
>
> 1. Not deep into a nested I/O layer;
> 2. Valid, and not in flux in any way.

3. The NMI handler doesn't depend on being run on the alternate stack.

> Since this reinject will always be in a well-defined location, that's fine.
>
> So I think *that* concern is not actually an issue.
>
> Again, note that this is not a FRED-specific problem.

Hmm yeah.  On further consideration, I don't think FRED is relevant here
(outside of a few minor details).

The VMExit behaviour is simply that of the NMI handler but without an
exception frame on the stack.  The early asm is walking on eggshells
with respect to the NMI latch, just like the regular NMI handler is.


Peter is correct that once you leave the VMExit handler's noinstr
region, a plethora of things can re-enable NMIs behind your back.  And
this happening in practice will end up with you logically taking NMIs
out of order.
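
To make the ordering problem concrete, here is a toy userspace model of
the NMI latch.  It is only a sketch and not kernel code: struct cpu_model,
raise_nmi(), unblock_nmis(), late_reinject() and early_handoff() are all
made-up names, and the model assumes a single pending slot and that any
unblock (e.g. a stray IRET) delivers a pending NMI immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NMIS 8

/* Hypothetical model of one CPU's NMI state; not a real kernel structure. */
struct cpu_model {
    bool   nmi_blocked;          /* the NMI latch: further NMIs are held off */
    bool   nmi_pending;          /* one NMI can be held pending while blocked */
    int    pending_id;           /* which NMI is waiting */
    int    handled[MAX_NMIS];    /* ids in the order the handler saw them */
    size_t n_handled;
};

static void unblock_nmis(struct cpu_model *c);

/* Stand-in for the NMI handler body: just record the arrival order. */
static void nmi_handler(struct cpu_model *c, int id)
{
    assert(c->n_handled < MAX_NMIS);
    c->handled[c->n_handled++] = id;
}

/* An NMI arrives: deliver it now, or hold it pending if the latch is set. */
static void raise_nmi(struct cpu_model *c, int id)
{
    if (c->nmi_blocked) {
        c->nmi_pending = true;
        c->pending_id = id;
        return;
    }
    c->nmi_blocked = true;       /* delivery sets the latch */
    nmi_handler(c, id);
    unblock_nmis(c);             /* the handler's return clears it again */
}

/* Anything that clears the latch also lets a pending NMI in. */
static void unblock_nmis(struct cpu_model *c)
{
    c->nmi_blocked = false;
    if (c->nmi_pending) {
        c->nmi_pending = false;
        raise_nmi(c, c->pending_id);
    }
}

/* NMI 1 is consumed by the VM exit; the latch is left set. */
static void vmexit_on_nmi(struct cpu_model *c)
{
    c->nmi_blocked = true;
}

/* The latch is cleared behind our back before NMI 1 is reinjected. */
static void late_reinject(struct cpu_model *c)
{
    vmexit_on_nmi(c);
    raise_nmi(c, 2);             /* NMI 2 arrives and is held pending */
    unblock_nmis(c);             /* stray unblock: NMI 2 runs first */
    raise_nmi(c, 1);             /* self-NMI for the original NMI 1 */
}

/* NMI 1 is handed to the handler while the latch is still set. */
static void early_handoff(struct cpu_model *c)
{
    vmexit_on_nmi(c);
    raise_nmi(c, 2);             /* NMI 2 arrives and is held pending */
    nmi_handler(c, 1);           /* handoff in the latched state */
    unblock_nmis(c);             /* now NMI 2 is delivered */
}
```

Calling late_reinject() on a zeroed struct cpu_model records the handler
order 2, 1, while early_handoff() records 1, 2.  Under the assumptions
above, that is all "logically out of order" means here.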

Whether this matters or not is a different question.  Right now, NMI is
"just" an edge triggered interrupt, but a theoretical future with NMI
vectors might have some fun causality bugs to contend with.


If out-of-order NMIs aren't a major concern, then a self-NMI is the
simple way to invoke the NMI handler in a context it can cope with. 
Otherwise, the VMExit handler's noinstr region has to do the handoff when
it's in the same state that the NMI handler is expecting.

~Andrew