Message-ID: <5e79facb-292d-eeae-b860-81a0bee9ef4c@citrix.com>
Date: Thu, 9 Apr 2020 12:36:27 +0100
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>
CC: Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <sean.j.christopherson@...el.com>,
Vivek Goyal <vgoyal@...hat.com>,
"Peter Zijlstra" <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
kvm list <kvm@...r.kernel.org>, stable <stable@...r.kernel.org>
Subject: Re: [PATCH v2] x86/kvm: Disable KVM_ASYNC_PF_SEND_ALWAYS
On 09/04/2020 05:50, Andy Lutomirski wrote:
> On Wed, Apr 8, 2020 at 11:01 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>> Paolo Bonzini <pbonzini@...hat.com> writes:
>>> On 08/04/20 17:34, Sean Christopherson wrote:
>>>> On Wed, Apr 08, 2020 at 10:23:58AM +0200, Paolo Bonzini wrote:
>>>>> Page-not-present async page faults are almost a perfect match for the
>>>>> hardware use of #VE (and it might even be possible to let the processor
>>>>> deliver the exceptions).
>>>> My "async" page fault knowledge is limited, but if the desired behavior is
>>>> to reflect a fault into the guest for select EPT Violations, then yes,
>>>> enabling EPT Violation #VEs in hardware is doable. The big gotcha is that
>>>> KVM needs to set the suppress #VE bit for all EPTEs when allocating a new
>>>> MMU page, otherwise not-present faults on zero-initialized EPTEs will get
>>>> reflected.
>>>>
>>>> Attached a patch that does the prep work in the MMU. The VMX usage would be:
>>>>
>>>> kvm_mmu_set_spte_init_value(VMX_EPT_SUPPRESS_VE_BIT);
>>>>
>>>> when EPT Violation #VEs are enabled. It's 64-bit only as it uses stosq to
>>>> initialize EPTEs. 32-bit could also be supported by doing memcpy() from
>>>> a static page.
>>> The complication is that (at least according to the current ABI) we
>>> would not want #VE to kick if the guest currently has IF=0 (and possibly
>>> CPL=0). But the ABI is not set in stone, and anyway the #VE protocol is
>>> a decent one and worth using as a base for whatever PV protocol we design.
>> Forget the current async pf semantics (or the lack thereof). You really
>> want to start from scratch and ignore the whole thing.
>>
>> The charm of #VE is that the hardware can inject it, and it does not
>> nest until the guest has cleared the second word in the #VE information
>> area. If that word is not 0, you instead get a regular vmexit, where
>> you suspend the vcpu until the nested problem is solved.
> Can you point me at where the SDM says this?
SDM Vol 3, 25.5.6.1 "Convertible EPT Violations".
> Anyway, I see two problems with #VE, one big and one small. The small
> (or maybe not-so-small) one is that any fancy protocol where the guest
> returns from an exception by doing, logically:
>
> Hey I'm done; /* MOV somewhere, hypercall, MOV to CR4, whatever */
> IRET;
>
> is fundamentally racy. After we say we're done and before IRET, we
> can be recursively reentered. Hi, NMI!
Correct. There is no way to atomically end the #VE handler. (This
causes "fun" even when using #VE for its intended purpose.)
~Andrew