linux-kernel - Re: [PATCH v2 06/25] KVM: nVMX/nSVM: do not monkey-patch inject_page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YieOvca6qbCDgrMl@google.com>
Date:   Tue, 8 Mar 2022 17:13:33 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        dmatlack@...gle.com
Subject: Re: [PATCH v2 06/25] KVM: nVMX/nSVM: do not monkey-patch
 inject_page_fault callback

On Mon, Feb 21, 2022, Paolo Bonzini wrote:
> Currently, vendor code is patching the inject_page_fault and later, on
> vmexit, expecting kvm_init_mmu to restore the inject_page_fault callback.
> 
> This is brittle, as exposed by the fact that SVM KVM_SET_NESTED_STATE
> forgets to do it.  Instead, do the check at the time a page fault actually
> has to be injected.  This does incur the cost of an extra retpoline
> for nested vmexits when TDP is disabled, but is overall much cleaner.
> While at it, add a comment that explains why the different behavior
> is needed in this case.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
> ---

If I have NAK powers, NAK NAK NAK NAK NAK :-)

Forcing a VM-Exit is a hack, e.g. it's the entire reason inject_emulated_exception()
returns a bool.  Even worse, it's confusing and misleading due to being incomplete.

The need hack for the hack is not unique to !tdp_enabled, the #DF can be triggered
any time L0 is intercepting #PF.  Hello, allow_smaller_maxphyaddr.

And while I think allow_smaller_maxphyaddr should be burned with fire, architecturally
it's still incomplete.  Any exception that is injected by KVM needs to be subjected
to nested interception checks, not just #PF.  E.g. a #GP while vectoring a different
fault should also be routed to L1.  KVM (mostly) gets away with special casing #PF
because that's the only common scenario where L1 wants to intercept _and fix_ a fault
that can occur while vectoring an exception.  E.g. in the #GP => #DF case, odds are
very good that L1 will inject a #DF too, but that doesn't make KVM's behavior correct.

I have a series to handle this by performing the interception checks when an exception
is queued, instead of when KVM injects the excepiton, and using a second kvm_queued_exception
field to track exceptions that are queued for VM-Exit (so as not to lose the injected
exception, which needs to be saved into vmc*12.  It's functional, though I haven't
tested migration (requires minor shenanigans to perform interception checks for pending
exceptions coming in from userspace).