linux-kernel - Re: [PATCH] KVM: x86: Exit to userspace when kvm_check_nested

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YQGZpPUC5TViIRih@google.com>
Date:   Wed, 28 Jul 2021 17:53:40 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     Vitaly Kuznetsov <vkuznets@...hat.com>,
        Jim Mattson <jmattson@...gle.com>,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH] KVM: x86: Exit to userspace when kvm_check_nested_events
 fails

On Wed, Jul 28, 2021, Paolo Bonzini wrote:
> On 28/07/21 14:39, Vitaly Kuznetsov wrote:
> > Shouldn't we also change kvm_arch_vcpu_runnable() and check
> > 'kvm_vcpu_running() > 0' now?
> 
> I think leaving kvm_vcpu_block on error is the better choice, so it should
> be good with returning true if kvm_vcpu_running(vcpu) < 0.

Blech.  This is all gross.  There is a subtle bug lurking in both Jim's approach
and in this approach.  It's not detected because the selftest exercises a bad PI
descriptor, not a bad vAPIC page.

In Jim's approach of returning 'true' from kvm_vcpu_running() if
kvm_check_nested_events() fails due to vmx_complete_nested_posted_interrupt()
detecting a bad vAPIC page, the resulting KVM_EXIT_INTERNAL_ERROR will be "lost"
due to vmx->nested.pi_pending being cleared.  KVM runs the vCPU, but skips over
the PI check in inject_pending_event() due to vmx->nested.pi_pending==false.
The selftest works because the bad PI descriptor case is handled _before_
pi_pending is cleared.

This approach mostly fixes that bug by virtue of returning immediately in the
vcpu_run() case, but if the bad vAPIC page is encountered via
kvm_arch_vcpu_runnable(), KVM will effectively drop the error.  This can be
hack-a-fixed by pre-checking the vAPIC page.  That's arguably architecturally
wrong as the vAPIC emulation access shouldn't occur until after PI.ON is cleared,
but from KVM's perspective I think it's the least awful "fix" given the current
train wreck.

Alternatively, what about punting all of this in favor of targeting the full
cleanup[*] for 5.15?  I believe I have the bandwidth to pick that up.

[*] https://lkml.kernel.org/r/YKWI1GPdNc4shaCt@google.com

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0d0dd6580cfd..8d1c8217954a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3707,6 +3707,10 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
        if (!vmx->nested.pi_desc)
                goto mmio_needed;

+       vapic_page = vmx->nested.virtual_apic_map.hva;
+       if (!vapic_page)
+               goto mmio_needed;
+
        vmx->nested.pi_pending = false;

        if (!pi_test_and_clear_on(vmx->nested.pi_desc))
@@ -3714,10 +3718,6 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)

        max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256);
        if (max_irr != 256) {
-               vapic_page = vmx->nested.virtual_apic_map.hva;
-               if (!vapic_page)
-                       goto mmio_needed;
-
                __kvm_apic_update_irr(vmx->nested.pi_desc->pir,
                        vapic_page, &max_irr);
                status = vmcs_read16(GUEST_INTR_STATUS);

[*]