Message-ID: <20200427173108.GI14870@linux.intel.com>
Date: Mon, 27 Apr 2020 10:31:09 -0700
From: Sean Christopherson <sean.j.christopherson@...el.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: VMX: Use accessor to read vmcs.INTR_INFO when handling exception
On Mon, Apr 27, 2020 at 10:18:37AM -0700, Sean Christopherson wrote:
> Use vmx_get_intr_info() when grabbing the cached vmcs.INTR_INFO in
> handle_exception_nmi() to ensure the cache isn't stale. Bypassing the
> caching accessor doesn't cause any known issues as the cache is always
> refreshed by handle_exception_nmi_irqoff(), but the whole point of
> adding the proper caching mechanism was to avoid such dependencies.
Despite stating that this doesn't cause any known issues, the reason I
ended up looking at this code is that I hit an emulation error due to a
presumed page fault getting intercepted while EPT is enabled, i.e. I hit
this warning:

	if (is_page_fault(intr_info)) {
		cr2 = vmx_get_exit_qual(vcpu);
		/* EPT won't cause page fault directly */
		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_reason && enable_ept);
		return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
	}

The problem is that I hit the WARN while running KVM unit tests in L2, with
the "buggy" KVM in L1, and a slightly older version of kvm/queue running as
L0. I.e. the bug could easily be incorrect #PF reflection/injection in L0.
To make matters worse, I stupidly didn't capture any state at the time
of failure because I assumed the failure would be reproducible, e.g. I
don't know if L2 (L1 from this patch's perspective) or L3 (relative L2) was
active.
And because things weren't complicated enough, I'm not even sure what KVM
configuration I was running as L2 (relative L1). I know what commit I was
running, but I may or may not have been running with ept=0, and it may or
may not have been a 32-bit kernel. *sigh*
I've been poring over the caching code and the nested code trying to figure
out what might have gone wrong, but haven't been able to find a smoking gun.
TL;DR: I don't think this causes bugs, but I hit a non-reproducible WARN
that is very much related to the code in question.
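
For readers unfamiliar with the pattern, the dependency the changelog describes can be illustrated with a small user-space sketch. This is not the kernel's code: every name below (fake_vcpu, fake_vmexit, fake_get_intr_info, fake_get_intr_info_raw) is hypothetical, and the "hardware" field is just a struct member standing in for a VMCS read. The assumption is that a per-exit "available" bit tells the accessor whether the cached value is still valid, so a direct read of the cached member is only correct if some earlier caller already went through the accessor.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of these names are KVM's. */
enum { FAKE_REG_INTR_INFO = 0 };

struct fake_vcpu {
	uint32_t avail;		/* bit set => cached value valid for this exit */
	uint32_t intr_info;	/* cached copy of the "hardware" field */
	uint32_t hw_intr_info;	/* stands in for the field held by hardware */
	unsigned int hw_reads;	/* counts simulated hardware reads */
};

/* Simulated VM-exit: hardware holds a new value, all caches are invalidated. */
static void fake_vmexit(struct fake_vcpu *v, uint32_t new_info)
{
	v->hw_intr_info = new_info;
	v->avail = 0;
}

/* Caching accessor: touches "hardware" at most once per exit. */
static uint32_t fake_get_intr_info(struct fake_vcpu *v)
{
	if (!(v->avail & (1u << FAKE_REG_INTR_INFO))) {
		v->intr_info = v->hw_intr_info;
		v->hw_reads++;
		v->avail |= 1u << FAKE_REG_INTR_INFO;
	}
	return v->intr_info;
}

/* Direct read: only correct if someone already refreshed the cache. */
static uint32_t fake_get_intr_info_raw(const struct fake_vcpu *v)
{
	return v->intr_info;
}
```

In this sketch, calling fake_get_intr_info_raw() right after fake_vmexit() returns the value from the previous exit, whereas fake_get_intr_info() refreshes the cache on first use and returns the cached copy afterwards.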
> Fixes: 8791585837f6 ("KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags")
> Signed-off-by: Sean Christopherson <sean.j.christopherson@...el.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 3ab6ca6062ce..7bddcb24f6f3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4677,7 +4677,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  	u32 vect_info;
>
>  	vect_info = vmx->idt_vectoring_info;
> -	intr_info = vmx->exit_intr_info;
> +	intr_info = vmx_get_intr_info(vcpu);
>
>  	if (is_machine_check(intr_info) || is_nmi(intr_info))
>  		return 1; /* handled by handle_exception_nmi_irqoff() */
> --
> 2.26.0
>