[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANDhNCogn0KogQ6HQJ0+XDwoT4QQFGmqfvTJmtmi65bo=zK=9w@mail.gmail.com>
Date: Wed, 22 Jan 2025 15:00:52 -0800
From: John Stultz <jstultz@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>, Frederic Weisbecker <fweisbec@...il.com>,
Andy Lutomirski <luto@...nel.org>, Borislav Petkov <bp@...e.de>, Jim Mattson <jmattson@...gle.com>,
Alex Bennée <alex.bennee@...aro.org>,
Will Deacon <will@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, LKML <linux-kernel@...r.kernel.org>,
kernel-team@...roid.com
Subject: Re: BUG: Occasional unexpected DR6 value seen with nested
virtualization on x86
On Wed, Jan 22, 2025 at 12:55 PM Sean Christopherson <seanjc@...gle.com> wrote:
> On Tue, Jan 21, 2025, John Stultz wrote:
> > Now, here's where it is odd. I could *not* reproduce the problem on
> > bare metal hardware, *nor* could I reproduce the problem in a virtual
> > environment. I can *only* reproduce the problem with nested
> > virtualization (running the VM inside a VM).
> >
> > I have reproduced this on my intel i12 NUC using the same v6.12 kernel
> > on metal + virt + nested environments. It also reproduced on the NUC
> > with v5.15 (metal) + v6.1 (virt) + v6.1(nested).
>
> Huh. This isn't actually a nested virtualization bug. It's a flaw in KVM's
> fastpath handling. But hitting it in a non-nested setup is practically impossible
> because it requires the "kernel" running the test to have interrupts enabled
> (rules out the ptrace test), a source of interrupts (rules out KVM-Unit-Test),
> window of a handful of instructions (or a weird guest).
>
...[trimmed the very impressive details I'll pretend I understand :) ]...
> But I'm leaning toward going straight for a more complete fix. My only hesitation
> is adding a dedicated .set_dr6() hook, as there's probably more code in VMX and
> SVM that can (should?) be moved out of .vcpu_run(), i.e. we could probably add a
> .pre_vcpu_run() hook to handle everything. However, even if we added a pre-run
> hook, I think I'd still prefer to keep the KVM_DEBUGREG_WONT_EXIT logic in one
> place (modulo the SVM behavior :-/).
>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/svm/svm.c | 5 ++---
> arch/x86/kvm/vmx/main.c | 1 +
> arch/x86/kvm/vmx/vmx.c | 10 ++++++----
> arch/x86/kvm/vmx/x86_ops.h | 1 +
> arch/x86/kvm/x86.c | 3 +++
> 7 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 7b4536ff3834..5459bc48cfd1 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -48,6 +48,7 @@ KVM_X86_OP(set_idt)
> KVM_X86_OP(get_gdt)
> KVM_X86_OP(set_gdt)
> KVM_X86_OP(sync_dirty_debug_regs)
> +KVM_X86_OP(set_dr6)
> KVM_X86_OP(set_dr7)
> KVM_X86_OP(cache_reg)
> KVM_X86_OP(get_rflags)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5193c3dfbce1..21d247176858 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1685,6 +1685,7 @@ struct kvm_x86_ops {
> void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
> void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
> void (*sync_dirty_debug_regs)(struct kvm_vcpu *vcpu);
> + void (*set_dr6)(struct kvm_vcpu *vcpu, unsigned long value);
> void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
> void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
> unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7640a84e554a..9d2033d64cfb 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4247,9 +4247,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
> * Run with all-zero DR6 unless needed, so that we can get the exact cause
> * of a #DB.
> */
> - if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT))
> - svm_set_dr6(svm, vcpu->arch.dr6);
> - else
> + if (likely(!(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)))
> svm_set_dr6(svm, DR6_ACTIVE_LOW);
>
> clgi();
> @@ -5043,6 +5041,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> .set_idt = svm_set_idt,
> .get_gdt = svm_get_gdt,
> .set_gdt = svm_set_gdt,
> + .set_dr6 = svm_set_dr6,
Just fyi, to get this to build (svm_set_dr6 takes a *svm not a *vcpu)
I needed to create a little wrapper to get the types right:
static void svm_set_dr6_vcpu(struct kvm_vcpu *vcpu, unsigned long value)
{
struct vcpu_svm *svm = to_svm(vcpu);
svm_set_dr6(svm, value);
}
But otherwise, this looks like it has fixed the issue! I've not been
able to trip a failure with the bionic ptrace test, nor with the debug
test in kvm-unit-tests, both running in loops for several minutes.
Tested-by: John Stultz <jstultz@...gle.com>
Thank you so much for the fast fix here!
-john
Powered by blists - more mailing lists