Message-ID: <Z5GOFVFO6ocd1sli@google.com>
Date: Wed, 22 Jan 2025 16:32:21 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: John Stultz <jstultz@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>, Frederic Weisbecker <fweisbec@...il.com>,
Andy Lutomirski <luto@...nel.org>, Borislav Petkov <bp@...e.de>, Jim Mattson <jmattson@...gle.com>,
"Alex Bennée" <alex.bennee@...aro.org>, Will Deacon <will@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>, kernel-team@...roid.com
Subject: Re: BUG: Occasional unexpected DR6 value seen with nested
virtualization on x86
On Wed, Jan 22, 2025, John Stultz wrote:
> On Wed, Jan 22, 2025 at 12:55 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > On Tue, Jan 21, 2025, John Stultz wrote:
> > @@ -5043,6 +5041,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> > .set_idt = svm_set_idt,
> > .get_gdt = svm_get_gdt,
> > .set_gdt = svm_set_gdt,
> > + .set_dr6 = svm_set_dr6,
>
>
> Just fyi, to get this to build (svm_set_dr6 takes a *svm not a *vcpu)
> I needed to create a little wrapper to get the types right:
>
> static void svm_set_dr6_vcpu(struct kvm_vcpu *vcpu, unsigned long value)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> svm_set_dr6(svm, value);
> }
Heh, yeah, I discovered as much when I tried to build with my more generic kconfig.
> But otherwise, this looks like it has fixed the issue! I've not been
> able to trip a failure with the bionic ptrace test, nor with the debug
> test in kvm-unit-tests, both running in loops for several minutes.
FWIW, I ran the testcase in L2 for ~45 minutes and saw one failure ~3 minutes in,
but unfortunately I didn't have any tracing running so I have zero insight into
what went wrong. I'm fairly certain the failure was due to running an unpatched
kernel in L1, i.e. that I hit the ultra-rare scenario where an L2=>L1 fastpath
exit between the #DB and read from DR6 clobbered hardware DR6.
For giggles and extra confidence, I hacked KVM to emulate HLT as a nop in the
fastpath, and verified failure (and the fix) in a non-nested setup with the below
selftest, on both AMD and Intel.
Sadly, KVM doesn't handle many exits in the fastpath on AMD, so having a regression
test that isn't Intel-specific isn't really possible at the moment. I'm mildly
tempted to use testing as an excuse to handle some CPUID emulation in the fastpath,
as Linux userspace does a _lot_ of CPUID, e.g. a kernel build generates tens of
thousands of CPUID exits.
Anyways, this all makes me confident in the fix. I'll post it properly tomorrow.
diff --git a/tools/testing/selftests/kvm/x86/debug_regs.c b/tools/testing/selftests/kvm/x86/debug_regs.c
index 2d814c1d1dc4..a34b65052f4e 100644
--- a/tools/testing/selftests/kvm/x86/debug_regs.c
+++ b/tools/testing/selftests/kvm/x86/debug_regs.c
@@ -22,11 +22,25 @@ extern unsigned char sw_bp, hw_bp, write_data, ss_start, bd_start;
static void guest_code(void)
{
+ unsigned long val = 0xffff0ffful;
+
/* Create a pending interrupt on current vCPU */
x2apic_enable();
x2apic_write_reg(APIC_ICR, APIC_DEST_SELF | APIC_INT_ASSERT |
APIC_DM_FIXED | IRQ_VECTOR);
+ /*
+ * Debug Register Interception tests.
+ */
+ asm volatile("mov %%rax, %%dr6\n\t"
+ "hlt\n\t"
+ "mov %%dr6, %%rax\n\t"
+ : "+a" (val));
+
+ __GUEST_ASSERT(val == 0xffff0ffful,
+ "Wanted DR6 = 0xffff0ffful, got %lx\n", val);
+ GUEST_SYNC(0);
+
/*
* Software BP tests.
*
@@ -103,6 +117,9 @@ int main(void)
vm = vm_create_with_one_vcpu(&vcpu, guest_code);
run = vcpu->run;
+ vcpu_run(vcpu);
+ TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_SYNC);
+
/* Test software BPs - int3 */
memset(&debug, 0, sizeof(debug));
debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;