[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aEBR7mbg-iTYdCtJ@google.com>
Date: Wed, 4 Jun 2025 07:02:22 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Maxim Levitsky <mlevitsk@...hat.com>
Cc: kvm@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
x86@...nel.org, Paolo Bonzini <pbonzini@...hat.com>, Chao Gao <chao.gao@...el.com>,
Jim Mattson <jmattson@...gle.com>
Subject: Re: [PATCH v5 3/5] KVM: nVMX: check vmcs12->guest_ia32_debugctl value
given by L2
On Thu, May 22, 2025, Sean Christopherson wrote:
> +Jim
>
> On Thu, May 22, 2025, Sean Christopherson wrote:
> > On Wed, May 21, 2025, Maxim Levitsky wrote:
> > > Check the vmcs12 guest_ia32_debugctl value before loading it, to avoid L2
> > > being able to load arbitrary values to hardware IA32_DEBUGCTL.
> > >
> > > Reviewed-by: Chao Gao <chao.gao@...el.com>
> > > Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>
> > > ---
> > > arch/x86/kvm/vmx/nested.c | 3 ++-
> > > arch/x86/kvm/vmx/vmx.c | 2 +-
> > > arch/x86/kvm/vmx/vmx.h | 1 +
> > > 3 files changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index e073e3008b16..00f2b762710c 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -3146,7 +3146,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
> > > return -EINVAL;
> > >
> > > if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
> > > - CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
> > > + (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
> > > + CC(vmcs12->guest_ia32_debugctl & ~vmx_get_supported_debugctl(vcpu, false))))
> >
> > This is a breaking change. For better or worse (read: worse), KVM's ABI is to
> > drop BTF and LBR if they're unsupported (the former is always unsupported).
> > Failure to honor that ABI means L1 can't excplitly load what it think is its
> > current value into L2.
> >
> > I'll slot in a path to provide another helper for checking the validity of
> > DEBUGCTL. I think I've managed to cobble together something that isn't too
> > horrific (options are a bit limited due to the existing ugliness).
>
> And then Jim ruined my day. :-)
>
> As evidenced by this hilarious KUT testcase, it's entirely possible there are
> existing KVM guests that are utilizing/abusing DEBUGCTL features.
>
> /*
> * Optional RTM test for hardware that supports RTM, to
> * demonstrate that the current volume 3 of the SDM
> * (325384-067US), table 27-1 is incorrect. Bit 16 of the exit
> * qualification for debug exceptions is not reserved. It is
> * set to 1 if a debug exception (#DB) or a breakpoint
> * exception (#BP) occurs inside an RTM region while advanced
> * debugging of RTM transactional regions is enabled.
> */
> if (this_cpu_has(X86_FEATURE_RTM)) {
> vmcs_write(ENT_CONTROLS,
> vmcs_read(ENT_CONTROLS) | ENT_LOAD_DBGCTLS);
> /*
> * Set DR7.RTM[bit 11] and IA32_DEBUGCTL.RTM[bit 15]
> * in the guest to enable advanced debugging of RTM
> * transactional regions.
> */
> vmcs_write(GUEST_DR7, BIT(11));
> vmcs_write(GUEST_DEBUGCTL, BIT(15));
> single_step_guest("Hardware delivered single-step in "
> "transactional region", starting_dr6, 0);
> check_db_exit(false, false, false, &xbegin, BIT(16),
> starting_dr6);
> } else {
> vmcs_write(GUEST_RIP, (u64)&skip_rtm);
> enter_guest();
> }
>
> For RTM specifically, disallowing DEBUGCTL.RTM but allowing DR7.RTM seems a bit
> silly. Unless there's a security concern, that can probably be fixed by adding
> support for RTM. Note, there's also a virtualization hole here, as KVM doesn't
> vet DR7 beyond checking that bits 63:32 are zero, i.e. a guest could set DR7.RTM
> even if KVM doesn't advertise support. Of course, closing that hole would require
> completely dropping support for disabling DR interception, since VMX doesn't
> give per-DR controls.
>
> For the other bits, I don't see a good solution. The only viable options I see
> are to silently drop all unsupported bits (maybe with a quirk?), or enforce all
> bits and cross our fingers that no L1 VMM is running guests with those bits set
> in GUEST_DEBUGCTL.
Paolo and I discussed this in PUCK this morning.
We agree trying to close the DR7 virtualization hole would be futile, and that we
should add support for DEBUGCTL.RTM to avoid breaking use of that specific bit.
For the other DEBUGCTL bits, none of them actually work (although, somewhat
amusingly, FREEZE_WHILE_SMM would "work" for real SMIs, which aren't visible to
L1), so we're going to roll the dice, cross our fingers that no existing workload
is setting those bits only in vmcs12.GUEST_DEBUGCTL, and enforce
vmx_get_supported_debugctl() at VM-Enter without any quirk.
6 TR: Setting this bit to 1 enables branch trace messages to be sent.
7 BTS: Setting this bit enables branch trace messages (BTMs) to be logged in a BTS buffer.
8 BTINT: When clear, BTMs are logged in a BTS buffer in circular fashion.
9 BTS_OFF_OS: When set, BTS or BTM is skipped if CPL = 0.
10 BTS_OFF_USR: When set, BTS or BTM is skipped if CPL > 0.
11 FREEZE_LBRS_ON_PMI: When set, the LBR stack is frozen on a PMI request.
12 FREEZE_PERFMON_ON_PMI: When set, each ENABLE bit of the global counter control MSR are frozen on a PMI request.
13 ENABLE_UNCORE_PMI: When set, enables the logical processor to receive and generate PMI on behalf of the uncore.
14 FREEZE_WHILE_SMM: When set, freezes perfmon and trace messages while in SMM.
Powered by blists - more mailing lists