lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CALMp9eTHefPLtNwCgqieWbA4tE489Wyi_gVO9P59YxU+k0wgyg@mail.gmail.com>
Date: Fri, 9 Jan 2026 11:33:18 -0800
From: Jim Mattson <jmattson@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Maxim Levitsky <mlevitsk@...hat.com>, kvm@...r.kernel.org, 
	"H. Peter Anvin" <hpa@...or.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, Borislav Petkov <bp@...en8.de>, 
	Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org, x86@...nel.org, 
	Paolo Bonzini <pbonzini@...hat.com>, Chao Gao <chao.gao@...el.com>
Subject: Re: [PATCH v5 3/5] KVM: nVMX: check vmcs12->guest_ia32_debugctl value
 given by L2

On Wed, Jun 4, 2025 at 7:02 AM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Thu, May 22, 2025, Sean Christopherson wrote:
> > +Jim
> >
> > On Thu, May 22, 2025, Sean Christopherson wrote:
> > > On Wed, May 21, 2025, Maxim Levitsky wrote:
> > > > Check the vmcs12 guest_ia32_debugctl value before loading it, to avoid L2
> > > > being able to load arbitrary values to hardware IA32_DEBUGCTL.
> > > >
> > > > Reviewed-by: Chao Gao <chao.gao@...el.com>
> > > > Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>
> > > > ---
> > > >  arch/x86/kvm/vmx/nested.c | 3 ++-
> > > >  arch/x86/kvm/vmx/vmx.c    | 2 +-
> > > >  arch/x86/kvm/vmx/vmx.h    | 1 +
> > > >  3 files changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > > index e073e3008b16..00f2b762710c 100644
> > > > --- a/arch/x86/kvm/vmx/nested.c
> > > > +++ b/arch/x86/kvm/vmx/nested.c
> > > > @@ -3146,7 +3146,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
> > > >           return -EINVAL;
> > > >
> > > >   if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
> > > > -     CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
> > > > +     (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
> > > > +      CC(vmcs12->guest_ia32_debugctl & ~vmx_get_supported_debugctl(vcpu, false))))
> > >
> > > This is a breaking change.  For better or worse (read: worse), KVM's ABI is to
> > > drop BTF and LBR if they're unsupported (the former is always unsupported).
> > > Failure to honor that ABI means L1 can't excplitly load what it think is its
> > > current value into L2.
> > >
> > > I'll slot in a path to provide another helper for checking the validity of
> > > DEBUGCTL.  I think I've managed to cobble together something that isn't too
> > > horrific (options are a bit limited due to the existing ugliness).
> >
> > And then Jim ruined my day.  :-)
> >
> > As evidenced by this hilarious KUT testcase, it's entirely possible there are
> > existing KVM guests that are utilizing/abusing DEBUGCTL features.
> >
> >       /*
> >        * Optional RTM test for hardware that supports RTM, to
> >        * demonstrate that the current volume 3 of the SDM
> >        * (325384-067US), table 27-1 is incorrect. Bit 16 of the exit
> >        * qualification for debug exceptions is not reserved. It is
> >        * set to 1 if a debug exception (#DB) or a breakpoint
> >        * exception (#BP) occurs inside an RTM region while advanced
> >        * debugging of RTM transactional regions is enabled.
> >        */
> >       if (this_cpu_has(X86_FEATURE_RTM)) {
> >               vmcs_write(ENT_CONTROLS,
> >                          vmcs_read(ENT_CONTROLS) | ENT_LOAD_DBGCTLS);
> >               /*
> >                * Set DR7.RTM[bit 11] and IA32_DEBUGCTL.RTM[bit 15]
> >                * in the guest to enable advanced debugging of RTM
> >                * transactional regions.
> >                */
> >               vmcs_write(GUEST_DR7, BIT(11));
> >               vmcs_write(GUEST_DEBUGCTL, BIT(15));
> >               single_step_guest("Hardware delivered single-step in "
> >                                 "transactional region", starting_dr6, 0);
> >               check_db_exit(false, false, false, &xbegin, BIT(16),
> >                             starting_dr6);
> >       } else {
> >               vmcs_write(GUEST_RIP, (u64)&skip_rtm);
> >               enter_guest();
> >       }
> >
> > For RTM specifically, disallowing DEBUGCTL.RTM but allowing DR7.RTM seems a bit
> > silly.  Unless there's a security concern, that can probably be fixed by adding
> > support for RTM.  Note, there's also a virtualization hole here, as KVM doesn't
> > vet DR7 beyond checking that bits 63:32 are zero, i.e. a guest could set DR7.RTM
> > even if KVM doesn't advertise support.  Of course, closing that hole would require
> > completely dropping support for disabling DR interception, since VMX doesn't
> > give per-DR controls.
> >
> > For the other bits, I don't see a good solution.  The only viable options I see
> > are to silently drop all unsupported bits (maybe with a quirk?), or enforce all
> > bits and cross our fingers that no L1 VMM is running guests with those bits set
> > in GUEST_DEBUGCTL.
>
> Paolo and I discussed this in PUCK this morning.
>
> We agree trying to close the DR7 virtualization hole would be futile, and that we
> should add support for DEBUGCTL.RTM to avoid breaking use of that specific bit.
>
> For the other DEBUGCTL bits, none of them actually work (although, somewhat
> amusingly, FREEZE_WHILE_SMM would "work" for real SMIs, which aren't visible to
> L1), so we're going to roll the dice, cross our fingers that no existing workload
> is setting those bits only in vmcs12.GUEST_DEBUGCTL, and enforce
> vmx_get_supported_debugctl() at VM-Enter without any quirk.
>
>   6   TR: Setting this bit to 1 enables branch trace messages to be sent.
>   7   BTS: Setting this bit enables branch trace messages (BTMs) to be logged in a BTS buffer.
>   8   BTINT: When clear, BTMs are logged in a BTS buffer in circular fashion.
>   9   BTS_OFF_OS: When set, BTS or BTM is skipped if CPL = 0.
>   10  BTS_OFF_USR: When set, BTS or BTM is skipped if CPL > 0.
>   11  FREEZE_LBRS_ON_PMI: When set, the LBR stack is frozen on a PMI request.
>   12  FREEZE_PERFMON_ON_PMI: When set, each ENABLE bit of the global counter control MSR are frozen on a PMI request.
>   13  ENABLE_UNCORE_PMI: When set, enables the logical processor to receive and generate PMI on behalf of the uncore.
>   14  FREEZE_WHILE_SMM: When set, freezes perfmon and trace messages while in SMM.

I have encountered an existing workload that sets FREEZE_WHILE_SMM,
and this change wantonly breaks it.

KVM should continue to allow legacy workloads to (perhaps
ineffectively) set FREEZE_WHILE_SMM under a quirk .

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ