[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <71af8435d2085b3f969cb3e73cff5bfacd243819.camel@redhat.com>
Date: Thu, 01 May 2025 16:53:17 -0400
From: mlevitsk@...hat.com
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>, Borislav
Petkov <bp@...en8.de>, Paolo Bonzini <pbonzini@...hat.com>, x86@...nel.org,
Dave Hansen <dave.hansen@...ux.intel.com>, Ingo Molnar <mingo@...hat.com>,
linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 3/3] x86: KVM: VMX: preserve host's
DEBUGCTLMSR_FREEZE_IN_SMM while in the guest mode
On Thu, 2025-05-01 at 16:41 -0400, mlevitsk@...hat.com wrote:
> On Tue, 2025-04-22 at 16:41 -0700, Sean Christopherson wrote:
> > On Tue, Apr 15, 2025, Maxim Levitsky wrote:
> > > Pass through the host's DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM to the guest
> > > GUEST_IA32_DEBUGCTL without the guest seeing this value.
> > >
> > > Note that in the future we might allow the guest to set this bit as well,
> > > when we implement PMU freezing on VM own, virtual SMM entry.
> > >
> > > Since the value of the host DEBUGCTL can in theory change between VM runs,
> > > check if has changed, and if yes, then reload the GUEST_IA32_DEBUGCTL with
> > > the new value of the host portion of it (currently only the
> > > DEBUGCTLMSR_FREEZE_IN_SMM bit)
> >
> > No, it can't. DEBUGCTLMSR_FREEZE_IN_SMM can be toggled via IPI callback, but
> > IRQs are disabled for the entirety of the inner run loop. And if I'm somehow
> > wrong, this change movement absolutely belongs in a separate patch.
Hi,
You are right here - reading MSR_IA32_DEBUGCTLMSR in the inner loop is a performance
regression.
Any ideas on how to solve this then? Since currently its the common code that
reads the current value of the MSR_IA32_DEBUGCTLMSR and it doesn't leave any indication
about if it changed I can do either
1. store old value as well, something like 'vcpu->arch.host_debugctl_old' Ugly IMHO.
2. add DEBUG_CTL to the set of the 'dirty' registers, e.g add new bit for kvm_register_mark_dirty
It looks a bit overkill to me
3. Add new x86 callback for something like .sync_debugctl(). I vote for this option.
What do you think/prefer?
Best regards,
Maxim Levitsky
> >
> > > Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>
> > > ---
> > > arch/x86/kvm/svm/svm.c | 2 ++
> > > arch/x86/kvm/vmx/vmx.c | 28 +++++++++++++++++++++++++++-
> > > arch/x86/kvm/x86.c | 2 --
> > > 3 files changed, 29 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index cc1c721ba067..fda0660236d8 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -4271,6 +4271,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
> > > svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
> > > svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
> > >
> > > + vcpu->arch.host_debugctl = get_debugctlmsr();
> > > +
> > > /*
> > > * Disable singlestep if we're injecting an interrupt/exception.
> > > * We don't want our modified rflags to be pushed on the stack where
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index c9208a4acda4..e0bc31598d60 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -2194,6 +2194,17 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
> > > return debugctl;
> > > }
> > >
> > > +static u64 vmx_get_host_preserved_debugctl(struct kvm_vcpu *vcpu)
> >
> > No, just open code handling DEBUGCTLMSR_FREEZE_IN_SMM, or make it a #define.
> > I'm not remotely convinced that we'll ever want to emulate DEBUGCTLMSR_FREEZE_IN_SMM,
> > and trying to plan for that possibility and adds complexity for no immediate value.
>
> Hi,
>
> The problem here is a bit different: we indeed are very unlikely to emulate the
> DEBUGCTLMSR_FREEZE_IN_SMM but however, when I wrote this patch I was sure that this bit is
> mandatory with PMU version of 2 or more, but looks like it is optional after all:
>
> "
> Note that system software must check if the processor supports the IA32_DEBUGCTL.FREEZE_WHILE_SMM
> control bit. IA32_DEBUGCTL.FREEZE_WHILE_SMM is supported if IA32_PERF_CAPABIL-
> ITIES.FREEZE_WHILE_SMM[Bit 12] is reporting 1. See Section 20.8 for details of detecting the presence of
> IA32_PERF_CAPABILITIES MSR."
>
> KVM indeed doesn't set the bit 12 of IA32_PERF_CAPABILITIES.
>
> However, note that the Linux kernel silently sets this bit without checking the aforementioned capability
> bit and ends up with a #GP exception, which it silently ignores.... (I checked this with a trace...)
>
> This led me to believe that this bit should be unconditionally supported,
> meaning that KVM should at least fake setting it without triggering a #GP.
>
> Since that is not the case, I can revert to the simpler model of exclusively using GUEST_IA32_DEBUGCTL
> while hiding the bit from the guest, however I do vote to keep the guest/host separation.
>
> >
> > > +{
> > > + /*
> > > + * Bits of host's DEBUGCTL that we should preserve while the guest is
> > > + * running.
> > > + *
> > > + * Some of those bits might still be emulated for the guest own use.
> > > + */
> > > + return DEBUGCTLMSR_FREEZE_IN_SMM;
> > >
> > > u64 vmx_get_guest_debugctl(struct kvm_vcpu *vcpu)
> > > {
> > > return to_vmx(vcpu)->msr_ia32_debugctl;
> > > @@ -2202,9 +2213,11 @@ u64 vmx_get_guest_debugctl(struct kvm_vcpu *vcpu)
> > > static void __vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data)
> > > {
> > > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > + u64 host_mask = vmx_get_host_preserved_debugctl(vcpu);
> > >
> > > vmx->msr_ia32_debugctl = data;
> > > - vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> > > + vmcs_write64(GUEST_IA32_DEBUGCTL,
> > > + (vcpu->arch.host_debugctl & host_mask) | (data & ~host_mask));
> > > }
> > >
> > > bool vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
> > > @@ -2232,6 +2245,7 @@ bool vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated
> > > return true;
> > > }
> > >
> > > +
> >
> > Spurious newline.
> >
> > > /*
> > > * Writes msr value into the appropriate "register".
> > > * Returns 0 on success, non-0 otherwise.
> > > @@ -7349,6 +7363,7 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
> > > {
> > > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > unsigned long cr3, cr4;
> > > + u64 old_debugctl;
> > >
> > > /* Record the guest's net vcpu time for enforced NMI injections. */
> > > if (unlikely(!enable_vnmi &&
> > > @@ -7379,6 +7394,17 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
> > > vmcs_write32(PLE_WINDOW, vmx->ple_window);
> > > }
> > >
> > > + old_debugctl = vcpu->arch.host_debugctl;
> > > + vcpu->arch.host_debugctl = get_debugctlmsr();
> > > +
> > > + /*
> > > + * In case the host DEBUGCTL had changed since the last time we
> > > + * read it, update the guest's GUEST_IA32_DEBUGCTL with
> > > + * the host's bits.
> > > + */
> > > + if (old_debugctl != vcpu->arch.host_debugctl)
> >
> > This can and should be optimized to only do an update if a host-preserved bit
> > is toggled.
>
> True, I will do this in the next version.
>
> >
> > > + __vmx_set_guest_debugctl(vcpu, vmx->msr_ia32_debugctl);
> >
> > I would rather have a helper that explicitly writes the VMCS field, not one that
> > sets the guest value *and* writes the VMCS field.
>
> >
> > The usage in init_vmcs() doesn't need to write vmx->msr_ia32_debugctl because the
> > vCPU is zero allocated, and this usage doesn't change vmx->msr_ia32_debugctl.
> > So the only path that actually needs to modify vmx->msr_ia32_debugctl is
> > vmx_set_guest_debugctl().
>
>
> But what about nested entry? nested entry pretty much sets the MSR to a value given by the guest.
>
> Also technically the intel_pmu_legacy_freezing_lbrs_on_pmi also changes the guest value by emulating what the real hardware does.
>
> Best regards,
> Maxim Levitsky
>
>
> >
> > > +
> > > /*
> > > * We did this in prepare_switch_to_guest, because it needs to
> > > * be within srcu_read_lock.
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 844e81ee1d96..05e866ed345d 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -11020,8 +11020,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > > set_debugreg(0, 7);
> > > }
> > >
> > > - vcpu->arch.host_debugctl = get_debugctlmsr();
> > > -
> > > guest_timing_enter_irqoff();
> > >
> > > for (;;) {
> > > --
> > > 2.26.3
> > >
>
Powered by blists - more mailing lists