Message-ID: <aCU2YEpU0dOk7RTk@google.com>
Date: Wed, 14 May 2025 17:33:36 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Mingwei Zhang <mizhang@...gle.com>
Cc: Zide Chen <zide.chen@...el.com>, Peter Zijlstra <peterz@...radead.org>, 
	Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Paolo Bonzini <pbonzini@...hat.com>, Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>, 
	Kan Liang <kan.liang@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, 
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org, 
	kvm@...r.kernel.org, linux-kselftest@...r.kernel.org, 
	Yongwei Ma <yongwei.ma@...el.com>, Xiong Zhang <xiong.y.zhang@...ux.intel.com>, 
	Dapeng Mi <dapeng1.mi@...ux.intel.com>, Jim Mattson <jmattson@...gle.com>, 
	Sandipan Das <sandipan.das@....com>, Eranian Stephane <eranian@...gle.com>, 
	Shukla Manali <Manali.Shukla@....com>, Nikunj Dadhania <nikunj.dadhania@....com>
Subject: Re: [PATCH v4 21/38] KVM: x86/pmu/vmx: Save/load guest
 IA32_PERF_GLOBAL_CTRL with vm_exit/entry_ctrl

On Wed, Mar 26, 2025, Mingwei Zhang wrote:
> On Wed, Mar 26, 2025 at 9:51 AM Chen, Zide <zide.chen@...el.com> wrote:
> > > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > > index 6ad71752be4b..4e8cefcce7ab 100644
> > > --- a/arch/x86/kvm/pmu.c
> > > +++ b/arch/x86/kvm/pmu.c
> > > @@ -646,6 +646,30 @@ void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
> > >       }
> > >  }
> > >
> > > +static void kvm_pmu_sync_global_ctrl_from_vmcs(struct kvm_vcpu *vcpu)
> > > +{
> > > +     struct msr_data msr_info = { .index = MSR_CORE_PERF_GLOBAL_CTRL };
> > > +
> > > +     if (!kvm_mediated_pmu_enabled(vcpu))
> > > +             return;
> > > +
> > > +     /* Sync pmu->global_ctrl from GUEST_IA32_PERF_GLOBAL_CTRL. */
> > > +     kvm_pmu_call(get_msr)(vcpu, &msr_info);
> > > +}
> > > +
> > > +static void kvm_pmu_sync_global_ctrl_to_vmcs(struct kvm_vcpu *vcpu, u64 global_ctrl)
> > > +{
> > > +     struct msr_data msr_info = {
> > > +             .index = MSR_CORE_PERF_GLOBAL_CTRL,
> > > +             .data = global_ctrl };
> > > +
> > > +     if (!kvm_mediated_pmu_enabled(vcpu))
> > > +             return;
> > > +
> > > +     /* Sync pmu->global_ctrl to GUEST_IA32_PERF_GLOBAL_CTRL. */
> > > +     kvm_pmu_call(set_msr)(vcpu, &msr_info);

Eh, just add a dedicated kvm_pmu_ops hook.  Feeding this through set_msr() avoids
adding another hook, but makes the code hard to follow and requires the above
ugly boilerplate.

> > > +}
> > > +
> > >  bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
> > >  {
> > >       switch (msr) {
> > > @@ -680,7 +704,6 @@ int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > >               msr_info->data = pmu->global_status;
> > >               break;
> > >       case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
> > > -     case MSR_CORE_PERF_GLOBAL_CTRL:
> > >               msr_info->data = pmu->global_ctrl;
> > >               break;
> > >       case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
> > > @@ -731,6 +754,9 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >
> >
> > pmu->global_ctrl doesn't always have the up-to-date guest value, need to
> > sync from vmcs/vmbc before comparing it against 'data'.
> >
> > +               kvm_pmu_sync_global_ctrl_from_vmcs(vcpu);
> >                 if (pmu->global_ctrl != data) {
> 
> Good catch. Thanks!
> 
> This is why I really prefer just unconditionally syncing the global
> ctrl from VMCS to pmu->global_ctrl and vice versa.
> 
> We might get into similar problems as well in the future.

The problem isn't conditional synchronization, it's that y'all reinvented the
wheel, poorly.  This is a solved problem via EXREG and wrappers.

That said, I went through the exercise of adding a PERF_GLOBAL_CTRL EXREG and
associated wrappers, and didn't love the result.  Host writes should be rare, so
the dirty tracking is overkill.  For reads, the cost of VMREAD is lower than
VMWRITE (doesn't trigger consistency check re-evaluation on VM-Enter), and is
dwarfed by the cost of switching all other PMU state.

So I think for the initial implementation, it makes sense to propagate writes
to the VMCS on demand, but do VMREAD after VM-Exit (if VM-Enter was successful).
We can always revisit the optimization if/when we optimize the PMU world switches,
e.g. to defer them if there are no active host events.

> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index 8a7af02d466e..ecf72394684d 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -7004,7 +7004,8 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
> > >               VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> > >               VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
> > >               VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | VM_EXIT_ACK_INTR_ON_EXIT |
> > > -             VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
> > > +             VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
> > > +             VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL;

This is completely wrong.  Stuffing VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL here
advertises support for KVM emulation of the control, and that support is non-existent
in this patch (and series).

Just drop this, emulation of VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL can be done
separately.

> > > +     mediated = kvm_mediated_pmu_enabled(vcpu);
> > > +     if (cpu_has_load_perf_global_ctrl()) {
> > > +             vm_entry_controls_changebit(vmx,
> > > +                     VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, mediated);
> > > +             /*
> > > +              * Initialize guest PERF_GLOBAL_CTRL to its reset value per the SDM.
> > > +              *
> > > +              * Note: GUEST_IA32_PERF_GLOBAL_CTRL must be initialized to
> > > +              * "BIT_ULL(pmu->nr_arch_gp_counters) - 1" instead of pmu->global_ctrl
> > > +              * since pmu->global_ctrl is only initialized when the guest's
> > > +              * pmu->version > 1. If pmu->version is 1, pmu->global_ctrl stays 0
> > > +              * and the guest counters would never actually be enabled.
> > > +              */
> > > +             if (mediated)
> > > +                     vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL,
> > > +                                  BIT_ULL(pmu->nr_arch_gp_counters) - 1);

This belongs in common code, as a call to the aforementioned hook to propagate
PERF_GLOBAL_CTRL to hardware.

> > > +     }
> > > +
> > > +     if (cpu_has_save_perf_global_ctrl())
> > > +             vm_exit_controls_changebit(vmx,
> > > +                     VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
> > > +                     VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL, mediated);
> > >  }
> > >
> > >  static void intel_pmu_init(struct kvm_vcpu *vcpu)
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index ff66f17d6358..38ecf3c116bd 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -4390,6 +4390,13 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
> > >
> > >       if (cpu_has_load_ia32_efer())
> > >               vmcs_write64(HOST_IA32_EFER, kvm_host.efer);
> > > +
> > > +     /*
> > > +      * Initialize host PERF_GLOBAL_CTRL to 0 to disable all counters
> > > +      * immediately on VM-exit. The mediated vPMU then calls perf_guest_exit()
> > > +      * to re-enable host perf events.
> > > +      */
> > > +     vmcs_write64(HOST_IA32_PERF_GLOBAL_CTRL, 0);

This needs to be conditioned on the mediated PMU being enabled, because this field
is not constant when using the emulated PMU (or no vPMU).

> > > @@ -8451,6 +8462,15 @@ __init int vmx_hardware_setup(void)
> > >               enable_sgx = false;
> > >  #endif
> > >
> > > +     /*
> > > +      * All CPUs that support a mediated PMU are expected to support loading
> > > +      * and saving PERF_GLOBAL_CTRL via dedicated VMCS fields.
> > > +      */
> > > +     if (enable_mediated_pmu &&
> > > +         (WARN_ON_ONCE(!cpu_has_load_perf_global_ctrl() ||
> > > +                       !cpu_has_save_perf_global_ctrl())))

This needs to be conditioned on !HYPERVISOR, or it *will* fire.

And placing this check here, without *any* mention of *why* you did so, is evil
and made me very grumpy.  I had to discover the hard way that you checked the
VMCS fields here, instead of in kvm_init_pmu_capability() where it logically
belongs, because the VMCS configuration isn't yet initialized.

Grumpiness aside, I don't like this late clear of enable_mediated_pmu, as it risks
a variation of the problem you're trying to avoid, i.e. risks consuming the variable
between kvm_init_pmu_capability() and here.

I don't see any reason why setup_vmcs_config() can't be called before
kvm_x86_vendor_init(), so unless I'm missing/forgetting something, let's just do
that, and move these checks where they belong.
