lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d585c449-19f0-4a07-9223-82d892f7d3e0@linux.intel.com>
Date: Thu, 15 May 2025 11:45:32 +0800
From: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
To: Sean Christopherson <seanjc@...gle.com>,
 Mingwei Zhang <mizhang@...gle.com>
Cc: Zide Chen <zide.chen@...el.com>, Peter Zijlstra <peterz@...radead.org>,
 Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>,
 Namhyung Kim <namhyung@...nel.org>, Paolo Bonzini <pbonzini@...hat.com>,
 Mark Rutland <mark.rutland@....com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
 Adrian Hunter <adrian.hunter@...el.com>, Liang@...gle.com,
 Kan <kan.liang@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>,
 linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
 kvm@...r.kernel.org, linux-kselftest@...r.kernel.org,
 Yongwei Ma <yongwei.ma@...el.com>,
 Xiong Zhang <xiong.y.zhang@...ux.intel.com>,
 Jim Mattson <jmattson@...gle.com>, Sandipan Das <sandipan.das@....com>,
 Eranian Stephane <eranian@...gle.com>, Shukla Manali
 <Manali.Shukla@....com>, Nikunj Dadhania <nikunj.dadhania@....com>
Subject: Re: [PATCH v4 21/38] KVM: x86/pmu/vmx: Save/load guest
 IA32_PERF_GLOBAL_CTRL with vm_exit/entry_ctrl


On 5/15/2025 8:33 AM, Sean Christopherson wrote:
> On Wed, Mar 26, 2025, Mingwei Zhang wrote:
>> On Wed, Mar 26, 2025 at 9:51 AM Chen, Zide <zide.chen@...el.com> wrote:
>>>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>>>> index 6ad71752be4b..4e8cefcce7ab 100644
>>>> --- a/arch/x86/kvm/pmu.c
>>>> +++ b/arch/x86/kvm/pmu.c
>>>> @@ -646,6 +646,30 @@ void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
>>>>       }
>>>>  }
>>>>
>>>> +static void kvm_pmu_sync_global_ctrl_from_vmcs(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +     struct msr_data msr_info = { .index = MSR_CORE_PERF_GLOBAL_CTRL };
>>>> +
>>>> +     if (!kvm_mediated_pmu_enabled(vcpu))
>>>> +             return;
>>>> +
>>>> +     /* Sync pmu->global_ctrl from GUEST_IA32_PERF_GLOBAL_CTRL. */
>>>> +     kvm_pmu_call(get_msr)(vcpu, &msr_info);
>>>> +}
>>>> +
>>>> +static void kvm_pmu_sync_global_ctrl_to_vmcs(struct kvm_vcpu *vcpu, u64 global_ctrl)
>>>> +{
>>>> +     struct msr_data msr_info = {
>>>> +             .index = MSR_CORE_PERF_GLOBAL_CTRL,
>>>> +             .data = global_ctrl };
>>>> +
>>>> +     if (!kvm_mediated_pmu_enabled(vcpu))
>>>> +             return;
>>>> +
>>>> +     /* Sync pmu->global_ctrl to GUEST_IA32_PERF_GLOBAL_CTRL. */
>>>> +     kvm_pmu_call(set_msr)(vcpu, &msr_info);
> Eh, just add a dedicated kvm_pmu_ops hook.  Feeding this through set_msr() avoids
> adding another hook, but makes the code hard to follow and requires the above
> ugly boilerplate.

Sure. I originally thought if it's worthy to add a new kvm_pmu_ops hook
since only Intel platforms need it.


>
>>>> +}
>>>> +
>>>>  bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
>>>>  {
>>>>       switch (msr) {
>>>> @@ -680,7 +704,6 @@ int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>>               msr_info->data = pmu->global_status;
>>>>               break;
>>>>       case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
>>>> -     case MSR_CORE_PERF_GLOBAL_CTRL:
>>>>               msr_info->data = pmu->global_ctrl;
>>>>               break;
>>>>       case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
>>>> @@ -731,6 +754,9 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>
>>> pmu->global_ctrl doesn't always have the up-to-date guest value, need to
>>> sync from vmcs/vmbc before comparing it against 'data'.
>>>
>>> +               kvm_pmu_sync_global_ctrl_from_vmcs(vcpu);
>>>                 if (pmu->global_ctrl != data) {
>> Good catch. Thanks!
>>
>> This is why I really prefer just unconditionally syncing the global
>> ctrl from VMCS to pmu->global_ctrl and vice versa.
>>
>> We might get into similar problems as well in the future.
> The problem isn't conditional synchronization, it's that y'all reinvented the
> wheel, poorly.  This is a solved problem via EXREG and wrappers.
>
> That said, I went through the exercise of adding a PERF_GLOBAL_CTRL EXREG and
> associated wrappers, and didn't love the result.  Host writes should be rare, so
> the dirty tracking is overkill.  For reads, the cost of VMREAD is lower than
> VMWRITE (doesn't trigger consistency check re-evaluation on VM-Enter), and is
> dwarfed by the cost of switching all other PMU state.
>
> So I think for the initial implementation, it makes sense to propagated writes
> to the VMCS on demand, but do VMREAD after VM-Exit (if VM-Enter was successful).
> We can always revisit the optimization if/when we optimize the PMU world switches,
> e.g. to defer them if there are no active host events.

Sure.


>
>>>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>>>> index 8a7af02d466e..ecf72394684d 100644
>>>> --- a/arch/x86/kvm/vmx/nested.c
>>>> +++ b/arch/x86/kvm/vmx/nested.c
>>>> @@ -7004,7 +7004,8 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
>>>>               VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>>>>               VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
>>>>               VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | VM_EXIT_ACK_INTR_ON_EXIT |
>>>> -             VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
>>>> +             VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
>>>> +             VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL;
> This is completely wrong.  Stuffing VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL here
> advertises support for KVM emulation of the control, and that support is non-existent
> in this patch (and series).
>
> Just drop this, emulation of VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL can be done
> separately.

Sure.


>
>>>> +     mediated = kvm_mediated_pmu_enabled(vcpu);
>>>> +     if (cpu_has_load_perf_global_ctrl()) {
>>>> +             vm_entry_controls_changebit(vmx,
>>>> +                     VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, mediated);
>>>> +             /*
>>>> +              * Initialize guest PERF_GLOBAL_CTRL to reset value as SDM rules.
>>>> +              *
>>>> +              * Note: GUEST_IA32_PERF_GLOBAL_CTRL must be initialized to
>>>> +              * "BIT_ULL(pmu->nr_arch_gp_counters) - 1" instead of pmu->global_ctrl
>>>> +              * since pmu->global_ctrl is only be initialized when guest
>>>> +              * pmu->version > 1. Otherwise if pmu->version is 1, pmu->global_ctrl
>>>> +              * is 0 and guest counters are never really enabled.
>>>> +              */
>>>> +             if (mediated)
>>>> +                     vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL,
>>>> +                                  BIT_ULL(pmu->nr_arch_gp_counters) - 1);
> This belongs in common code, as a call to the aforementioned hook to propagate
> PERF_GLOBAL_CTRL to hardware.

Sure.


>
>>>> +     }
>>>> +
>>>> +     if (cpu_has_save_perf_global_ctrl())
>>>> +             vm_exit_controls_changebit(vmx,
>>>> +                     VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
>>>> +                     VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL, mediated);
>>>>  }
>>>>
>>>>  static void intel_pmu_init(struct kvm_vcpu *vcpu)
>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>>> index ff66f17d6358..38ecf3c116bd 100644
>>>> --- a/arch/x86/kvm/vmx/vmx.c
>>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>>> @@ -4390,6 +4390,13 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
>>>>
>>>>       if (cpu_has_load_ia32_efer())
>>>>               vmcs_write64(HOST_IA32_EFER, kvm_host.efer);
>>>> +
>>>> +     /*
>>>> +      * Initialize host PERF_GLOBAL_CTRL to 0 to disable all counters
>>>> +      * immediately once VM exits. Mediated vPMU then call perf_guest_exit()
>>>> +      * to re-enable host perf events.
>>>> +      */
>>>> +     vmcs_write64(HOST_IA32_PERF_GLOBAL_CTRL, 0);
> This needs to be conditioned on the mediated PMU being enabled, because this field
> is not constant when using the emulated PMU (or no vPMU).

Yes.


>
>>>> @@ -8451,6 +8462,15 @@ __init int vmx_hardware_setup(void)
>>>>               enable_sgx = false;
>>>>  #endif
>>>>
>>>> +     /*
>>>> +      * All CPUs that support a mediated PMU are expected to support loading
>>>> +      * and saving PERF_GLOBAL_CTRL via dedicated VMCS fields.
>>>> +      */
>>>> +     if (enable_mediated_pmu &&
>>>> +         (WARN_ON_ONCE(!cpu_has_load_perf_global_ctrl() ||
>>>> +                       !cpu_has_save_perf_global_ctrl())))
> This needs to be conditioned on !HYPERVISOR, or it *will* fire.

Ok.


>
> And placing this check here, without *any* mention of *why* you did so, is evil
> and made me very grumpy.  I had to discover the hard way that you checked the
> VMCS fields here, instead of in kvm_init_pmu_capability() where it logically
> belongs, because the VMCS configuration isn't yet initialized.
>
> Grumpiness aside, I don't like this late clear of enable_mediated_pmu, as it risks
> a variation of the problem you're trying to avoid, i.e. risks consuming the variable
> between kvm_init_pmu_capability() and here.

Yes.


>
> I don't see any reason why setup_vmcs_config() can't be called before
> kvm_x86_vendor_init(), so unless I'm missing/forgetting something, let's just do
> that, and move these checks where they belong.

I'm not quite sure about this. Let me double check.



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ