[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6188F6B4.1000402@huawei.com>
Date: Mon, 8 Nov 2021 18:06:44 +0800
From: Liuxiangdong <liuxiangdong5@...wei.com>
To: Like Xu <like.xu.linux@...il.com>,
Zhu Lingshan <lingshan.zhu@...el.com>
CC: <seanjc@...gle.com>, <vkuznets@...hat.com>,
<wanpengli@...cent.com>, <jmattson@...gle.com>, <joro@...tes.org>,
<kan.liang@...ux.intel.com>, <ak@...ux.intel.com>,
<wei.w.wang@...el.com>, <eranian@...gle.com>,
<linux-kernel@...r.kernel.org>, <x86@...nel.org>,
<kvm@...r.kernel.org>, <boris.ostrvsky@...cle.com>,
Yao Yuan <yuan.yao@...el.com>,
"Venkatesh Srinivas" <venkateshs@...omium.org>,
"Fangyi (Eric)" <eric.fangyi@...wei.com>,
Xiexiangyou <xiexiangyou@...wei.com>
Subject: Re: [PATCH V10 05/18] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit
when vPMU is enabled
On 2021/11/8 16:44, Like Xu wrote:
> On 8/11/2021 4:27 pm, Liuxiangdong wrote:
>>
>>
>> On 2021/11/8 12:11, Like Xu wrote:
>>> On 8/11/2021 12:07 pm, Liuxiangdong wrote:
>>>>
>>>>
>>>> On 2021/11/8 11:06, Like Xu wrote:
>>>>> On 7/11/2021 6:14 pm, Liuxiangdong wrote:
>>>>>> Hi, like and lingshan.
>>>>>>
>>>>>> As said, IA32_MISC_ENABLE[7] bit depends on the PMU is enabled
>>>>>> for the guest, so a software
>>>>>> write openration to this bit will be ignored.
>>>>>>
>>>>>> But, in this patch, all the openration that writes
>>>>>> msr_ia32_misc_enable in guest could make this bit become 0.
>>>>>>
>>>>>> Suppose:
>>>>>> When we start vm with "enable_pmu",
>>>>>> vcpu->arch.ia32_misc_enable_msr may be 0x80 first.
>>>>>> And next, guest writes msr_ia32_misc_enable value 0x1.
>>>>>> What we want could be 0x81, but unfortunately, it will be 0x1
>>>>>> because of
>>>>>> "data &= ~MSR_IA32_MISC_ENABLE_EMON;"
>>>>>> And even if guest writes msr_ia32_misc_enable value 0x81, it will
>>>>>> be 0x1 also.
>>>>>>
>>>>>
>>>>> Yes and thank you. The fix has been committed on my private tree
>>>>> for a long time.
>>>>>
>>>>>>
>>>>>> What we want is write operation will not change this bit. So, how
>>>>>> about this?
>>>>>>
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu
>>>>>> *vcpu, struct msr_data *msr_info)
>>>>>> }
>>>>>> break;
>>>>>> case MSR_IA32_MISC_ENABLE:
>>>>>> + data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>>>>> + data |= (vcpu->arch.ia32_misc_enable_msr &
>>>>>> MSR_IA32_MISC_ENABLE_EMON);
>>>>>> if (!kvm_check_has_quirk(vcpu->kvm,
>>>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>>> ((vcpu->arch.ia32_misc_enable_msr ^ data) &
>>>>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>>> if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>>>>
>>>>>>
>>>>>
>>>>> How about this for the final state considering PEBS enabling:
>>>>>
>>>>> case MSR_IA32_MISC_ENABLE: {
>>>>> u64 old_val = vcpu->arch.ia32_misc_enable_msr;
>>>>> u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>>>> MSR_IA32_MISC_ENABLE_EMON;
>>>>>
>>>> u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>>> MSR_IA32_MISC_ENABLE_EMON;
>>>>
>>>> Repetitive "MSR_IA32_MISC_ENABLE_EMON" ?
>>>
>>> Oops,
>>>
>>> u64 pmu_mask = MSR_IA32_MISC_ENABLE_EMON |
>>> MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
>>>
>>
>> Yes. bit[12] is also read-only, so we can keep this bit unchanged also.
>>
>> And, because write operation will not change this bit by "pmu_mask",
>> do we still need this if statement?
>>
>> /* RO bits */
>> if (!msr_info->host_initiated &&
>> ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
>> return 1;
>>
>> "(old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL" means some
>> operation tries to change this bit,
>> so we cannot allow it.
>> But, if there is no this judgement, "pmu_mask" will still make this
>> bit[12] no change.
>>
>> The only difference is that we can not change other bit (except bit
>> 12 and bit 7) once "old_val[12] != data[12]" if there exists this
>> statement
>> and we can change other bit if there is no judgement.
>>
>> For both MSR_IA32_MISC_ENABLE_EMON and MSR_IA32_MISC_ENABLE_EMON are
>> read-only, maybe we can keep
>> their behavioral consistency. Either both judge, or neither.
>
> One more difference per Intel SDM, I assume:
>
> For Bit 7, Performance Monitoring Available (R)
> (R) means that attempts to change this bit will be silent;
> For Bit 12, Processor Event Based Sampling (PEBS) Unavailable (RO),
> (RO) means that attempts to change this bit will be #GP;
>
Yes, I found it in SDM. You're right. Thanks for your explanation!
>>
>> Do you think so?
>>
>>
>>> I'll send the fix after sync with Lingshan.
>>>
>>>>
>>>>> /* RO bits */
>>>>> if (!msr_info->host_initiated &&
>>>>> ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
>>>>> return 1;
>>>>>
>>>>> /*
>>>>> * For a dummy user space, the order of setting vPMU
>>>>> capabilities and
>>>>> * initialising MSR_IA32_MISC_ENABLE is not strictly
>>>>> guaranteed, so to
>>>>> * avoid inconsistent functionality we keep the vPMU bits
>>>>> unchanged here.
>>>>> */
>>>> Yes. It's a little clearer with comments.
>>>
>>> Thanks for your feedback! Enjoy the feature.
>>>
>>>>> data &= ~pmu_mask;
>>>>> data |= old_val & pmu_mask;
>>>>> if (!kvm_check_has_quirk(vcpu->kvm,
>>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>> ((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>> if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>>> return 1;
>>>>> vcpu->arch.ia32_misc_enable_msr = data;
>>>>> kvm_update_cpuid_runtime(vcpu);
>>>>> } else {
>>>>> vcpu->arch.ia32_misc_enable_msr = data;
>>>>> }
>>>>> break;
>>>>> }
>>>>>
>>>>>> Or is there anything in your design intention I don't understand?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Xiangdong Liu
>>>>>>
>>>>>>
>>>>>> On 2021/8/6 21:37, Zhu Lingshan wrote:
>>>>>>> From: Like Xu <like.xu@...ux.intel.com>
>>>>>>>
>>>>>>> On Intel platforms, the software can use the IA32_MISC_ENABLE[7]
>>>>>>> bit to
>>>>>>> detect whether the processor supports performance monitoring
>>>>>>> facility.
>>>>>>>
>>>>>>> It depends on the PMU is enabled for the guest, and a software
>>>>>>> write
>>>>>>> operation to this available bit will be ignored. The proposal to
>>>>>>> ignore
>>>>>>> the toggle in KVM is the way to go and that behavior matches
>>>>>>> bare metal.
>>>>>>>
>>>>>>> Cc: Yao Yuan <yuan.yao@...el.com>
>>>>>>> Signed-off-by: Like Xu <like.xu@...ux.intel.com>
>>>>>>> Reviewed-by: Venkatesh Srinivas <venkateshs@...omium.org>
>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@...el.com>
>>>>>>> Acked-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>>>>>>> ---
>>>>>>> arch/x86/kvm/vmx/pmu_intel.c | 1 +
>>>>>>> arch/x86/kvm/x86.c | 1 +
>>>>>>> 2 files changed, 2 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c
>>>>>>> b/arch/x86/kvm/vmx/pmu_intel.c
>>>>>>> index 9efc1a6b8693..d9dbebe03cae 100644
>>>>>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>>>>>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>>>>>>> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct
>>>>>>> kvm_vcpu *vcpu)
>>>>>>> if (!pmu->version)
>>>>>>> return;
>>>>>>> + vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
>>>>>>> perf_get_x86_pmu_capability(&x86_pmu);
>>>>>>> pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>>> index efd11702465c..f6b6984e26ef 100644
>>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>>> @@ -3321,6 +3321,7 @@ int kvm_set_msr_common(struct kvm_vcpu
>>>>>>> *vcpu, struct msr_data *msr_info)
>>>>>>> }
>>>>>>> break;
>>>>>>> case MSR_IA32_MISC_ENABLE:
>>>>>>> + data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>>>>>> if (!kvm_check_has_quirk(vcpu->kvm,
>>>>>>> KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>>>>>>> ((vcpu->arch.ia32_misc_enable_msr ^ data) &
>>>>>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>>>> if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>>>>
>>>>
>>
Powered by blists - more mailing lists