[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b40fd338-cb3b-b602-0059-39f775e77ad6@intel.com>
Date: Fri, 21 Oct 2022 11:12:57 +0800
From: Xiaoyao Li <xiaoyao.li@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: x86: Fix the initial value of mcg_cap
On 10/21/2022 12:32 AM, Sean Christopherson wrote:
> On Thu, Oct 20, 2022, Xiaoyao Li wrote:
>> On 10/20/2022 10:27 PM, Sean Christopherson wrote:
>>> On Thu, Oct 20, 2022, Xiaoyao Li wrote:
>>>> vcpu->arch.mcg_cap represents the value of MSR_IA32_MCG_CAP. It's
>>>> set via ioctl(KVM_X86_SETUP_MCE) from userspace when exposing and
>>>> configuring MCE to guest.
>>>>
>>>> It's wrong to leave the default value as KVM_MAX_MCE_BANKS.
>>>
>>> Why? I agree it's an odd default, but the whole MCE API is odd. Functionally,
>>> I don't see anything that's broken by allowing the guest to access the MCx_CTL MSRs
>>> by default.
>>
>> Yes. Allowing the access doesn't cause any issue for a VM.
>>
>> However, for the perspective of virtualization. It virtualizes a magic
>> hardware that even CPUID.MCA/MCE is not advertised and MCE is not set up by
>> userspace, guest is told there are 32 banks and all the banks can be
>> accessed.
>
> '0' isn't necessarily better though, e.g. if userspace parrots back KVM's "supported"
> CPUID without invoking KVM_X86_SETUP_MCE, then it's equally odd that the guest will
> see no supported MCE MSRS.
>
> Older versions of the SDM also state (or at least very strongly imply) that banks
> 0-3 are always available on P6.
>
> Bank 0 is an especially weird case, as several of the MSRs are aliased to other
> MSRs that predate the machine check architecture.
>
> Anyways, if this were newly introduced code I'd be all for defaulting to '0', but
> KVM has defaulted to KVM_MAX_MCE_BANKS since KVM_X86_SETUP_MCE was added way back
> in 2009. Unless there's a bug that's fixed by this, I'm inclined to keep the
> current behavior even though it's weird, as hiding all MCE MSRs by default could
> theoretically cause a regression, e.g. by triggering #GP on MSRs that an older
> guest expects to always exist.
fair enough.
> If we really want to clean up this code, I think the correct approach would be to
> inject #GP on all relevant MSRs if CPUID.MCA==0, e.g.
It's what I thought of as well. But I didn't find any statement in SDM
of "Accessing Machine Check MSRs gets #GP if no CPUID.MCA"
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4bd5f8a751de..97fafd851d8d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3260,6 +3260,9 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> u64 data = msr_info->data;
> u32 offset, last_msr;
>
> + if (!msr_info->host_initiated && !guest_cpuid_has(X86_FEATURE_MCA))
> + return 1;
> +
> switch (msr) {
> case MSR_IA32_MCG_STATUS:
> vcpu->arch.mcg_status = data;
> @@ -3891,6 +3894,14 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
> unsigned bank_num = mcg_cap & 0xff;
> u32 offset, last_msr;
>
> + if (msr == MSR_IA32_P5_MC_ADDR || msr == MSR_IA32_P5_MC_TYPE) {
> + *pdata = 0;
> + return 0;
> + }
> +
> + if (!host && !guest_cpuid_has(X86_FEATURE_MCA))
> + return 1;
> +
> switch (msr) {
> case MSR_IA32_P5_MC_ADDR:
> case MSR_IA32_P5_MC_TYPE:
>
> Or alternatively, this should work too:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4bd5f8a751de..e4a44d7af0a6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3774,6 +3774,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_MCG_STATUS:
> case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
> + if (!msr_info->host_initiated &&
> + !guest_cpuid_has(X86_FEATURE_MCA))
> + return 1;
> return set_msr_mce(vcpu, msr_info);
>
> case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
> @@ -4142,13 +4145,17 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>
> msr_info->data = vcpu->arch.msr_kvm_poll_control;
> break;
> - case MSR_IA32_P5_MC_ADDR:
> - case MSR_IA32_P5_MC_TYPE:
> case MSR_IA32_MCG_CAP:
> case MSR_IA32_MCG_CTL:
> case MSR_IA32_MCG_STATUS:
> case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
> + if (!msr_info->host_initiated &&
> + !guest_cpuid_has(X86_FEATURE_MCA))
> + return 1;
> + fallthrough;
> + case MSR_IA32_P5_MC_ADDR:
> + case MSR_IA32_P5_MC_TYPE:
> return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
> msr_info->host_initiated);
> case MSR_IA32_XSS:
>
Powered by blists - more mailing lists