linux-kernel - Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition

Message-ID: <87edx8xn8h.fsf@redhat.com>
Date:   Mon, 22 Aug 2022 18:21:50 +0200
From:   Vitaly Kuznetsov <vkuznets@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
        Anirudh Rayabharam <anrayabh@...ux.microsoft.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Maxim Levitsky <mlevitsk@...hat.com>,
        Nathan Chancellor <nathan@...nel.org>,
        Michael Kelley <mikelley@...rosoft.com>,
        linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 03/26] x86/hyperv: Update 'struct
 hv_enlightened_vmcs' definition

Sean Christopherson <seanjc@...gle.com> writes:

> On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@...gle.com> writes:
>> 
>> > On Thu, Aug 18, 2022, Vitaly Kuznetsov wrote:
>> >> Sean Christopherson <seanjc@...gle.com> writes:
>> >> 
>> >> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> >> >> + * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
>> >> >> + * published TLFS version. When the bit is set, a nested hypervisor can use
>> >> >> + * the 'updated' eVMCSv1 specification (the perf_global_ctrl, s_cet, ssp,
>> >> >> + * lbr_ctl, encls_exiting_bitmap and tsc_multiplier fields, which were
>> >> >> + * missing from the 2016 specification).
>> >> >> + */
>> >> >> +#define HV_X64_NESTED_EVMCS1_2022_UPDATE		BIT(0)
>> >> >
>> >> > This bit is now defined[*], but the docs say it's only for perf_global_ctrl.  Are
>> >> > we expecting an update to the TLFS?
>> >> >
>> >> > 	Indicates support for the GuestPerfGlobalCtrl and HostPerfGlobalCtrl fields
>> >> > 	in the enlightened VMCS.
>> >> >
>> >> > [*] https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery#hypervisor-nested-virtualization-features---0x4000000a
>> >> >
>> >> 
>> >> Oh well, better this than nothing. I'll let the people who told me
>> >> about this bit know that their description is incomplete.
>> >
>> > Not that it changes anything, but I'd rather have no documentation.  I'd much rather
>> > KVM say "this is the undocumented behavior" than "the documented behavior is wrong".
>> >
>> 
>> So I reached out to Microsoft, and their answer was that for all these new
>> eVMCS fields (including *PerfGlobalCtrl), observing the architectural VMX
>> MSRs should be enough. The *PerfGlobalCtrl case is special because of a Win11
>> bug (if we expose the feature in the VMX feature MSRs but don't set
>> CPUID.0x4000000A.EBX BIT(0), it just doesn't boot).
>
> I.e. TSC_SCALING shouldn't be gated on the flag?  If so, then the 2-D array approach
> is overkill since (a) the CPUID flag only controls PERF_GLOBAL_CTRL and (b) we aren't
> expecting any more flags in the future.
>

Unfortunately, we have to gate the presence of these new features on
something; otherwise the VMM has no way to specify which particular eVMCS
"revision" it wants (TL;DR: we will break migration).

My initial implementation invented an 'eVMCS revision' concept:
https://lore.kernel.org/kvm/20220629150625.238286-7-vkuznets@redhat.com/

which is needed if we don't gate all these new fields on CPUID.0x4000000A.EBX BIT(0).

Going forward, we will still (likely) need something when new fields show up.

> What about this for an implementation?
>
> static bool evmcs_has_perf_global_ctrl(struct kvm_vcpu *vcpu)
> {
> 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>
> 	/*
> 	 * Filtering VMX controls for eVMCS compatibility should only be done
> 	 * for guest accesses, and all such accesses should be gated on Hyper-V
> 	 * being enabled and initialized.
> 	 */
> 	if (WARN_ON_ONCE(!hv_vcpu))
> 		return false;
>
> 	return hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_PERF_GLOBAL_CTRL;
> }
>
> static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu, u32 msr_index)
> {
> 	u32 unsupported_ctrls;
>
> 	switch (msr_index) {
> 	case MSR_IA32_VMX_EXIT_CTLS:
> 	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
> 		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
> 		if (!evmcs_has_perf_global_ctrl(vcpu))
> 			unsupported_ctrls |= VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
> 		return unsupported_ctrls;
> 	case MSR_IA32_VMX_ENTRY_CTLS:
> 	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
> 		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
> 		if (!evmcs_has_perf_global_ctrl(vcpu))
> 			unsupported_ctrls |= VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> 		return unsupported_ctrls;
> 	case MSR_IA32_VMX_PROCBASED_CTLS2:
> 		return EVMCS1_UNSUPPORTED_2NDEXEC;
> 	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
> 	case MSR_IA32_VMX_PINBASED_CTLS:
> 		return EVMCS1_UNSUPPORTED_PINCTRL;
> 	case MSR_IA32_VMX_VMFUNC:
> 		return EVMCS1_UNSUPPORTED_VMFUNC;
> 	default:
> 		KVM_BUG_ON(1, vcpu->kvm);
> 		return 0;
> 	}
> }
>
> void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> {
> 	u64 unsupported_ctrls = evmcs_get_unsupported_ctls(vcpu, msr_index);
>
> 	if (msr_index == MSR_IA32_VMX_VMFUNC)
> 		*pdata &= ~unsupported_ctrls;
> 	else
> 		*pdata &= ~(unsupported_ctrls << 32);
> }
>

It's smaller and I like it, but it would only work in conjunction with
KVM_CAP_HYPERV_ENLIGHTENED_VMCS2...

>
>> What I'm still concerned about is future-proofing KVM for new
>> features. When something gets added to KVM for which no eVMCS
>> field is currently defined, both the Hyper-V-on-KVM and KVM-on-Hyper-V
>> cases should be taken care of. It would probably be better to reverse our
>> filtering and explicitly list the features supported in eVMCS. The lists are
>> going to be fairly long, but at least we won't have to react to every
>> new architectural feature added to KVM.
>
> Having the filtering be opt-in crossed my mind as well.  Reversing the filtering
> can be done after this series though, correct?
>

Yes, that's my plan: get this in to fix the immediate issue with the 2022
features, and probably reverse the filtering before Microsoft releases
something else :-)

-- 
Vitaly