linux-kernel - Re: [PATCH] KVM: vmx/nested: avoid blindly setting SECONDARY_EXEC_ENCLS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <71cd488d-5b4f-153d-bcd3-e38c6eedf4c6@redhat.com>
Date:   Tue, 25 Oct 2022 14:36:31 +0200
From:   Emanuele Giuseppe Esposito <eesposit@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>, Bandan Das <bsd@...hat.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: vmx/nested: avoid blindly setting
 SECONDARY_EXEC_ENCLS_EXITING when sgx is enabled



Am 24/10/2022 um 18:52 schrieb Sean Christopherson:
> "nVMX" instead of "vmx/nested"
> 
> On Mon, Oct 24, 2022, Emanuele Giuseppe Esposito wrote:
>> Currently vmx enables SECONDARY_EXEC_ENCLS_EXITING even when sgx
>> is not set in the host MSR.
>> This was probably introduced when sgx was not yet fully supported, and
>> we wanted to trap guests trying to use the feature.
> 
> Nah, it's just a boneheaded bug.
> 
>> When booting a guest, KVM checks that the cpuid bit is actually set
>> in vmx.c, and if not, it does not enable the feature.
> 
> The CPUID thing is a red herring.  That's an _additional_ restriction, KVM honors
> MSR_IA32_VMX_PROCBASED_CTLS2 when configuring vmcs01.  See adjust_vmx_controls()
> for secondary controls in setup_vmcs_config().
> 
>> However, in nesting this control bit is blindly copied, and will be
> 
> It's not "copied", KVM sets the bit in the nVMX MSR irrespective of host support,
> which is the problem.
> 
>> propagated to VMCS12 and VMCS02. Therefore, when L1 tries to boot
>> the guest, the host will try to execute VMLOAD with VMCS02 containing
>> a feature that the hardware does not support, making it fail with
>> hardware error 0x7.
>>
>> According with section A.3.3 of Intel System Programming Guide,
>> we should *always* check the value in the actual
> 
> s/we/software
> 
>> MSR_IA32_VMX_PROCBASED_CTLS2 before enabling this bit.
>>
>> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2127128
>>
> 
> Fixes: 72add915fbd5 ("KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC")
> Cc: stable@...r.kernel.org
> 
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@...hat.com>
>> ---
>>  arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++++++++--
>>  1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index 8f67a9c4a287..f651084010cc 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -6678,6 +6678,25 @@ static u64 nested_vmx_calc_vmcs_enum_msr(void)
>>  	return (u64)max_idx << VMCS_FIELD_INDEX_SHIFT;
>>  }
>>  
>> +/*
>> + * According with section A.3.3 of
> 
> Avoid referencing sections/tables by "number" in comments (and changelogs), as the
> comment is all but guaranteed to become stale because future versions of the SDM
> will shift things around.
> 
> The slightly more robust way to reference a specific SDM/APM section is to use the
> title, e.g. According to section "Secondary Processor-Based VM-Execution Controls"
> in Intel's SDM ...", as hardware vendors are less likely to arbitrarily rename
> sections and tables.  It's a bit more work for readers, but any decent PDF viewer
> can search these days.
> 
>> Intel System Programming Guide
> 
> KVM typically uses "Intel's SDM" (and "AMD's APM").  Like "VMX" or "SVM", it's ok
> to use the SDM acronym without introducing since "SDM" is 
> 
>> + * we *can* set the guest MSR control X (in our case
> 
> Avoid pronouns in comments.  "we" and "us" are ambiguous, e.g. "we" can mean KVM,
> the developer, the user, etc...
> 
>> + * SECONDARY_EXEC_ENCLS_EXITING) *iff* bit 32+X of
>> + * MSR_IA32_VMX_PROCBASED_CTLS2 is set to 1.
>> + * Otherwise it must remain zero.
> 
> As a general rule, if you find yourself writing a comment and a helper for
> something that KVM absolutely needs to get right (honoring VMX MSRs), then odds
> are very good that there's a simpler/easier fix, i.e. that you're effectively
> re-inventing part of the weel.
> 
>> + */
>> +static void nested_vmx_setup_encls_exiting(struct nested_vmx_msrs *msrs)
>> +{
>> +	u32 vmx_msr_procb_low, vmx_msr_procb_high;
>> +
>> +	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, vmx_msr_procb_low, vmx_msr_procb_high);
>> +
>> +	WARN_ON(vmx_msr_procb_low & SECONDARY_EXEC_ENCLS_EXITING);
>> +
>> +	if (enable_sgx && (vmx_msr_procb_high & SECONDARY_EXEC_ENCLS_EXITING))
>> +		msrs->secondary_ctls_high |= SECONDARY_EXEC_ENCLS_EXITING;
>> +}
>> +
>>  /*
>>   * nested_vmx_setup_ctls_msrs() sets up variables containing the values to be
>>   * returned for the various VMX controls MSRs when nested VMX is enabled.
>> @@ -6874,8 +6893,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
>>  		msrs->secondary_ctls_high |=
>>  			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>>  
>> -	if (enable_sgx)
>> -		msrs->secondary_ctls_high |= SECONDARY_EXEC_ENCLS_EXITING;
> 
> The issue here is that I, who wrote this code long, long ago, copied the pattern
> used for enable_unrestricted_guest, flexpriority_enabled, etc... without fully
> appreciating the logic.  Unlike those module params, enable_sgx doesn't track
> hardware support, i.e. enable_sgx isn't cleared when SGX can't be enabled due to
> lack of hardware support.  As a result, KVM effectively enumerates to L1 that the
> control is always available, i.e. that KVM emulates ENCLS-exiting for L1, but KVM
> obviously doesn't actually emulating the behavior.
> 
> Not updating enable_sgx is responsible for a second bug: vmx_set_cpu_caps() doesn't
> clear the SGX bits when hardware support is unavailable.  This is a much less
> problematic bug as as it only pops up if SGX is soft-disabled (the case being
> handled by cpu_has_sgx()) or if SGX is supported for bare metal but not in the
> VMCS (will never happen when running on bare metal, but can theoertically happen
> when running in a VM).
> 
> Last but not least, KVM should ideally have module params reflect KVM's actual
> configuration.
> 
> Killing all birds with one stone, simply clear enable_sgx when ENCLS-exiting isn't
> supported.  The #ifdef is a little gross, but I think it's marginally less ugly
> than having vmx.c define a dummy boolean.
> 
> Compile tested only...
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9dba04b6b019..65f092e4a81b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8263,6 +8263,11 @@ static __init int hardware_setup(void)
>         if (!cpu_has_virtual_nmis())
>                 enable_vnmi = 0;
>  
> +#ifdef CONFIG_X86_SGX_KVM
> +       if (!cpu_has_vmx_encls_vmexit())
> +               enable_sgx = false;
> +#endif
> +
>         /*
>          * set_apic_access_page_addr() is used to reload apic access
>          * page upon invalidation.  No need to do anything if not
> 

Thanks for all the suggestions and explanations, I am going to apply
your changes and send v2!

Emanuele