linux-kernel - Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y2FePYteNrEfZ7D5@google.com>
Date:   Tue, 1 Nov 2022 17:58:21 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Yu Zhang <yu.c.zhang@...ux.intel.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Eric Li <ercli@...avis.edu>,
        David Matlack <dmatlack@...gle.com>,
        Oliver Upton <oupton@...gle.com>
Subject: Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any
 _host_ supported value

On Tue, Nov 01, 2022, Yu Zhang wrote:
> On Mon, Oct 31, 2022 at 05:11:10PM +0000, Sean Christopherson wrote:
> > vmcs_config.nested never becomes out-of-date, it's read-only after __init (not
> > currently marked as such, that will be remedied soon).
> > 
> > The auditing performed by KVM is purely to guard against userspace enabling
> > features that KVM doesn't support.  KVM is not responsible for ensuring that the
> > vCPU's CPUID model match the VMX MSR model.
> 
> Do you mean the VMX MSR model shall not be changed after the cpuid updates?

No, I mean that the virtual CPU model (CPUID + VMX MSRs) that is presented to the
guest is the responsibility of host userspace.  KVM only cares about not enabling
bits/features that KVM doesn't supported.

> And for VMX MSR model, do you mean the vmx->nested.msrs or the ones in 
> vmcs_config->nested? 

vmx->nested.msrs.  vmcs_config->nested is effectively the VMX equivalent of
KVM_GET_SUPPORTED_CPUID.

> What I observed is that vmx->nested.msrs.secondary_ctls_high will be changed
> in vmx_adjust_secondary_exec_control(), which can be triggered after cpuid is
> set. 

Ugh, that path got overlooked when we yanked out KVM's manipulaton of VMX MSRs
in response to guest CPUID changes.  I wonder if we can get away with changing
KVM's behavior to only ensure a feature isn't exposed to L2 when it's not exposed
to L1.

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6b4266e949a3..cfc35d559d91 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4523,8 +4523,8 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
         * Update the nested MSR settings so that a nested VMM can/can't set
         * controls for features that are/aren't exposed to the guest.
         */
-       if (nested) {
-               if (enabled)
+       if (nested && !enabled)
+               if (exiting)
                        vmx->nested.msrs.secondary_ctls_high |= control;
                else
                        vmx->nested.msrs.secondary_ctls_high &= ~control;


> Since KVM's config(vmcs_config->nested.secondary_ctls_high) is done during init
> by nested_vmx_setup_ctls_msrs(), which only kept a subset of the flags from the
> vmcs_confg->cpu_based_2nd_exec_ctrl, the vmx_restore_control_msr() could fail
> later, when userspace VMM tries to enable a feature(the only one I witnessed is
> SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE) by setting MSR_IA32_VMX_PROCBASED_CTLS2.
> Because the vmx->nested.msrs.secondary_ctls_high is updated by cpuid, but this
> bit is not taken from vmcs_conf->cpu_based_2nd_exec_ctrl by nested_vmx_setup_ctls_msrs()
> for vmcs_config->nested.secondary_ctls_high.
> 
> The failure can be fixed, simply by adding SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE in
> nested_vmx_setup_ctls_msrs(), e.g.

Assuming KVM actually supports user wait/pause in L2, this is an orthogonal bug
to the CPUID-based manipulation above.  KVM simply neglects to advertise to userspace
that ENABLE_USR_WAIT_PAUSE is supported for nested virtualization.

If KVM doesn't correctly support virtualizing user wait/pause for L2, then the
correct location to fix this is in vmx_secondary_exec_control().

> > > Another question is about the setting of secondary_ctls_high in
> > > nested_vmx_setup_ctls_msrs().  I saw there's a comment saying:
> > > 	"Do not include those that depend on CPUID bits, they are
> > > 	added later by vmx_vcpu_after_set_cpuid.".
> > 
> > That's a stale comment, see the very next commit, 8805875aa473 ("Revert "KVM: nVMX:
> > Do not expose MPX VMX controls when guest MPX disabled""), as well as the slightly
> > later commit 9389d5774aca ("Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL
> > VM-{Entry,Exit} control"").
> > 
> 
> So the comment can be and shall be removed, right? 

Yep.

> > > 	if (cpu_has_vmx_vmfunc()) {
> > > 		msrs->secondary_ctls_high |=
> > > 			SECONDARY_EXEC_ENABLE_VMFUNC;
> > 
> > This one is still required.  KVM never enables VMFUNC for itself, i.e. it won't
> > be set in KVM's VMCS configuration.
> > 
> 
> My understanding is that, for VMFUNC, altough KVM does not support it,
> SECONDARY_EXEC_ENABLE_VMFUNC is still set in the secondary proc-based
> vm execution ctrol. KVM just uses different handlers for VMFUNC exits
> from L1(to inject the #UD) and L2(to emulate the eptp switching). So
> it doesn't matter if we do not clear this bit for vmcs_config->nested.
> procbased_ctls_high. 

Ah, you're right, I didn't realize KVM enables VMFUNC in L1.  Enabling VMFUNC for
L1 is silly though, it's trivial to clear the feature in vmx_secondary_exec_control().

That said, enabling VMFUNC in vmcs01 is an orthogonal topic, and it _is_ indeed
easier to keep the feature in the reference config.  Now that the nested config
is derived from the non-nested config, nested_vmx_setup_ctls_msrs() can do:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 61a2e551640a..751cc67aa1a9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6811,7 +6811,8 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
                SECONDARY_EXEC_ENABLE_INVPCID |
                SECONDARY_EXEC_RDSEED_EXITING |
                SECONDARY_EXEC_XSAVES |
-               SECONDARY_EXEC_TSC_SCALING;
+               SECONDARY_EXEC_TSC_SCALING |
+               SECONDARY_EXEC_ENABLE_VMFUNC;
 
        /*
         * We can emulate "VMCS shadowing," even if the hardware
@@ -6842,17 +6843,12 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
                }
        }
 
-       if (cpu_has_vmx_vmfunc()) {
-               msrs->secondary_ctls_high |=
-                       SECONDARY_EXEC_ENABLE_VMFUNC;
-               /*
-                * Advertise EPTP switching unconditionally
-                * since we emulate it
-                */
-               if (enable_ept)
-                       msrs->vmfunc_controls =
-                               VMX_VMFUNC_EPTP_SWITCHING;
-       }
+       /*
+        * Advertise EPTP switching if VMFUNC and EPT are supported, KVM
+        * emulates the actual EPTP switch in software.
+        */
+       if (cpu_has_vmx_vmfunc() && enable_ept)
+               msrs->vmfunc_controls = VMX_VMFUNC_EPTP_SWITCHING;
 
        /*
         * Old versions of KVM use the single-context version without