Message-ID: <6794E98B-2B96-4AF6-AF4E-BE15574CA081@nutanix.com>
Date: Mon, 16 May 2022 18:27:22 +0000
From: Jon Kohler <jon@...anix.com>
To: Jon Kohler <jon@...anix.com>
CC: Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <seanjc@...gle.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Kees Cook <keescook@...omium.org>,
Waiman Long <longman@...hat.com>,
"kvm @ vger . kernel . org" <kvm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] KVM: VMX: do not disable interception for
MSR_IA32_SPEC_CTRL on eIBRS
> On May 12, 2022, at 1:44 PM, Jon Kohler <jon@...anix.com> wrote:
>
> Avoid expensive rdmsr on every VM Exit for MSR_IA32_SPEC_CTRL on
> eIBRS enabled systems iff the guest only sets IA32_SPEC_CTRL[0] (IBRS)
> and not [1] (STIBP) or [2] (SSBD) by not disabling interception in
> the MSR bitmap.
>
> eIBRS enabled guests using just IBRS will only write SPEC_CTRL MSR
> once or twice per vCPU on boot, so it is far better to take those
> VM exits on boot than having to read and save this MSR on every
> single VM exit forever. This outcome was suggested in Andrea's commit
> 2f46993d83ff ("x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl");
> however, since interception is still unconditionally disabled, the rdmsr
> tax is still there even after that commit.
>
> This is a significant win for eIBRS enabled systems, as this rdmsr
> accounts for roughly 50% of the time spent in vmx_vcpu_run(), as
> observed in perf top disassembly, and is in the critical path for all
> VM-Exits, including fastpath exits.
>
> Update relevant comments in vmx_vcpu_run() with appropriate SDM
> references for future onlookers.
>
Gentle ping on this one.
> Fixes: 2f46993d83ff ("x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl")
> Signed-off-by: Jon Kohler <jon@...anix.com>
> Cc: Andrea Arcangeli <aarcange@...hat.com>
> Cc: Kees Cook <keescook@...omium.org>
> Cc: Josh Poimboeuf <jpoimboe@...hat.com>
> Cc: Waiman Long <longman@...hat.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 46 +++++++++++++++++++++++++++++++-----------
> 1 file changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d58b763df855..d9da6fcecd8c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2056,6 +2056,25 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (kvm_spec_ctrl_test_value(data))
> return 1;
>
> + /*
> + * For Intel eIBRS, IBRS (SPEC_CTRL_IBRS, i.e. BIT(0) of MSR 0x00000048)
> + * only needs to be set once and can be left on forever without
> + * needing to be constantly toggled. If the guest attempts to
> + * write that value, let's not disable interception. Guests
> + * with eIBRS awareness should only be writing SPEC_CTRL_IBRS
> + * once per vCPU per boot.
> + *
> + * The guest can still use other SPEC_CTRL features on top of
> + * eIBRS such as SSBD, and we should disable interception for
> + * those situations to avoid a multitude of VM-Exits;
> + * however, we will need to check SPEC_CTRL on each exit to
> + * make sure we restore the host value properly.
> + */
> + if (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) && data == BIT(0)) {
> + vmx->spec_ctrl = data;
> + break;
> + }
> +
> vmx->spec_ctrl = data;
> if (!data)
> break;
> @@ -6887,19 +6906,22 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
> vmx_vcpu_enter_exit(vcpu, vmx);
>
> /*
> - * We do not use IBRS in the kernel. If this vCPU has used the
> - * SPEC_CTRL MSR it may have left it on; save the value and
> - * turn it off. This is much more efficient than blindly adding
> - * it to the atomic save/restore list. Especially as the former
> - * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
> - *
> - * For non-nested case:
> - * If the L01 MSR bitmap does not intercept the MSR, then we need to
> - * save it.
> + * SDM 25.1.3 - handle conditional exit for MSR_IA32_SPEC_CTRL.
> + * To prevent constant VM exits for SPEC_CTRL, kernel may
> + * disable interception in the MSR bitmap for SPEC_CTRL MSR,
> + * such that the guest can read and write to that MSR without
> + * trapping to KVM; however, the guest may set a different
> + * value than the host. For exit handling, do rdmsr below if
> + * interception is disabled, such that we can save the guest
> + * value for restore on VM entry, as it does not get saved
> + * automatically per SDM 27.3.1.
> *
> - * For nested case:
> - * If the L02 MSR bitmap does not intercept the MSR, then we need to
> - * save it.
> + * This behavior is optimized on eIBRS enabled systems, such
> + * that the kernel only disables interception for MSR_IA32_SPEC_CTRL
> + * when guests choose to use additional SPEC_CTRL features
> + * above and beyond IBRS, such as STIBP or SSBD. This
> + * optimization allows the kernel to avoid doing the expensive
> + * rdmsr below.
> */
> if (unlikely(!msr_write_intercepted(vmx, MSR_IA32_SPEC_CTRL)))
> vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
> --
> 2.30.1 (Apple Git-130)
>
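For anyone skimming the quoted patch, the decision it makes can be modelled
outside the kernel roughly as below. This is a user-space sketch, not kernel
code: struct vcpu_model, model_init(), model_guest_write() and
model_exit_needs_rdmsr() are hypothetical names made up for illustration, and
the sketch assumes that the code following the quoted vmx_set_msr() hunk
disables interception for any non-zero write, as the commit message implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of IA32_SPEC_CTRL as listed in the commit message. */
#define SPEC_CTRL_IBRS  (1ULL << 0)
#define SPEC_CTRL_STIBP (1ULL << 1)
#define SPEC_CTRL_SSBD  (1ULL << 2)

/* Hypothetical stand-in for the per-vCPU state the patch touches. */
struct vcpu_model {
	uint64_t spec_ctrl;      /* last guest value seen by the host */
	bool write_intercepted;  /* MSR bitmap still traps guest writes */
};

static void model_init(struct vcpu_model *v)
{
	v->spec_ctrl = 0;
	v->write_intercepted = true; /* interception starts enabled */
}

/* Models an intercepted guest write to SPEC_CTRL reaching the host. */
static void model_guest_write(struct vcpu_model *v, bool host_has_eibrs,
			      uint64_t data)
{
	v->spec_ctrl = data;

	/* New in the patch: an IBRS-only write on eIBRS keeps trapping. */
	if (host_has_eibrs && data == SPEC_CTRL_IBRS)
		return;

	/* Pre-existing behaviour: a zero write keeps trapping as well. */
	if (!data)
		return;

	/* Assumption: any other value disables interception from now on. */
	v->write_intercepted = false;
}

/* Models the check after VM exit: rdmsr only if writes are not trapped. */
static bool model_exit_needs_rdmsr(const struct vcpu_model *v)
{
	return !v->write_intercepted;
}
```

With this model, an eIBRS host whose guest writes only SPEC_CTRL_IBRS never
reaches the rdmsr on exit, while a guest that also sets STIBP or SSBD falls
back to the existing behaviour.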