linux-kernel - Re: [PATCH 3/3] KVM: x86: Generalize IBRS virtualization on emulated VM-exit

Message-ID: <CALMp9eSPVDYC7v4Rm13ZUcE4wWPb8dUfm=qBx_jETAQEQrt4_w@mail.gmail.com>
Date: Fri, 21 Feb 2025 09:59:04 -0800
From: Jim Mattson <jmattson@...gle.com>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: x86@...nel.org, Sean Christopherson <seanjc@...gle.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, 
	Paolo Bonzini <pbonzini@...hat.com>, "Kaplan, David" <David.Kaplan@....com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] KVM: x86: Generalize IBRS virtualization on emulated VM-exit

On Fri, Feb 21, 2025 at 8:34 AM Yosry Ahmed <yosry.ahmed@...ux.dev> wrote:
>
> Commit 2e7eab81425a ("KVM: VMX: Execute IBPB on emulated VM-exit when
> guest has IBRS") added an IBPB in the emulated VM-exit path on Intel to
> properly virtualize IBRS by providing separate predictor modes for L1
> and L2.
>
> AMD requires similar handling, except when IbrsSameMode is enumerated by
> the host CPU (which is the case on most/all AMD CPUs). With
> IbrsSameMode, hardware IBRS is sufficient and no extra handling is
> needed from KVM.
>
> Generalize the handling in nested_vmx_vmexit() by moving it into a
> generic function, add the AMD handling, and use it in
> nested_svm_vmexit() too. The main reason for using a generic function is
> to have a single place to park the huge comment about virtualizing IBRS.
>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@...ux.dev>
> ---
>  arch/x86/kvm/svm/nested.c |  2 ++
>  arch/x86/kvm/vmx/nested.c | 11 +----------
>  arch/x86/kvm/x86.h        | 18 ++++++++++++++++++
>  3 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index d77b094d9a4d6..61b73ff30807e 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1041,6 +1041,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>
>         nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr);
>
> +       kvm_nested_vmexit_handle_spec_ctrl(vcpu);
> +
>         svm_switch_vmcb(svm, &svm->vmcb01);
>
>         /*
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 8a7af02d466e9..453d52a6e836a 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -5018,16 +5018,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>
>         vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>
> -       /*
> -        * If IBRS is advertised to the vCPU, KVM must flush the indirect
> -        * branch predictors when transitioning from L2 to L1, as L1 expects
> -        * hardware (KVM in this case) to provide separate predictor modes.
> -        * Bare metal isolates VMX root (host) from VMX non-root (guest), but
> -        * doesn't isolate different VMCSs, i.e. in this case, doesn't provide
> -        * separate modes for L2 vs L1.
> -        */
> -       if (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL))
> -               indirect_branch_prediction_barrier();
> +       kvm_nested_vmexit_handle_spec_ctrl(vcpu);
>
>         /* Update any VMCS fields that might have changed while L2 ran */
>         vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 7a87c5fc57f1b..008c8d381c253 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -116,6 +116,24 @@ static inline void kvm_leave_nested(struct kvm_vcpu *vcpu)
>         kvm_x86_ops.nested_ops->leave_nested(vcpu);
>  }
>
> +/*
> + * If IBRS is advertised to the vCPU, KVM must flush the indirect branch
> + * predictors when transitioning from L2 to L1, as L1 expects hardware (KVM in
> + * this case) to provide separate predictor modes.  Bare metal isolates the host
> + * from the guest, but doesn't isolate different guests from one another (in
> + * this case L1 and L2). The exception is if bare metal supports same mode IBRS,
> + * which offers protection within the same mode, and hence protects L1 from L2.
> + */
> +static inline void kvm_nested_vmexit_handle_spec_ctrl(struct kvm_vcpu *vcpu)

Maybe just kvm_nested_vmexit_handle_ibrs?

> +{
> +       if (cpu_feature_enabled(X86_FEATURE_AMD_IBRS_SAME_MODE))
> +               return;
> +
> +       if (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
> +           guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBRS))

This is a bit conservative, but I don't think there's any ROI in being
more pedantic.

For the series,

Reviewed-by: Jim Mattson <jmattson@...gle.com>

> +               indirect_branch_prediction_barrier();
> +}
> +
>  static inline bool kvm_vcpu_has_run(struct kvm_vcpu *vcpu)
>  {
>         return vcpu->arch.last_vmentry_cpu != -1;
> --
> 2.48.1.601.g30ceb7b040-goog
>
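
For readers following the thread, the decision made by the helper in the quoted patch can be sketched as a standalone userspace model. This is only an illustration, not the kernel code: struct model_host, struct model_vcpu and model_needs_ibpb() are hypothetical names, and the boolean fields stand in for cpu_feature_enabled(X86_FEATURE_AMD_IBRS_SAME_MODE) and the two guest_cpu_cap_has() checks.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the host: true when the host CPU enumerates
 * IbrsSameMode (stands in for X86_FEATURE_AMD_IBRS_SAME_MODE).
 */
struct model_host {
	bool ibrs_same_mode;
};

/*
 * Hypothetical model of the vCPU: which IBRS-related features are
 * advertised to the guest (stand-ins for X86_FEATURE_SPEC_CTRL and
 * X86_FEATURE_AMD_IBRS).
 */
struct model_vcpu {
	bool spec_ctrl;
	bool amd_ibrs;
};

/*
 * Returns true when the emulated L2->L1 VM-exit would need an IBPB,
 * mirroring the order of checks in the quoted helper: same-mode IBRS
 * on the host short-circuits everything, otherwise either guest
 * feature bit is enough to require the barrier.
 */
static bool model_needs_ibpb(const struct model_host *host,
			     const struct model_vcpu *vcpu)
{
	if (host->ibrs_same_mode)
		return false;

	return vcpu->spec_ctrl || vcpu->amd_ibrs;
}
```

In this model, a host with ibrs_same_mode set never needs the barrier regardless of what the guest sees, and a host without it needs the barrier as soon as either guest bit is set.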