Message-ID: <lgsgu5rxioy3zt67i6envq45mdvg2y3kxh26aw6qsqeqogbyho@al24pkui5lt2>
Date: Fri, 30 Jan 2026 18:21:37 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Jim Mattson <jmattson@...gle.com>,
David Kaplan <david.kaplan@....com>
Subject: Re: [PATCH 1/2] KVM: x86: Defer IBPBs for vCPU and nested
transitions until core run loop
On Tue, Jan 27, 2026 at 05:34:31PM -0800, Sean Christopherson wrote:
> When emitting an Indirect Branch Prediction Barrier to isolate different
> guest security domains (different vCPUs or L1 vs. L2 in the same vCPU),
> defer the IBPB until VM-Enter is imminent to avoid redundant and/or
> unnecessary IBPBs. E.g. if a vCPU is loaded on a CPU without ever doing
> VM-Enter, then _KVM_ isn't responsible for doing an IBPB as KVM's job is
> purely to mitigate guests<=>guest attacks; guest=>host attacks are covered
> by IBRS.
>
> Cc: stable@...r.kernel.org
> Cc: Yosry Ahmed <yosry.ahmed@...ux.dev>
> Cc: Jim Mattson <jmattson@...gle.com>
> Cc: David Kaplan <david.kaplan@....com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/x86.c              | 7 ++++++-
>  arch/x86/kvm/x86.h              | 2 +-
>  3 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e441f270f354..76bbc80a2d1d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -826,6 +826,7 @@ struct kvm_vcpu_arch {
>  	u64 smbase;
>  	u64 smi_count;
>  	bool at_instruction_boundary;
> +	bool need_ibpb;

We have IBPB_ON_VMEXIT, so this is a bit confusing: the reader could
assume this is an optimization for IBPB_ON_VMEXIT. Maybe
need_ibpb_on_vmenter or need_ibpb_on_run?
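
E.g. something like this would make the intent clearer at the
declaration site (comment and name purely illustrative):

	bool at_instruction_boundary;
	/* IBPB needed before the next VM-Enter to isolate guest
	 * security domains, see vcpu_enter_guest(). */
	bool need_ibpb_on_run;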

>  	bool tpr_access_reporting;
>  	bool xfd_no_write_intercept;
>  	u64 microcode_version;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8acfdfc583a1..e5ae655702b4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5187,7 +5187,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	 * is handled on the nested VM-Exit path.
>  	 */
>  	if (static_branch_likely(&switch_vcpu_ibpb))
> -		indirect_branch_prediction_barrier();
> +		vcpu->arch.need_ibpb = true;

This means that if we run vCPU A on a pCPU, then load vCPU B without
running it, we won't do an IBPB, at least not until vCPU B (or any
vCPU) actually runs on the pCPU.

My question would be: is it possible for training done by vCPU A to
lead KVM into leaking some of vCPU B's state after loading it, even
without actually running vCPU B?

Basically this scenario (all on the same pCPU):

1. Malicious vCPU A runs and injects branch predictor entries.
2. KVM loads vCPU B to perform some action without actually running
   vCPU B. The training from (1) leaks some of vCPU B's state into
   microarchitectural state.
3. Malicious vCPU A runs again and extracts the leaked data.
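
If that is a real concern, one option (totally untested sketch, helper
name made up) would be to factor out the deferred flush so it can also
be called before KVM consumes vCPU state outside of the run loop:

	/* Drain a pending deferred IBPB before acting on vCPU state. */
	static void kvm_do_deferred_ibpb(struct kvm_vcpu *vcpu)
	{
		if (unlikely(vcpu->arch.need_ibpb)) {
			indirect_branch_prediction_barrier();
			vcpu->arch.need_ibpb = false;
		}
	}

vcpu_enter_guest() would call it as this patch already does, and any
path that touches guest state right after kvm_arch_vcpu_load() could
do the same.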

>  	per_cpu(last_vcpu, cpu) = vcpu;
>  }
>
> @@ -11315,6 +11315,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
>  	}
> 
> +	if (unlikely(vcpu->arch.need_ibpb)) {
> +		indirect_branch_prediction_barrier();
> +		vcpu->arch.need_ibpb = false;
> +	}
> +
>  	fpregs_assert_state_consistent();
>  	if (test_thread_flag(TIF_NEED_FPU_LOAD))
>  		switch_fpu_return();
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 70e81f008030..6708142d051d 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -169,7 +169,7 @@ static inline void kvm_nested_vmexit_handle_ibrs(struct kvm_vcpu *vcpu)
>
>  	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
>  	    guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBRS))
> -		indirect_branch_prediction_barrier();
> +		vcpu->arch.need_ibpb = true;

I think the same question more or less applies here: could an L2 guest
lead KVM to leak some of L1's state before L1 is actually run?
Although in this case it could be harder for any leaked state to
survive KVM running L1 and then going back to L2.
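
To make the window concrete, the flow would be something like this
(illustrative, not the actual call sites):

	kvm_nested_vmexit_handle_ibrs(vcpu);	/* need_ibpb = true */
	/* ... KVM emulates the nested VM-Exit, reading/writing L1
	 * state with L2's branch predictor training still live ... */
	vcpu_enter_guest(vcpu);			/* IBPB finally done */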

>  }
> 
>  /*
> --
> 2.52.0.457.g6b5491de43-goog
>