linux-kernel - Re: [PATCH] KVM: nSVM: Use vcpu->arch.cr2 when updating vmcb12 on nested #VMEXIT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <cw6rwvvu57bt7i4pi3exmw6tdmbevegvlitqlmaycughua5sgn@4qdxkte6yxcz>
Date: Tue, 3 Feb 2026 15:33:00 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: nSVM: Use vcpu->arch.cr2 when updating vmcb12 on
 nested #VMEXIT

On Tue, Feb 03, 2026 at 01:13:20AM +0000, Yosry Ahmed wrote:
> KVM currently uses the value of CR2 from vmcb02 to update vmcb12 on
> nested #VMEXIT. Use the value from vcpu->arch.cr2 instead.
> 
> The value in vcpu->arch.cr2 is sync'd to vmcb02 shortly before a VMRUN
> of L2, and sync'd back to vcpu->arch.cr2 shortly after. The value are
> only out-of-sync in two cases: after migration, and after a #PF is
> injected into L2.
> 
> After migration, the value of CR2 in vmcb02 is uninitialized (i.e.
> zero), as KVM_SET_SREGS restores CR2 value to vcpu->arch.cr2. Using
> vcpu->arch.cr2 to update vmcb12 is the right thing to do.
> 
> The #PF injection case is more nuanced. It occurs if KVM injects a #PF
> into L2, then exits to L1 before it actually runs L2. Although the APM
> is a bit unclear about when CR2 is written during a #PF, the SDM is more
> clear:
> 
> 	Processors update CR2 whenever a page fault is detected. If a
> 	second page fault occurs while an earlier page fault is being
> 	delivered, the faulting linear address of the second fault will
> 	overwrite the contents of CR2 (replacing the previous address).
> 	These updates to CR2 occur even if the page fault results in a
> 	double fault or occurs during the delivery of a double fault.
> 
> KVM injecting the exception surely counts as the #PF being "detected".
> More importantly, when an exception is injected into L2 at the time of a
> synthesized #VMEXIT, KVM updates exit_int_info in vmcb12 accordingly,
> such that an L1 hypervisor can re-inject the exception. If CR2 is not
> written at that point, the L1 hypervisor have no way of correctly
> re-injecting the #PF. Hence, using vcpu->arch.cr2 is also the right
> thing to write in vmcb12 in this case.
> 
> Note that KVM does _not_ update vcpu->arch.cr2 when a #PF is pending for
> L2, only when it is injected. The distinction is important, because only
> injected exceptions are propagated to L1 through exit_int_info. It would
> be incorrect to update CR2 in vmcb12 for a pending #PF, as L1 would
> perceive an updated CR2 value with no #PF. Update the comment in
> kvm_deliver_exception_payload() to clarify this.

I forgot the best part:

If a synthesized #VMEXIT to L1 writes the wrong CR2 (e.g. right after
migration), and L2 is handling a #PF, it could read a corrupted CR2.
This could manifest as segmentation faults in L2, or potentially data
corruption.

Cc: stable@...r.kernel.org

> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@...ux.dev>
> ---
>  arch/x86/kvm/svm/nested.c | 2 +-
>  arch/x86/kvm/x86.c        | 7 +++++++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index de90b104a0dd5..9031746ce2db1 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1156,7 +1156,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>  	vmcb12->save.efer   = svm->vcpu.arch.efer;
>  	vmcb12->save.cr0    = kvm_read_cr0(vcpu);
>  	vmcb12->save.cr3    = kvm_read_cr3(vcpu);
> -	vmcb12->save.cr2    = vmcb02->save.cr2;
> +	vmcb12->save.cr2    = vcpu->arch.cr2;
>  	vmcb12->save.cr4    = svm->vcpu.arch.cr4;
>  	vmcb12->save.rflags = kvm_get_rflags(vcpu);
>  	vmcb12->save.rip    = kvm_rip_read(vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index db3f393192d94..1015522d0fbd7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -864,6 +864,13 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu, unsigned int nr,
>  		vcpu->arch.exception.error_code = error_code;
>  		vcpu->arch.exception.has_payload = has_payload;
>  		vcpu->arch.exception.payload = payload;
> +		/*
> +		 * Only injected exceptions are propagated to L1 in
> +		 * vmcb12/vmcs12 on nested #VMEXIT. Hence, do not deliver the
> +		 * exception payload for L2 until the exception is injected.
> +		 * Otherwise, L1 would perceive the updated payload without a
> +		 * corresponding exception.
> +		 */
>  		if (!is_guest_mode(vcpu))
>  			kvm_deliver_exception_payload(vcpu,
>  						      &vcpu->arch.exception);
> -- 
> 2.53.0.rc2.204.g2597b5adb4-goog
>