linux-kernel - Re: [PATCH v3 2/2] VMX: reset the segment cache after segment initialization in vmx_vcpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZrF55uIvX2rcHtSW@chao-email>
Date: Tue, 6 Aug 2024 09:18:30 +0800
From: Chao Gao <chao.gao@...el.com>
To: Maxim Levitsky <mlevitsk@...hat.com>
CC: <kvm@...r.kernel.org>, Sean Christopherson <seanjc@...gle.com>, "Dave
 Hansen" <dave.hansen@...ux.intel.com>, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, "H. Peter
 Anvin" <hpa@...or.com>, <linux-kernel@...r.kernel.org>, Paolo Bonzini
	<pbonzini@...hat.com>, <x86@...nel.org>
Subject: Re: [PATCH v3 2/2] VMX: reset the segment cache after segment
 initialization in vmx_vcpu_reset

On Thu, Jul 25, 2024 at 01:52:32PM -0400, Maxim Levitsky wrote:

KVM: VMX:

>reset the segment cache after segment initialization in vmx_vcpu_reset
>to avoid stale uninitialized data being cached in the segment cache.
>
>In particular the following scenario is possible when full preemption
>is enabled:
>
>- vCPU is just created, and the vCPU thread is preempted before SS.AR_BYTES
>is written in vmx_vcpu_reset.
>
>- During preemption, the kvm_arch_vcpu_in_kernel is called which
>reads SS's segment AR byte to determine if the CPU was in the kernel.
>
>That caches 0 value of SS.AR_BYTES, then eventually the vCPU thread will be
>preempted back, then set the correct SS.AR_BYTES value in the vmcs
>and the cached value will remain stale, and could be read e.g via
>KVM_GET_SREGS.
>
>Usually this is not a problem because VMX segment cache is reset on each
>vCPU run, but if the userspace (e.g KVM selftests do) reads the segment
>registers just after the vCPU was created, and modifies some of them
>but passes through other registers and in this case SS.AR_BYTES,
>the stale value of it will make it into the vmcs,
>and later lead to a VM entry failure due to incorrect SS segment type.

I looked into the same issue last week, which was reported by someone
internally.

>
>Fix this by moving the vmx_segment_cache_clear() call to be after the
>segments are initialized.
>
>Note that this still doesn't fix the issue of kvm_arch_vcpu_in_kernel
>getting stale data during the segment setup, and that issue will
>be addressed later.
>
>Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>

Do you need a Fixes tag and/or Cc: stable?

>---
> arch/x86/kvm/vmx/vmx.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index fa9f307d9b18..d43bb755e15c 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -4870,9 +4870,6 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> 	vmx->hv_deadline_tsc = -1;
> 	kvm_set_cr8(vcpu, 0);
> 
>-	vmx_segment_cache_clear(vmx);
>-	kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS);
>-
> 	seg_setup(VCPU_SREG_CS);
> 	vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
> 	vmcs_writel(GUEST_CS_BASE, 0xffff0000ul);
>@@ -4899,6 +4896,9 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> 	vmcs_writel(GUEST_IDTR_BASE, 0);
> 	vmcs_write32(GUEST_IDTR_LIMIT, 0xffff);
> 
>+	vmx_segment_cache_clear(vmx);
>+	kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS);

vmx_segment_cache_clear() is called in a few other sites. I think at least the
call in __vmx_set_segment() should be fixed, because QEMU may read SS.AR right
after a write to it. if the write was preempted after the cache was cleared but
before the new value being written into VMCS, QEMU would find that SS.AR held a
stale value.

>+
> 	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
> 	vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0);
> 	vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0);
>-- 
>2.26.3
>
>