lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240123001555.4168188-1-michal.wilczynski@intel.com>
Date: Tue, 23 Jan 2024 02:15:55 +0200
From: Michal Wilczynski <michal.wilczynski@...el.com>
To: seanjc@...gle.com,
	pbonzini@...hat.com,
	mlevitsk@...hat.com
Cc: tglx@...utronix.de,
	mingo@...hat.com,
	bp@...en8.de,
	dave.hansen@...ux.intel.com,
	x86@...nel.org,
	hpa@...or.com,
	kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	dedekind1@...il.com,
	yuan.yao@...el.com,
	Michal Wilczynski <michal.wilczynski@...el.com>,
	Zheyu Ma <zheyuma97@...il.com>
Subject: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction

Syzkaller found a warning triggered in nested_vmx_vmexit().
vmx->nested.nested_run_pending is non-zero, even though we're in
nested_vmx_vmexit(). Generally, trying  to cancel a pending entry is
considered a bug. However in this particular scenario, the kernel
behavior seems correct.

Syzkaller scenario:
1) Set up VCPU's
2) Run some code with KVM_RUN in L2 as a nested guest
3) Return from KVM_RUN
4) Inject KVM_SMI into the VCPU
5) Change the EFER register with KVM_SET_SREGS to value 0x2501
6) Run some code on the VCPU using KVM_RUN
7) Observe following behavior:

kvm_smm_transition: vcpu 0: entering SMM, smbase 0x30000
kvm_entry: vcpu 0, rip 0x8000
kvm_entry: vcpu 0, rip 0x8000
kvm_entry: vcpu 0, rip 0x8002
kvm_smm_transition: vcpu 0: leaving SMM, smbase 0x30000
kvm_nested_vmenter: rip: 0x0000000000008002 vmcs: 0x0000000000007000
                    nested_rip: 0x0000000000000000 int_ctl: 0x00000000
		    event_inj: 0x00000000 nested_ept=n guest
		    cr3: 0x0000000000002000
kvm_nested_vmexit_inject: reason: TRIPLE_FAULT ext_inf1: 0x0000000000000000
                          ext_inf2: 0x0000000000000000 ext_int: 0x00000000
			  ext_int_err: 0x00000000

What happened here is an SMI was injected immediately and the handler was
called at address 0x8000; all is good. Later, an RSM instruction is
executed in an emulator to return to the nested VM. em_rsm() is called,
which leads to emulator_leave_smm(). A part of this function calls VMX/SVM
callback, in this case vmx_leave_smm(). It attempts to set up a pending
reentry to guest VM by calling nested_vmx_enter_non_root_mode() and sets
vmx->nested.nested_run_pending to one. Unfortunately, later in
emulator_leave_smm(), rsm_load_state_64() fails to write invalid EFER to
the MSR. This results in em_rsm() calling triple_fault callback. At this
point it's clear that the KVM should call the vmexit, but
vmx->nested.nested_run_pending is left set to 1.

Similar flow goes for SVM, as the bug also reproduces on AMD platforms.

To address this issue, reset the nested_run_pending flag in the
triple_fault handler. However, it's crucial to note that
nested_pending_run cannot be cleared in all cases. It should only be
cleared for the specific instruction requiring hardware VM-Enter to
complete the emulation, such as RSM. Previously, there were instances
where KVM prematurely synthesized a triple fault on nested VM-Enter. In
these cases, it is not appropriate to zero the nested_pending_run.

To resolve this, introduce a new emulator flag indicating the need for
HW VM-Enter to complete emulating RSM. Based on this flag, a decision can
be made in vendor-specific triple fault handlers about whether
nested_pending_run needs to be cleared.

Fixes: 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM")
Reported-by: Zheyu Ma <zheyuma97@...il.com>
Closes: https://lore.kernel.org/all/CAMhUBjmXMYsEoVYw_M8hSZjBMHh24i88QYm-RY6HDta5YZ7Wgw@mail.gmail.com
Signed-off-by: Michal Wilczynski <michal.wilczynski@...el.com>
---
v2:
 - added new emulator flags indicating whether an instruction needs a
   VM-Enter to complete emulation (Sean)
 - fix in SVM nested triple_fault handler (Sean)
 - only clear nested_run_pending on RSM instruction (Sean)

 arch/x86/kvm/emulate.c     |  4 +++-
 arch/x86/kvm/kvm_emulate.h |  2 ++
 arch/x86/kvm/svm/nested.c  |  9 +++++++++
 arch/x86/kvm/vmx/nested.c  | 12 ++++++++++++
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e223043ef5b2..889460432eac 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -178,6 +178,7 @@
 #define IncSP       ((u64)1 << 54)  /* SP is incremented before ModRM calc */
 #define TwoMemOp    ((u64)1 << 55)  /* Instruction has two memory operand */
 #define IsBranch    ((u64)1 << 56)  /* Instruction is considered a branch. */
+#define NeedVMEnter ((u64)1 << 57)  /* Instruction needs HW VM-Enter to complete */
 
 #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -4462,7 +4463,7 @@ static const struct opcode twobyte_table[256] = {
 	F(DstMem | SrcReg | Src2CL | ModRM, em_shld), N, N,
 	/* 0xA8 - 0xAF */
 	I(Stack | Src2GS, em_push_sreg), I(Stack | Src2GS, em_pop_sreg),
-	II(EmulateOnUD | ImplicitOps, em_rsm, rsm),
+	II(EmulateOnUD | ImplicitOps | NeedVMEnter, em_rsm, rsm),
 	F(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_bts),
 	F(DstMem | SrcReg | Src2ImmByte | ModRM, em_shrd),
 	F(DstMem | SrcReg | Src2CL | ModRM, em_shrd),
@@ -4966,6 +4967,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
 	}
 
 	ctxt->is_branch = opcode.flags & IsBranch;
+	ctxt->need_vm_enter = opcode.flags & NeedVMEnter;
 
 	/* Unrecognised? */
 	if (ctxt->d == 0)
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index e6d149825169..1e1366afa51d 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -372,6 +372,8 @@ struct x86_emulate_ctxt {
 	struct read_cache io_read;
 	struct read_cache mem_read;
 	bool is_branch;
+	/* instruction need a HW VM-Enter to complete correctly */
+	bool need_vm_enter;
 };
 
 #define KVM_EMULATOR_BUG_ON(cond, ctxt)		\
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dee62362a360..8c19ad5e18d4 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1165,11 +1165,20 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
 static void nested_svm_triple_fault(struct kvm_vcpu *vcpu)
 {
+	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	if (!vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_SHUTDOWN))
 		return;
 
+	/*
+	 * In case of a triple fault, cancel the nested reentry. This may occur
+	 * when the RSM instruction fails while attempting to restore the state
+	 * from SMRAM.
+	 */
+	if (ctxt->need_vm_enter)
+		svm->nested.nested_run_pending = 0;
+
 	kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 	nested_svm_simple_vmexit(to_svm(vcpu), SVM_EXIT_SHUTDOWN);
 }
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6329a306856b..9228699b4c1e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4950,7 +4950,19 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 
 static void nested_vmx_triple_fault(struct kvm_vcpu *vcpu)
 {
+	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
 	kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+
+	/*
+	 * In case of a triple fault, cancel the nested reentry. This may occur
+	 * when the RSM instruction fails while attempting to restore the state
+	 * from SMRAM.
+	 */
+	if (ctxt->need_vm_enter)
+		vmx->nested.nested_run_pending = 0;
+
 	nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0);
 }
 
-- 
2.41.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ