lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20200320180034.203489576@linutronix.de>
Date:   Fri, 20 Mar 2020 19:00:14 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     LKML <linux-kernel@...r.kernel.org>
Cc:     x86@...nel.org, Paul McKenney <paulmck@...nel.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        "Joel Fernandes (Google)" <joel@...lfernandes.org>,
        "Steven Rostedt (VMware)" <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Brian Gerst <brgerst@...il.com>,
        Juergen Gross <jgross@...e.com>,
        Alexandre Chartre <alexandre.chartre@...cle.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>
Subject: [patch V3 18/23] x86/kvm: Move context tracking where it belongs

The invocation of guest_enter_irqoff() is way before the actual VMENTER
happens. guest_exit_irqoff() happens way after the VMEXIT.

While the comment in guest_enter_irqoff() says that KVM does not hold
references to RCU protected data, for the whole call chain before VMENTER
and after VMEXIT there is no guarantee that no RCU protected data is
accessed.

There are tracepoints hidden in MSR access functions and there are calls
into code which takes locks. The latter might cause lockdep to run into RCU
trouble. As the call chains are hard to follow it's unclear whether there
might be RCU trouble lurking in some corner cases.

The VMENTER path after context tracking contains also exit conditions which
abort the VMENTER. The VMEXIT return path reenables interrupts before
switching RCU back on which means that the interrupt entry/exit has to
switch in on and then off again. If tracepoints on local_irq_enable() and
local_irqdisable() are enabled then two more RCU on/off transitions are
happening.

Move it down into the VMX/SVM code close to the actual entry/exit. This is
the first step to bring parts of this code into the .noinstr.text section
so it can be verified with objtool.

Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
Cc: Tom Lendacky <thomas.lendacky@....com>
Cc: Paolo Bonzini <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org
---
 arch/x86/kvm/svm.c     |   18 ++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c |   10 ++++++++++
 arch/x86/kvm/x86.c     |    2 --
 3 files changed, 28 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5766,6 +5766,15 @@ static void svm_vcpu_run(struct kvm_vcpu
 	 */
 	x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
 
+	/*
+	 * Tell context tracking that this CPU is about to enter guest
+	 * mode. This has to be after x86_spec_ctrl_set_guest() because
+	 * that can take locks (lockdep needs RCU) and calls into world and
+	 * some more.
+	 */
+	guest_enter_irqoff();
+
+	/* This is wrong vs. lockdep. Will be fixed in the next step */
 	local_irq_enable();
 
 	asm volatile (
@@ -5869,6 +5878,14 @@ static void svm_vcpu_run(struct kvm_vcpu
 	loadsegment(gs, svm->host.gs);
 #endif
 #endif
+	/*
+	 * Tell context tracking that this CPU is back.
+	 *
+	 * This needs to be done before the below as native_read_msr()
+	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
+	 * into world and some more.
+	 */
+	guest_exit_irqoff();
 
 	/*
 	 * We do not use IBRS in the kernel. If this vCPU has used the
@@ -5890,6 +5907,7 @@ static void svm_vcpu_run(struct kvm_vcpu
 
 	reload_tss(vcpu);
 
+	/* This is wrong vs. lockdep. Will be fixed in the next step */
 	local_irq_disable();
 
 	x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl);
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6537,6 +6537,11 @@ static void vmx_vcpu_run(struct kvm_vcpu
 	 */
 	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
 
+	/*
+	 * Tell context tracking that this CPU is about to enter guest mode.
+	 */
+	guest_enter_irqoff();
+
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
 		vmx_l1d_flush(vcpu);
@@ -6552,6 +6557,11 @@ static void vmx_vcpu_run(struct kvm_vcpu
 	vcpu->arch.cr2 = read_cr2();
 
 	/*
+	 * Tell context tracking that this CPU is back.
+	 */
+	guest_exit_irqoff();
+
+	/*
 	 * We do not use IBRS in the kernel. If this vCPU has used the
 	 * SPEC_CTRL MSR it may have left it on; save the value and
 	 * turn it off. This is much more efficient than blindly adding
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8350,7 +8350,6 @@ static int vcpu_enter_guest(struct kvm_v
 	}
 
 	trace_kvm_entry(vcpu->vcpu_id);
-	guest_enter_irqoff();
 
 	fpregs_assert_state_consistent();
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
@@ -8413,7 +8412,6 @@ static int vcpu_enter_guest(struct kvm_v
 	local_irq_disable();
 	kvm_after_interrupt(vcpu);
 
-	guest_exit_irqoff();
 	if (lapic_in_kernel(vcpu)) {
 		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
 		if (delta != S64_MIN) {

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ