Message-ID: <20220716230952.787452088@linutronix.de>
Date: Sun, 17 Jul 2022 01:17:12 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: LKML <linux-kernel@...r.kernel.org>
Cc: x86@...nel.org, Linus Torvalds <torvalds@...ux-foundation.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Andrew Cooper <Andrew.Cooper3@...rix.com>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Johannes Wikner <kwikner@...z.ch>,
Alyssa Milburn <alyssa.milburn@...ux.intel.com>,
Jann Horn <jannh@...gle.com>, "H.J. Lu" <hjl.tools@...il.com>,
Joao Moreira <joao.moreira@...el.com>,
Joseph Nuzman <joseph.nuzman@...el.com>,
Steven Rostedt <rostedt@...dmis.org>
Subject: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()
load_percpu_segment() uses wrmsrl(), which is paravirtualized. That's an
issue because the code sequence is:
__loadsegment_simple(gs, 0);
wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
So anything which uses a per-CPU variable between setting GS to 0 and
writing GSBASE is going to end up in a NULL pointer dereference. That
can be triggered with instrumentation and is guaranteed to be triggered
with callthunks for call depth tracking.
Use native_wrmsrl() instead. XEN_PV will trap and emulate, but that's not a
hot path.
Also make it static and mark it noinstr so that neither kprobes,
sanitizers, nor anything else can touch it.
Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
---
arch/x86/include/asm/processor.h | 1 -
arch/x86/kernel/cpu/common.c | 12 ++++++++++--
2 files changed, 10 insertions(+), 3 deletions(-)
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -673,7 +673,6 @@ extern struct desc_ptr early_gdt_descr;
 extern void switch_to_new_gdt(int);
 extern void load_direct_gdt(int);
 extern void load_fixmap_gdt(int);
-extern void load_percpu_segment(int);
 extern void cpu_init(void);
 extern void cpu_init_secondary(void);
 extern void cpu_init_exception_handling(void);

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -701,13 +701,21 @@ static const char *table_lookup_model(st
 __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 __u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 
-void load_percpu_segment(int cpu)
+static noinstr void load_percpu_segment(int cpu)
 {
 #ifdef CONFIG_X86_32
 	loadsegment(fs, __KERNEL_PERCPU);
 #else
 	__loadsegment_simple(gs, 0);
-	wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
+	/*
+	 * Because of the __loadsegment_simple(gs, 0) above, any GS-prefixed
+	 * instruction will explode right about here. As such, we must not have
+	 * any CALL-thunks using per-cpu data.
+	 *
+	 * Therefore, use native_wrmsrl() and have XenPV take the fault and
+	 * emulate.
+	 */
+	native_wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
 #endif
 }