Message-ID: <20230613115353.599087484@linutronix.de>
Date: Tue, 13 Jun 2023 14:17:54 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: LKML <linux-kernel@...r.kernel.org>
Cc: x86@...nel.org, Mario Limonciello <mario.limonciello@....com>,
Tom Lendacky <thomas.lendacky@....com>,
Tony Battersby <tonyb@...ernetics.com>,
Ashok Raj <ashok.raj@...ux.intel.com>,
Tony Luck <tony.luck@...el.com>,
Arjan van de Veen <arjan@...ux.intel.com>,
Eric Biederman <ebiederm@...ssion.com>
Subject: [patch V2 0/8] x86/smp: Cure stop_other_cpus() and kexec() troubles
This is the second version of the kexec() vs. mwait_play_dead()
series. Version 1 can be found here:
https://lore.kernel.org/r/20230603193439.502645149@linutronix.de
Aside from picking up the correction to the original patch 5, this also
integrates a fix for the intermittent reboot hangs reported by Tony:
https://lore.kernel.org/r/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
which touches the same area. While the fix is mostly independent, I added it
here because the two sets of changes conflict with each other.
So the two issues are:
1) stop_other_cpus() continues after observing num_online_cpus() == 1.
This is problematic because the to-be-stopped CPUs clear their online
bit first and only then invoke WBINVD, which can take a long time.
There seems to be an interaction between the WBINVD and the reboot
mechanics, as this intermittently results in hangs.
2) The kexec() kernel can overwrite the memory locations which "offline" CPUs
are monitoring. This write brings them out of MWAIT and they resume
execution on overwritten text, page tables, data and stacks resulting
in triple faults.
Cure them by:
#1 Synchronizing stop_other_cpus() with an atomic variable which is
decremented in stop_this_cpu() _after_ WBINVD completes.
#2 Bringing offline CPUs out of MWAIT and moving them into HLT before
starting the kexec() kernel. Optionally sending them an INIT IPI so they
go back into the wait-for-startup state.
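The ordering behind cure #1 can be modelled in user space. The following is
only a sketch with pthreads and C11 atomics, not the code from the patches:
the names stop_other_cpus_model(), stop_this_cpu_model(), cpus_stop_pending,
online[] and flushes_done are made up for illustration, and a busy loop stands
in for WBINVD. It shows the stopping side waiting on a counter that each CPU
decrements only after its flush has completed, rather than on the online bits,
which are cleared earlier.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 8

/* All names below are hypothetical; this models the idea, not the kernel code. */
static atomic_int  cpus_stop_pending;  /* CPUs which have not finished stopping */
static atomic_bool online[MAX_CPUS];   /* stand-in for the online mask */
static atomic_int  flushes_done;       /* stand-in for completed WBINVDs */

static void *stop_this_cpu_model(void *arg)
{
	int cpu = (int)(intptr_t)arg;

	/* Step 1: the online bit is cleared first, as described in issue 1). */
	atomic_store(&online[cpu], false);

	/* Step 2: slow cache flush; a busy loop stands in for WBINVD. */
	for (volatile int i = 0; i < 100000; i++)
		;
	atomic_fetch_add(&flushes_done, 1);

	/* Step 3: only now tell the stopping CPU that this CPU is done. */
	atomic_fetch_sub(&cpus_stop_pending, 1);
	return NULL;
}

/*
 * "CPU 0" stops CPUs 1..ncpus-1. Returns the number of flushes which had
 * completed when the wait loop ended, or -1 for an invalid ncpus.
 */
int stop_other_cpus_model(int ncpus)
{
	pthread_t thr[MAX_CPUS];
	bool started[MAX_CPUS] = { false };
	int seen;

	if (ncpus < 1 || ncpus > MAX_CPUS)
		return -1;

	atomic_store(&flushes_done, 0);
	atomic_store(&cpus_stop_pending, ncpus - 1);

	for (int cpu = 1; cpu < ncpus; cpu++) {
		void *arg = (void *)(intptr_t)cpu;

		atomic_store(&online[cpu], true);
		if (pthread_create(&thr[cpu], NULL, stop_this_cpu_model, arg) == 0)
			started[cpu] = true;
		else
			stop_this_cpu_model(arg);	/* fall back to running inline */
	}

	/* Wait on the counter, not on the online bits, which drop too early. */
	while (atomic_load(&cpus_stop_pending) > 0)
		sched_yield();

	seen = atomic_load(&flushes_done);

	for (int cpu = 1; cpu < ncpus; cpu++)
		if (started[cpu])
			pthread_join(thr[cpu], NULL);

	return seen;
}
```

Because each modelled CPU increments flushes_done before it decrements
cpus_stop_pending, stop_other_cpus_model(4) sees all three flushes completed
once the counter reaches zero. A wait loop keyed on online[] could exit while
the flushes are still running, which is the window described in issue 1). The
model has no timeout, so it waits forever if a CPU never reports back.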
The series is also available from git:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/kexec
Thanks,
tglx
---
include/asm/cpu.h | 2
include/asm/smp.h | 4 +
kernel/process.c | 16 +++++
kernel/smp.c | 79 ++++++++++++++++++----------
kernel/smpboot.c | 149 ++++++++++++++++++++++++++++++++++++++++--------------
5 files changed, 183 insertions(+), 67 deletions(-)