lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20250816080854.928605-1-zouyipeng@huawei.com>
Date: Sat, 16 Aug 2025 16:08:54 +0800
From: Yipeng Zou <zouyipeng@...wei.com>
To: <tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
	<dave.hansen@...ux.intel.com>, <x86@...nel.org>, <hpa@...or.com>,
	<rafael.j.wysocki@...el.com>, <zouyipeng@...wei.com>,
	<linux-kernel@...r.kernel.org>
Subject: [PATCH 1/1] x86/smp: ignore stale reboot IPI in crash kernel

Recently, A issue has been reported that CPU hang in x86 VM.

The CPU halted during Kdump likely due to IPI issues when one CPU was
rebooting and another was in Kdump:

CPU0			  CPU2
========================  ======================
reboot			  Panic
machine shutdown	  Kdump
...			  machine_crash_shutdown
local_irq_disable	  local_irq_disable
stop other cpus		  ...
...			  nmi_shootdown_cpus
send_IPIs(REBOOT)  ---->  [IPI pending]
Halt,NMI context   <----  send_IPIs(NMI)
			  ...
			  second kernel start
			  ...
			  local irq enable
			  Halt, IPI context

In simple terms, when the Kdump jump to the second kernel, the IPI that
was pending in the first kernel remains and is responded to by the
second kernel.

For a detailed, please refer to:

[1] https://lore.kernel.org/all/20250604083319.144500-1-zouyipeng@huawei.com/

This scenario can be detected when stopping_cpu contains the bootup value.

Signed-off-by: Yipeng Zou <zouyipeng@...wei.com>
---
 arch/x86/kernel/smp.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index b014e6d229f9..66144f686436 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -136,6 +136,28 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
 DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 {
 	apic_eoi();
+
+	/*
+	 * Handle the case where a reboot IPI is stale in the IRR. This
+	 * happens when:
+	 *
+	 *   a CPU crashes with interrupts disabled before handling the
+	 *   reboot IPI and jumps into a crash kernel. The reboot IPI
+	 *   vector is kept set in the APIC IRR across the APIC soft
+	 *   disabled phase and as there is no way to clear a pending IRR
+	 *   bit, it is delivered to the crash kernel immediately when
+	 *   interrupts are enabled.
+	 *
+	 * As the reboot IPI can only be sent after acquiring @stopping_cpu
+	 * by storing the CPU number, this case can be detected when
+	 * @stopping_cpu contains the bootup value -1. Just return and
+	 * ignore it.
+	 */
+	if (atomic_read(&stopping_cpu) == -1) {
+		pr_info("Ignoring stale reboot IPI\n");
+		return;
+	}
+
 	cpu_emergency_disable_virtualization();
 	stop_this_cpu(NULL);
 }
-- 
2.34.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ