Message-ID: <20250604083319.144500-1-zouyipeng@huawei.com>
Date: Wed, 4 Jun 2025 08:33:19 +0000
From: Yipeng Zou <zouyipeng@...wei.com>
To: <tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
	<dave.hansen@...ux.intel.com>, <x86@...nel.org>, <hpa@...or.com>,
	<peterz@...radead.org>, <sohil.mehta@...el.com>, <rui.zhang@...el.com>,
	<arnd@...db.de>, <yuntao.wang@...ux.dev>, <linux-kernel@...r.kernel.org>
CC: <zouyipeng@...wei.com>
Subject: [BUG REPORT] x86/apic: CPU Hang in x86 VM During Kdump

Recently, an issue was reported in which a CPU hangs in an x86 VM.

The CPU hung during Kdump, likely because of an IPI race between one CPU
that was rebooting and another that was entering Kdump:

CPU0			  CPU2
========================  ======================
reboot			  Panic
machine shutdown	  Kdump
			  machine shutdown
stop other cpus
			  stop other cpus
...			  ...
local_irq_disable	  local_irq_disable
send_IPIs(REBOOT)	  [critical regions]
[critical regions]	  1) send_IPIs(REBOOT)
			  wait timeout
			  2) send_IPIs(NMI);
Halt,NMI context
			  3) lapic_shutdown [IPI is pending]
			  ...
			  second kernel start
			  4) init_bsp_APIC [IPI is pending]
			  ...
			  local irq enable
			  Halt, IPI context

In simple terms, when Kdump jumps to the second kernel, the IPI that
was pending in the first kernel is still pending and is serviced by the
second kernel.

I was thinking that we may need to mask IPIs in clear_local_APIC() to
solve this problem. That way, the pending IPI would be cleared in both
3) and 4).
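
To make the sequence concrete, here is a toy user-space model of it. This
is only a sketch, not kernel code: struct toy_cpu, the toy_* helpers, the
drop_pending flag and the vector value 0xf8 are all made up for
illustration, and the model simply assumes that an IPI latched before
lapic_shutdown stays latched until something explicitly discards it. How
real hardware or a given hypervisor behaves here is exactly the open
question.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NO_VECTOR	(-1)
#define TOY_STALE_VEC	0xf8	/* arbitrary placeholder vector number */

/* Hypothetical, heavily simplified view of one CPU and its local APIC. */
struct toy_cpu {
	int  pending_vec;	/* stands in for a bit latched in the IRR */
	bool apic_enabled;	/* stands in for the APIC software-enable bit */
	bool irqs_enabled;	/* stands in for EFLAGS.IF */
	int  handled_vec;	/* last vector this CPU actually serviced */
};

static void toy_cpu_init(struct toy_cpu *c)
{
	c->pending_vec = TOY_NO_VECTOR;
	c->apic_enabled = true;
	c->irqs_enabled = false;	/* local_irq_disable already done */
	c->handled_vec = TOY_NO_VECTOR;
}

/* Another CPU sends an IPI: it is latched even though IRQs are off. */
static void toy_send_ipi(struct toy_cpu *dst, int vec)
{
	assert(vec >= 0);
	dst->pending_vec = vec;
}

/* Step 3): assumed not to discard an already latched IPI. */
static void toy_lapic_shutdown(struct toy_cpu *c)
{
	c->apic_enabled = false;
}

/* Step 4): re-enable the APIC, optionally dropping stale pending state. */
static void toy_init_bsp_apic(struct toy_cpu *c, bool drop_pending)
{
	if (drop_pending)
		c->pending_vec = TOY_NO_VECTOR;
	c->apic_enabled = true;
}

/* "local irq enable" in the second kernel: a latched IPI is delivered. */
static void toy_local_irq_enable(struct toy_cpu *c)
{
	c->irqs_enabled = true;
	if (c->apic_enabled && c->pending_vec != TOY_NO_VECTOR) {
		c->handled_vec = c->pending_vec;
		c->pending_vec = TOY_NO_VECTOR;
	}
}

/* Replays the CPU2 column and returns the vector the second kernel handles. */
static int toy_kdump_sequence(int stale_vec, bool drop_pending)
{
	struct toy_cpu cpu2;

	toy_cpu_init(&cpu2);
	toy_send_ipi(&cpu2, stale_vec);		/* CPU0: send_IPIs(REBOOT) */
	toy_lapic_shutdown(&cpu2);		/* 3) */
	toy_init_bsp_apic(&cpu2, drop_pending);	/* 4) */
	toy_local_irq_enable(&cpu2);
	return cpu2.handled_vec;
}
```

In this model, toy_kdump_sequence(TOY_STALE_VEC, false) returns the stale
vector, which corresponds to the hang described above, while passing
drop_pending = true returns TOY_NO_VECTOR, which is the effect the proposed
change to clear_local_APIC() is meant to have.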

I could not find a solution in the Intel SDM. Is this approach feasible,
or are there other ways to fix the issue?

Signed-off-by: Yipeng Zou <zouyipeng@...wei.com>
---
 arch/x86/kernel/apic/apic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index d73ba5a7b623..68c41d579303 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1117,6 +1117,8 @@ void clear_local_APIC(void)
 	}
 #endif
 
+	// Mask IPI here
+
 	/*
 	 * Clean APIC state for other OSs:
 	 */
-- 
2.34.1

