Message-ID: <87h5yxq02p.ffs@tglx>
Date: Sun, 27 Jul 2025 14:39:58 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Yipeng Zou <zouyipeng@...wei.com>, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
 peterz@...radead.org, sohil.mehta@...el.com, rui.zhang@...el.com,
 arnd@...db.de, yuntao.wang@...ux.dev, linux-kernel@...r.kernel.org
Subject: Re: [BUG REPORT] x86/apic: CPU Hang in x86 VM During Kdump

On Sat, Jul 26 2025 at 17:50, Yipeng Zou wrote:

Please do not top-post, and please trim your replies.

>      I skipped sending the NMI in native_stop_other_cpus(), and the test 
> passed.

I don't see how that would result in anything meaningful. The reboot vector
IRR bit on that second CPU will still be set.

>      Given this, is there an alternative way to resolve the issue, or 
> can we simply mask the IPI directly at that point?

Good luck finding a mask register in the local APIC.

Even if there were a mask register, the IRR bit would still be set and
the interrupt would be delivered on unmask. There is no way to clear IRR
bits other than a full reset (power on or INIT/SIPI sequence) of the
local APIC.
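
To illustrate why masking would not help, here is a toy user-space model
of the behaviour described above. It is only a sketch: struct toy_lapic
and the toy_*() helpers are made-up names, the model ignores ISR, TPR
and everything else a real local APIC has, and the vector number used in
the usage note is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT kernel code: only the IRR and the CPU interrupt flag. */
struct toy_lapic {
	uint64_t irr[4];	/* 256 pending bits, one per vector */
	bool irqs_enabled;	/* stands in for the CPU's interrupt flag */
};

/* An incoming IPI latches its vector in the IRR, enabled or not. */
static void toy_send_ipi(struct toy_lapic *a, unsigned int vector)
{
	assert(vector < 256);
	a->irr[vector / 64] |= UINT64_C(1) << (vector % 64);
}

static bool toy_pending(const struct toy_lapic *a, unsigned int vector)
{
	assert(vector < 256);
	return (a->irr[vector / 64] & (UINT64_C(1) << (vector % 64))) != 0;
}

/*
 * Deliver the highest pending vector if interrupts are enabled.
 * Returns the vector, or -1 if nothing was delivered. Delivery is the
 * only path in this model which clears an IRR bit.
 */
static int toy_deliver(struct toy_lapic *a)
{
	if (!a->irqs_enabled)
		return -1;

	for (int v = 255; v >= 0; v--) {
		if (toy_pending(a, (unsigned int)v)) {
			a->irr[v / 64] &= ~(UINT64_C(1) << (v % 64));
			return v;
		}
	}
	return -1;
}
```

In this model, calling toy_send_ipi() for some vector while irqs_enabled
is false leaves the bit pending no matter how long the CPU waits, and
the first toy_deliver() after irqs_enabled becomes true hands out
exactly that vector.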

In theory the APIC can be reset by clearing the enable bit in the
APIC_BASE MSR, but that's a can of worms in itself.

The Intel SDM is very blurry about the behaviour:

  When IA32_APIC_BASE[11] is set to 0, prior initialization to the APIC
  may be lost and the APIC may return to the state described in Section
  11.4.7.1, “Local APIC State After Power-Up or Reset.”

"may" means there is no guarantee.

Aside from that, this cannot be done for the original 3-wire APIC bus
based APICs (32-bit museum pieces). Not that I care much about them, but
that's just going to add more complexity to the existing horrors.

The other problem is that with the bit disabled, the APIC might not
respond to INIT/SIPI anymore, but that's equally unclear from the
documentation; both Intel and AMD manuals are pretty useless when it
comes to the gory details of the APIC and from past experience I know
that there are quite some subtle differences in the APIC behaviour
across CPU generations...

The stale reboot vector IRR problem is pretty straightforward to
mitigate. See patch below.

That needs a full audit of the various vectors, though at a quick
inspection most of them should be fine.
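
The detection idea can be modeled in user space with C11 atomics. This
is a sketch under assumptions, not the kernel implementation:
model_stopping_cpu, model_claim_stop() and model_reboot_ipi_is_valid()
are hypothetical names, and the model assumes the sender claims the
variable with a compare-and-exchange from -1 before any IPI goes out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of the ownership variable; -1 is the bootup value (no stopper). */
static atomic_int model_stopping_cpu = -1;

/*
 * Sender side: claim the stop operation before sending any reboot IPI.
 * Only the first caller succeeds; later callers see a non -1 value.
 */
static bool model_claim_stop(int this_cpu)
{
	int expected = -1;

	assert(this_cpu >= 0);
	return atomic_compare_exchange_strong(&model_stopping_cpu,
					      &expected, this_cpu);
}

/*
 * Receiver side: a reboot IPI arriving while nobody has claimed the
 * stop operation cannot have been sent by this kernel instance, so it
 * is treated as stale.
 */
static bool model_reboot_ipi_is_valid(void)
{
	return atomic_load(&model_stopping_cpu) != -1;
}
```

A freshly booted crash kernel starts with the variable at -1, so an IPI
left over from the crashed kernel is classified as stale. Once some CPU
has called model_claim_stop() successfully, every subsequent reboot IPI
is treated as genuine.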

Aside from that, there is quite some bogosity in the APIC setup path,
which I need to look deeper into.

Thanks,

	tglx
---
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -136,6 +136,28 @@ static int smp_stop_nmi_callback(unsigne
 DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 {
 	apic_eoi();
+
+	/*
+	 * Handle the case where a reboot IPI is stale in the IRR. This
+	 * happens when:
+	 *
+	 *   a CPU crashes with interrupts disabled before handling the
+	 *   reboot IPI and jumps into a crash kernel. The reboot IPI
+	 *   vector is kept set in the APIC IRR across the APIC soft
+	 *   disabled phase and as there is no way to clear a pending IRR
+	 *   bit, it is delivered to the crash kernel immediately when
+	 *   interrupts are enabled.
+	 *
+	 * As the reboot IPI can only be sent after acquiring @stopping_cpu
+	 * by storing the CPU number, this case can be detected when
+	 * @stopping_cpu contains the bootup value -1. Just return and
+	 * ignore it.
+	 */
+	if (atomic_read(&stopping_cpu) == -1) {
+		pr_info("Ignoring stale reboot IPI\n");
+		return;
+	}
+
 	cpu_emergency_disable_virtualization();
 	stop_this_cpu(NULL);
 }
