Message-ID: <87ecu1pfnn.ffs@tglx>
Date: Sun, 27 Jul 2025 22:01:00 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Yipeng Zou <zouyipeng@...wei.com>, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
peterz@...radead.org, sohil.mehta@...el.com, rui.zhang@...el.com,
arnd@...db.de, yuntao.wang@...ux.dev, linux-kernel@...r.kernel.org
Cc: zouyipeng@...wei.com
Subject: Re: [BUG REPORT] x86/apic: CPU Hang in x86 VM During Kdump

On Wed, Jun 04 2025 at 08:33, Yipeng Zou wrote:
> Recently, an issue has been reported where a CPU hangs in an x86 VM.
>
> The CPU halted during Kdump, likely due to an IPI issue, when one CPU
> was rebooting and another was in Kdump:
>
> CPU0                        CPU2
> ========================    ======================
> reboot                      Panic
> machine shutdown            Kdump
>                             machine shutdown
> stop other cpus
>                             stop other cpus
> ...                         ...
> local_irq_disable           local_irq_disable
> send_IPIs(REBOOT)           [critical regions]
> [critical regions]          1) send_IPIs(REBOOT)

After staring more at it, this makes absolutely no sense at all.

stop_other_cpus() does:

	/* Only proceed if this is the first CPU to reach this code */
	old_cpu = -1;
	this_cpu = smp_processor_id();
	if (!atomic_try_cmpxchg(&stopping_cpu, &old_cpu, this_cpu))
		return;

So CPU2 _cannot_ reach the code which issues the reboot IPIs, because
at that point @stopping_cpu == 0, ergo the cmpxchg() fails.
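
For illustration, here is a minimal userspace sketch of that gating
pattern, using C11 atomics in place of the kernel's atomic_t API. The
standalone setup and the claim_stopper() name are mine, not kernel
code:

#include <stdatomic.h>
#include <stdio.h>

/* Stand-in for the kernel's @stopping_cpu; -1 means "no stopper yet" */
static atomic_int stopping_cpu = -1;

/* Returns 1 for the first caller, 0 for everyone else */
static int claim_stopper(int this_cpu)
{
	int old_cpu = -1;

	/*
	 * Like atomic_try_cmpxchg(): succeeds only while @stopping_cpu
	 * still holds -1; a loser merely observes the winner's CPU id.
	 */
	return atomic_compare_exchange_strong(&stopping_cpu, &old_cpu,
					      this_cpu);
}

int main(void)
{
	printf("CPU0 %s\n", claim_stopper(0) ? "wins" : "loses");
	printf("CPU2 %s\n", claim_stopper(2) ? "wins" : "loses");
	return 0;
}

Compiled and run, this prints "CPU0 wins" and "CPU2 loses": once the
winner has stored its own CPU number, the compare against -1 can never
succeed again.
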
So what actually happens in this case is:

CPU0                        CPU2
========================    ======================
reboot                      Panic
machine shutdown            Kdump
                            machine_crash_shutdown()
stop other cpus             local_irq_disable()
try_cmpxchg() succeeds      stop other cpus
...                         try_cmpxchg() fails
send_IPIs(REBOOT)    -->    REBOOT vector becomes pending in IRR
wait timeout

And from there on everything becomes a lottery as CPU0 continues to
execute and CPU2 proceeds and jumps into the crash kernel...
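
To make the sequence concrete, here is a toy single-file model of that
race, again in userspace C. The "pending IRR bit" is just a flag, and
all names here are hypothetical, not kernel APIs:

#include <stdatomic.h>
#include <stdio.h>

static atomic_int stopping_cpu = -1;	/* -1: nobody is stopping yet */
static int reboot_pending[4];		/* toy per-CPU "IRR" bit */
static atomic_int stopped_cpus;		/* CPUs that handled the "IPI" */

static void stop_other_cpus(int this_cpu, int other_cpu)
{
	int old_cpu = -1;

	if (!atomic_compare_exchange_strong(&stopping_cpu, &old_cpu,
					    this_cpu))
		return;		/* loser: bail out, like CPU2 above */

	/* Winner: "send" the REBOOT IPI to the other CPU ... */
	reboot_pending[other_cpu] = 1;

	/* ... and check whether anyone stopped; nobody does here */
	if (atomic_load(&stopped_cpus) == 0)
		printf("CPU%d: wait timeout\n", this_cpu);
}

int main(void)
{
	stop_other_cpus(0, 2);	/* CPU0 wins; "IPI" left pending on CPU2 */
	stop_other_cpus(2, 0);	/* CPU2 loses the cmpxchg, returns */

	/* CPU2 now "jumps into the crash kernel" with the bit still set */
	printf("CPU2 enters crash kernel, REBOOT pending = %d\n",
	       reboot_pending[2]);
	return 0;
}

The point of the model: nothing ever clears reboot_pending[2], so if
the crash kernel enables interrupts before the APIC state is cleaned
up, the stale REBOOT vector can fire in a context that no longer
expects it -- the lottery described above.
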
This whole logic is broken...

Nevertheless, the patch I sent earlier definitely makes things more
robust, but it won't solve your particular problem.

Thanks,

        tglx