lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <MRWPR09MB8022415F2470C23C235459E08F92A@MRWPR09MB8022.eurprd09.prod.outlook.com>
Date: Sun, 25 Jan 2026 19:11:31 +0000
From: Pnina Feder <PNINA.FEDER@...ileye.com>
To: Andrew Morton <akpm@...ux-foundation.org>
CC: "pmladek@...e.com" <pmladek@...e.com>, "bhe@...hat.com" <bhe@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"lkp@...el.com" <lkp@...el.com>, "mgorman@...e.de" <mgorman@...e.de>,
	"mingo@...hat.com" <mingo@...hat.com>, "peterz@...radead.org"
	<peterz@...radead.org>, "rostedt@...dmis.org" <rostedt@...dmis.org>,
	"senozhatsky@...omium.org" <senozhatsky@...omium.org>, "tglx@...utronix.de"
	<tglx@...utronix.de>, Vladimir Kondratiev <Vladimir.Kondratiev@...ileye.com>
Subject: RE: [PATCH v8] panic: add panic_force_cpu= parameter to redirect
 panic to a specific CPU

Hi,
thanks for the review and for queueing this.

> On Thu, 22 Jan 2026 12:24:57 +0200 Pnina Feder <pnina.feder@...ileye.com> wrote:
> 
> > Some platforms require panic handling to execute on a specific CPU for 
> > crash dump to work reliably. This can be due to firmware limitations, 
> > interrupt routing constraints, or platform-specific requirements where 
> > only a single CPU is able to safely enter the crash kernel.
> > 
> > Add the panic_force_cpu= kernel command-line parameter to redirect 
> > panic execution to a designated CPU. When the parameter is provided, 
> > the CPU that initially triggers panic forwards the panic context to 
> > the target CPU via IPI, which then proceeds with the normal panic and kexec flow.
> > 
> > The IPI delivery is implemented as a weak function 
> > (panic_smp_redirect_cpu) so architectures with NMI support can override it for more reliable delivery.
> > 
> > If the specified CPU is invalid, offline, or a panic is already in 
> > progress on another CPU, the redirection is skipped and panic 
> > continues on the current CPU.
> > 
> > ...
> >
> > +
> > +#if defined(CONFIG_SMP) && defined(CONFIG_CRASH_DUMP) static int 
> > +__init panic_force_cpu_setup(char *str) {
> > +	int cpu;
> > +
> > +	if (!str)
> > +		return -EINVAL;
> > +
> > +	if (kstrtoint(str, 0, &cpu) || cpu < 0 || cpu >= nr_cpu_ids) {
> > +		pr_warn("panic_force_cpu: invalid value '%s'\n", str);
> > +		return -EINVAL;
> > +	}
> > +
> > +	panic_force_cpu = cpu;
> > +	return 0;
> > +}
> > +early_param("panic_force_cpu", panic_force_cpu_setup);
> > +
> > +static int __init panic_force_cpu_late_init(void) {
> > +	if (panic_force_cpu < 0)
> > +		return 0;
> > +
> > +	panic_force_buf = kmalloc(PANIC_MSG_BUFSZ, GFP_KERNEL);
> > +
> > +	return 0;
> > +}
> > +late_initcall(panic_force_cpu_late_init);
> 
> early_param vs late_initcall leaves a window where panic_force_cpu!=0&&panic_force_buf==NULL.
> 
> > +static void do_panic_on_target_cpu(void *info) {
> > +	panic("%s", (char *)info);
> > +}
> > +
> >
> > ...
> >
> > +	/*
> > +	 * Only one CPU can do the redirect. Use atomic cmpxchg to ensure
> > +	 * we don't race with another CPU also trying to redirect.
> > +	 */
> > +	if (!atomic_try_cmpxchg(&panic_redirect_cpu, &old_cpu, this_cpu))
> > +		return false;
> > +
> > +	/*
> > +	 * Use dynamically allocated buffer if available, otherwise
> > +	 * fall back to static message for early boot panics or allocation failure.
> > +	 */
> > +	if (panic_force_buf) {
> > +		vsnprintf(panic_force_buf, PANIC_MSG_BUFSZ, fmt, args);
> > +		msg = panic_force_buf;
> > +	} else {
> > +		msg = "Redirected panic (buffer unavailable)";
> > +	}
> 
> which is handled here.   Just showing that I'm paying attention ;)

Yes indeed, thanks for the attention :)

> > +	console_verbose();
> > +	bust_spinlocks(1);
> > +
> > +	pr_emerg("panic: Redirecting from CPU %d to CPU %d for crash kernel.\n",
> > +		 this_cpu, panic_force_cpu);
> > +
> > +	/* Dump original CPU before redirecting */
> > +	if (!test_taint(TAINT_DIE) &&
> > +	    oops_in_progress <= 1 &&
> 
> Well look at that.  When I invented oops_in_progress (dinosaurs were roaming the earth) it was a boolean.  iirc, we didn't have `bool' then.
> 
> Now I see that kdb_msg_write() is playing games and appears to be treating it as a scalar.  Without, of course, documenting that anywhere.
> 
> And then there's this:
> 
> ./kernel/panic.c:	if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
> 
> which I assume is connected to kdb_msg_write()'s games.
> 
> 
> Anyway, it would be great if someone could figure this out and add a description of this new interpretation at the oops_in_progress definition site.
> 
> Also, your test of <= seems inappropriate.  99% of sites treat it as a boolean. </nit>

I inverted the existing oops_in_progress > 1 check from vpanic() - so it's the same logic, just the opposite condition.
I can change it to !(oops_in_progress > 1) if preferred, but <= 1 seemed cleaner.

> > @@ -483,7 +638,11 @@ void vpanic(const char *fmt, va_list args)
> >  	/*
> >  	 * Avoid nested stack-dumping if a panic occurs during oops processing
> >  	 */
> > -	if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
> > +	if (atomic_read(&panic_redirect_cpu) != PANIC_CPU_INVALID &&
> > +	    panic_force_cpu == raw_smp_processor_id()) {
> > +		pr_emerg("panic: Redirected from CPU %d, skipping stack dump.\n",
> > +			 atomic_read(&panic_redirect_cpu));
> 
> No stack dump because it's the wrong stack, right?  Users might wonder where their stack dump went.

Yes, the stack was already dumped on the original CPU before redirecting, so the user should see it a few lines above.
For example:

[   76.974516] panic: Redirecting from CPU 5 to CPU 0 for crash kernel.
[   76.974524] CPU: 5 UID: 0 PID: 370 Comm: sh Kdump: loaded Not tainted 6.18.0 #1 PREEMPT_RT
[   76.974534] Call Trace:
[   76.974538] [<ffffffff8001437e>] dump_backtrace+0x1c/0x24
[   76.974551] [<ffffffff800014be>] show_stack+0x28/0x34
[   76.974557] [<ffffffff8000d3aa>] dump_stack_lvl+0x5e/0x86
[   76.974563] [<ffffffff8000d3e6>] dump_stack+0x14/0x1c
[   76.974568] [<ffffffff80001b08>] vpanic+0x12e/0x40e
[   76.974573] [<ffffffff80001e18>] do_panic_on_target_cpu+0x0/0x16
[   76.974579] [<ffffffff80038024>] get_signal+0x874/0x894
[   76.974588] [<ffffffff80011b5a>] arch_do_signal_or_restart+0x3e/0x922
[   76.974594] [<ffffffff8072c40a>] irqentry_exit_to_user_mode+0x154/0x232
[   76.974603] [<ffffffff8072c534>] irqentry_exit+0x4c/0x92
[   76.974608] [<ffffffff8072bf78>] do_page_fault+0x32/0x48
[   76.974614] [<ffffffff80738ad4>] handle_exception+0x158/0x164
[   76.974626] Kernel panic - not syncing: user process fatal error.
[   76.974630] panic: Redirected from CPU 5, skipping stack dump.

> > +	} else if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
> >  		panic_this_cpu_backtrace_printed = true;
> >  	} else if (IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE)) {
> >  		dump_stack();
> 
> Anyway, Looks Nice To Me.  I'll queue it in mm.git's non-mm branches and shall probably upstream it for 6.19, but additional review is sought, please.

Thanks a lot,
Pnina

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ