linux-kernel - Re: [PATCH v8] panic: add panic_force_cpu= parameter to redirect panic to a specific CPU

Message-Id: <20260123171441.282df442e4d0ad9700e89521@linux-foundation.org>
Date: Fri, 23 Jan 2026 17:14:41 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Pnina Feder <pnina.feder@...ileye.com>
Cc: pmladek@...e.com, bhe@...hat.com, linux-kernel@...r.kernel.org,
 lkp@...el.com, mgorman@...e.de, mingo@...hat.com, peterz@...radead.org,
 rostedt@...dmis.org, senozhatsky@...omium.org, tglx@...utronix.de,
 vkondra@...ileye.com
Subject: Re: [PATCH v8] panic: add panic_force_cpu= parameter to redirect
 panic to a specific CPU

On Thu, 22 Jan 2026 12:24:57 +0200 Pnina Feder <pnina.feder@...ileye.com> wrote:

> Some platforms require panic handling to execute on a specific CPU for
> crash dump to work reliably. This can be due to firmware limitations,
> interrupt routing constraints, or platform-specific requirements where
> only a single CPU is able to safely enter the crash kernel.
> 
> Add the panic_force_cpu= kernel command-line parameter to redirect panic
> execution to a designated CPU. When the parameter is provided, the CPU
> that initially triggers panic forwards the panic context to the target
> CPU via IPI, which then proceeds with the normal panic and kexec flow.
> 
> The IPI delivery is implemented as a weak function (panic_smp_redirect_cpu)
> so architectures with NMI support can override it for more reliable delivery.
> 
> If the specified CPU is invalid, offline, or a panic is already in
> progress on another CPU, the redirection is skipped and panic continues
> on the current CPU.
> 
> ...
>
> +
> +#if defined(CONFIG_SMP) && defined(CONFIG_CRASH_DUMP)
> +static int __init panic_force_cpu_setup(char *str)
> +{
> +	int cpu;
> +
> +	if (!str)
> +		return -EINVAL;
> +
> +	if (kstrtoint(str, 0, &cpu) || cpu < 0 || cpu >= nr_cpu_ids) {
> +		pr_warn("panic_force_cpu: invalid value '%s'\n", str);
> +		return -EINVAL;
> +	}
> +
> +	panic_force_cpu = cpu;
> +	return 0;
> +}
> +early_param("panic_force_cpu", panic_force_cpu_setup);
> +
> +static int __init panic_force_cpu_late_init(void)
> +{
> +	if (panic_force_cpu < 0)
> +		return 0;
> +
> +	panic_force_buf = kmalloc(PANIC_MSG_BUFSZ, GFP_KERNEL);
> +
> +	return 0;
> +}
> +late_initcall(panic_force_cpu_late_init);

early_param vs late_initcall leaves a window where
panic_force_cpu >= 0 && panic_force_buf == NULL.
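A minimal user-space model of that window. It assumes panic_force_cpu defaults to -1 when the parameter is absent, because the default is not shown in the quoted hunk. struct boot_model and the model_* helpers are hypothetical stand-ins, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the two boot stages quoted above. */
struct boot_model {
	int force_cpu;	/* set by the early_param handler; -1 when absent (assumed) */
	char *buf;	/* set by the late_initcall; NULL until then */
};

/* Early boot: only the CPU number is recorded, nothing is allocated. */
static void model_early_param(struct boot_model *m, int cpu)
{
	m->force_cpu = cpu;
}

/* Late boot: the message buffer is allocated (and may still fail). */
static void model_late_init(struct boot_model *m, size_t bufsz)
{
	assert(bufsz > 0);
	if (m->force_cpu >= 0)
		m->buf = malloc(bufsz);
}

/* True while redirection is enabled but no buffer exists yet. */
static bool in_window(const struct boot_model *m)
{
	return m->force_cpu >= 0 && m->buf == NULL;
}
```

Note that force_cpu = 0 is a valid setting and opens the window just like any other CPU number.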

> +static void do_panic_on_target_cpu(void *info)
> +{
> +	panic("%s", (char *)info);
> +}
> +
>
> ...
>
> +	/*
> +	 * Only one CPU can do the redirect. Use atomic cmpxchg to ensure
> +	 * we don't race with another CPU also trying to redirect.
> +	 */
> +	if (!atomic_try_cmpxchg(&panic_redirect_cpu, &old_cpu, this_cpu))
> +		return false;
> +
> +	/*
> +	 * Use dynamically allocated buffer if available, otherwise
> +	 * fall back to static message for early boot panics or allocation failure.
> +	 */
> +	if (panic_force_buf) {
> +		vsnprintf(panic_force_buf, PANIC_MSG_BUFSZ, fmt, args);
> +		msg = panic_force_buf;
> +	} else {
> +		msg = "Redirected panic (buffer unavailable)";
> +	}

which is handled here.   Just showing that I'm paying attention ;)
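A user-space sketch of the two steps quoted above: a one-shot claim followed by the choice between the formatted buffer and the fixed fallback. The claim uses C11 atomic_compare_exchange_strong() in place of the kernel's atomic_try_cmpxchg(). REDIRECT_NONE, claim_redirect() and pick_panic_msg() are hypothetical names; only the fallback string comes from the patch:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define REDIRECT_NONE (-1)	/* hypothetical "nobody has redirected yet" value */

static atomic_int redirect_owner = REDIRECT_NONE;
static const char fallback_msg[] = "Redirected panic (buffer unavailable)";

/* One-shot claim: only the first caller wins the compare-and-exchange. */
static bool claim_redirect(int this_cpu)
{
	int expected = REDIRECT_NONE;

	assert(this_cpu >= 0);
	return atomic_compare_exchange_strong(&redirect_owner, &expected,
					      this_cpu);
}

/* Format into buf when it exists, otherwise return the fixed string. */
static const char *pick_panic_msg(char *buf, size_t bufsz, const char *fmt, ...)
{
	va_list args;

	if (!buf)
		return fallback_msg;
	va_start(args, fmt);
	vsnprintf(buf, bufsz, fmt, args);	/* truncates at bufsz - 1 chars */
	va_end(args);
	return buf;
}
```

A second CPU calling claim_redirect() after the first has won gets false and, as in the quoted code, continues with the ordinary panic path.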

> +	console_verbose();
> +	bust_spinlocks(1);
> +
> +	pr_emerg("panic: Redirecting from CPU %d to CPU %d for crash kernel.\n",
> +		 this_cpu, panic_force_cpu);
> +
> +	/* Dump original CPU before redirecting */
> +	if (!test_taint(TAINT_DIE) &&
> +	    oops_in_progress <= 1 &&

Well look at that.  When I invented oops_in_progress (dinosaurs were
roaming the earth) it was a boolean.  iirc, we didn't have `bool' then.

Now I see that kdb_msg_write() is playing games and appears to be
treating it as a scalar.  Without, of course, documenting that anywhere.

And then there's this:

./kernel/panic.c:	if (test_taint(TAINT_DIE) || oops_in_progress > 1) {

which I assume is connected to kdb_msg_write()'s games.


Anyway, it would be great if someone could figure this out and add a
description of this new interpretation at the oops_in_progress
definition site.

Also, your test of <= seems inappropriate.  99% of sites treat it as a
boolean. </nit>
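A simplified user-space model of the counter reading. It assumes, as lib/bust_spinlocks.c does, that bust_spinlocks(1) increments oops_in_progress and bust_spinlocks(0) decrements it. The real bust_spinlocks(0) also unblanks the console and wakes klogd, and the model_* names are hypothetical. vpanic() calls bust_spinlocks(1) itself before its check, and the quoted redirect path likewise calls it just before testing oops_in_progress <= 1. In both places, a value above 1 therefore means an oops was already in progress:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the kernel global; it is a plain int, not a bool. */
static int oops_in_progress;

/* Simplified model of bust_spinlocks(): each "yes" call nests one level. */
static void model_bust_spinlocks(int yes)
{
	if (yes) {
		++oops_in_progress;
	} else {
		assert(oops_in_progress > 0);
		--oops_in_progress;
	}
}

/* vpanic()-style check, made after the panic path's own bust_spinlocks(1). */
static bool panic_during_oops(void)
{
	return oops_in_progress > 1;
}
```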

> @@ -483,7 +638,11 @@ void vpanic(const char *fmt, va_list args)
>  	/*
>  	 * Avoid nested stack-dumping if a panic occurs during oops processing
>  	 */
> -	if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
> +	if (atomic_read(&panic_redirect_cpu) != PANIC_CPU_INVALID &&
> +	    panic_force_cpu == raw_smp_processor_id()) {
> +		pr_emerg("panic: Redirected from CPU %d, skipping stack dump.\n",
> +			 atomic_read(&panic_redirect_cpu));

No stack dump because it's the wrong stack, right?  Users might wonder
where their stack dump went.

> +	} else if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
>  		panic_this_cpu_backtrace_printed = true;
>  	} else if (IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE)) {
>  		dump_stack();

Anyway, Looks Nice To Me.  I'll queue it in mm.git's non-mm branches
and shall probably upstream it for 6.19, but additional review is
sought, please.