Message-Id: <20260123171441.282df442e4d0ad9700e89521@linux-foundation.org>
Date: Fri, 23 Jan 2026 17:14:41 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Pnina Feder <pnina.feder@...ileye.com>
Cc: pmladek@...e.com, bhe@...hat.com, linux-kernel@...r.kernel.org,
lkp@...el.com, mgorman@...e.de, mingo@...hat.com, peterz@...radead.org,
rostedt@...dmis.org, senozhatsky@...omium.org, tglx@...utronix.de,
vkondra@...ileye.com
Subject: Re: [PATCH v8] panic: add panic_force_cpu= parameter to redirect
panic to a specific CPU
On Thu, 22 Jan 2026 12:24:57 +0200 Pnina Feder <pnina.feder@...ileye.com> wrote:
> Some platforms require panic handling to execute on a specific CPU for
> crash dump to work reliably. This can be due to firmware limitations,
> interrupt routing constraints, or platform-specific requirements where
> only a single CPU is able to safely enter the crash kernel.
>
> Add the panic_force_cpu= kernel command-line parameter to redirect panic
> execution to a designated CPU. When the parameter is provided, the CPU
> that initially triggers panic forwards the panic context to the target
> CPU via IPI, which then proceeds with the normal panic and kexec flow.
>
> The IPI delivery is implemented as a weak function (panic_smp_redirect_cpu)
> so architectures with NMI support can override it for more reliable delivery.
>
> If the specified CPU is invalid, offline, or a panic is already in
> progress on another CPU, the redirection is skipped and panic continues
> on the current CPU.
>
> ...
>
> +
> +#if defined(CONFIG_SMP) && defined(CONFIG_CRASH_DUMP)
> +static int __init panic_force_cpu_setup(char *str)
> +{
> +	int cpu;
> +
> +	if (!str)
> +		return -EINVAL;
> +
> +	if (kstrtoint(str, 0, &cpu) || cpu < 0 || cpu >= nr_cpu_ids) {
> +		pr_warn("panic_force_cpu: invalid value '%s'\n", str);
> +		return -EINVAL;
> +	}
> +
> +	panic_force_cpu = cpu;
> +	return 0;
> +}
> +early_param("panic_force_cpu", panic_force_cpu_setup);
> +
> +static int __init panic_force_cpu_late_init(void)
> +{
> +	if (panic_force_cpu < 0)
> +		return 0;
> +
> +	panic_force_buf = kmalloc(PANIC_MSG_BUFSZ, GFP_KERNEL);
> +
> +	return 0;
> +}
> +late_initcall(panic_force_cpu_late_init);

early_param vs late_initcall leaves a window where
panic_force_cpu >= 0 && panic_force_buf == NULL.

> +static void do_panic_on_target_cpu(void *info)
> +{
> +	panic("%s", (char *)info);
> +}
> +
>
> ...
>
> +	/*
> +	 * Only one CPU can do the redirect. Use atomic cmpxchg to ensure
> +	 * we don't race with another CPU also trying to redirect.
> +	 */
> +	if (!atomic_try_cmpxchg(&panic_redirect_cpu, &old_cpu, this_cpu))
> +		return false;
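
To illustrate the single-winner property the comment above relies on,
here is a minimal userspace sketch.  It is not the patch's code: it uses
C11 <stdatomic.h> in place of the kernel's atomic_try_cmpxchg(), the
names CPU_INVALID, redirect_cpu and claim_redirect() are hypothetical,
and it assumes the expected value starts out as the "invalid" sentinel
(the quoted hunk does not show how old_cpu is initialized).

```c
#include <stdatomic.h>
#include <stdbool.h>

#define CPU_INVALID (-1)	/* hypothetical sentinel, in the role of PANIC_CPU_INVALID */

/* Hypothetical stand-in for panic_redirect_cpu. */
static atomic_int redirect_cpu = CPU_INVALID;

/*
 * Returns true only for the first caller.  Later callers find the slot
 * already claimed, the compare-exchange fails, and they get false.
 */
static bool claim_redirect(int this_cpu)
{
	int expected = CPU_INVALID;

	return atomic_compare_exchange_strong(&redirect_cpu, &expected,
					      this_cpu);
}
```

With this model, claim_redirect(2) followed by claim_redirect(3) leaves
redirect_cpu at 2: the second caller loses and would carry on without
redirecting.
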
> +
> +	/*
> +	 * Use dynamically allocated buffer if available, otherwise
> +	 * fall back to static message for early boot panics or allocation failure.
> +	 */
> +	if (panic_force_buf) {
> +		vsnprintf(panic_force_buf, PANIC_MSG_BUFSZ, fmt, args);
> +		msg = panic_force_buf;
> +	} else {
> +		msg = "Redirected panic (buffer unavailable)";
> +	}

which is handled here.  Just showing that I'm paying attention ;)

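
For readers following along, the window and its fallback can be modelled
in userspace roughly as below.  This is a sketch under assumptions, not
the patch itself: malloc() stands in for kmalloc(), and MSG_BUFSZ,
msg_buf, late_init() and format_msg() are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_BUFSZ 64	/* hypothetical size; the patch uses PANIC_MSG_BUFSZ */

/* Stays NULL until the "late init" step runs (or if allocation fails). */
static char *msg_buf;

/* Models the late_initcall: allocation happens well after parsing. */
static void late_init(void)
{
	msg_buf = malloc(MSG_BUFSZ);
}

/* Returns the message to forward; never writes through a NULL buffer. */
static const char *format_msg(const char *fmt, ...)
{
	va_list args;

	if (!msg_buf)
		return "Redirected panic (buffer unavailable)";

	va_start(args, fmt);
	vsnprintf(msg_buf, MSG_BUFSZ, fmt, args);
	va_end(args);
	return msg_buf;
}
```

Calling format_msg("cpu %d", 3) before late_init() yields the static
fallback string; after late_init() succeeds it yields "cpu 3".
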
> +	console_verbose();
> +	bust_spinlocks(1);
> +
> +	pr_emerg("panic: Redirecting from CPU %d to CPU %d for crash kernel.\n",
> +		 this_cpu, panic_force_cpu);
> +
> +	/* Dump original CPU before redirecting */
> +	if (!test_taint(TAINT_DIE) &&
> +	    oops_in_progress <= 1 &&

Well look at that.  When I invented oops_in_progress (dinosaurs were
roaming the earth) it was a boolean.  iirc, we didn't have `bool' then.

Now I see that kdb_msg_write() is playing games and appears to be
treating it as a scalar.  Without, of course, documenting that anywhere.

And then there's this:

./kernel/panic.c:	if (test_taint(TAINT_DIE) || oops_in_progress > 1) {

which I assume is connected to kdb_msg_write()'s games.

Anyway, it would be great if someone could figure this out and add a
description of this new interpretation at the oops_in_progress
definition site.

Also, your test of <= seems inappropriate.  99% of sites treat it as a
boolean.  </nit>
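
A small userspace model of the two readings may help whoever picks this
up.  It is only a sketch: oops_depth, enter_oops(), leave_oops(),
in_oops() and in_nested_oops() are hypothetical names, and it assumes
the variable is bumped once per entry, the way bust_spinlocks(1)
increments oops_in_progress and bust_spinlocks(0) decrements it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for oops_in_progress. */
static int oops_depth;

/* Assumed to mirror bust_spinlocks(1) and bust_spinlocks(0). */
static void enter_oops(void)
{
	++oops_depth;
}

static void leave_oops(void)
{
	if (oops_depth > 0)
		--oops_depth;
}

/* Boolean reading: is any oops in progress at all? */
static bool in_oops(void)
{
	return oops_depth != 0;
}

/* Counter reading: are we nested inside an earlier oops? */
static bool in_nested_oops(void)
{
	return oops_depth > 1;
}
```

Under that assumption a test of "<= 1" made right after bust_spinlocks(1)
reads as "not nested", which a plain boolean test cannot express.
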
> @@ -483,7 +638,11 @@ void vpanic(const char *fmt, va_list args)
>  	/*
>  	 * Avoid nested stack-dumping if a panic occurs during oops processing
>  	 */
> -	if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
> +	if (atomic_read(&panic_redirect_cpu) != PANIC_CPU_INVALID &&
> +	    panic_force_cpu == raw_smp_processor_id()) {
> +		pr_emerg("panic: Redirected from CPU %d, skipping stack dump.\n",
> +			 atomic_read(&panic_redirect_cpu));

No stack dump because it's the wrong stack, right?  Users might wonder
where their stack dump went.

> +	} else if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
>  		panic_this_cpu_backtrace_printed = true;
>  	} else if (IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE)) {
>  		dump_stack();

Anyway, Looks Nice To Me.  I'll queue it in mm.git's non-mm branches
and shall probably upstream it for 6.19, but additional review is
sought, please.