[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<MRWPR09MB8022ABA4B4D7D6805138789E8F84A@MRWPR09MB8022.eurprd09.prod.outlook.com>
Date: Wed, 7 Jan 2026 21:27:05 +0000
From: Pnina Feder <PNINA.FEDER@...ileye.com>
To: Petr Mladek <pmladek@...e.com>
CC: "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"senozhatsky@...omium.org" <senozhatsky@...omium.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, kernel test
robot <lkp@...el.com>, Thomas Gleixner <tglx@...utronix.de>, Peter Zijlstra
<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, Steven Rostedt
<rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>, Baoquan He
<bhe@...hat.com>, Vladimir Kondratiev <Vladimir.Kondratiev@...ileye.com>
Subject: RE: [PATCH v3] panic: add panic_force_cpu= parameter to redirect
panic to a specific CPU
Hi Petr,
> Adding more people into Cc. I doubt that smp_call_function_single() is reliable in panic().
You're right. I've switched to smp_call_function_single_async() which
uses trylock semantics and won't block if the CSD is busy.
> On Mon 2026-01-05 10:18:08, Pnina Feder wrote:
> > Some platforms require panic handling to execute on a specific CPU for
> > crash dump to work reliably. This can be due to firmware limitations,
> > interrupt routing constraints, or platform-specific requirements where
> > only a single CPU is able to safely enter the crash kernel.
> >
> > Add support for redirecting panic execution to a designated CPU via a
> > kernel command-line parameter. When the parameter is provided, the CPU
> > that initially triggers panic forwards the panic context to the target
> > CPU, which then proceeds with the normal panic and kexec flow.
> >
> > If the specified CPU is invalid, offline, or a panic is already in
> > progress on another CPU, the redirection is skipped and panic
> > continues on the current CPU.
> >
> > Changes since v2:
> > - Make panic redirection warnings generic and platform-agnostic
> >
> > Reported-by: kernel test robot <lkp@...el.com>
> > Closes:
> > https://lore.kernel.org/oe-kbuild-all/202601041820.6M8cIq2e-lkp@intel.
> > com/
> > Signed-off-by: Pnina Feder <pnina.feder@...ileye.com>
> > ---
> > kernel/panic.c | 93
> > ++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 93 insertions(+)
> >
> > diff --git a/kernel/panic.c b/kernel/panic.c index
> > 0d52210a9e2b..6239bcdc2463 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -300,6 +300,92 @@ void __weak crash_smp_send_stop(void)
> >
> > atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
> >
> > +#ifdef CONFIG_SMP
> > +/* CPU to redirect panic to, -1 means feature is disabled */ static
> > +int panic_force_cpu = -1;
> > +
> > +static int __init panic_force_cpu_setup(char *str) {
> > + int cpu;
> > +
> > + if (!str)
> > + return -EINVAL;
> > +
> > + if (kstrtoint(str, 0, &cpu) || cpu < 0) {
> > + pr_warn("panic_force_cpu: invalid value '%s'\n", str);
> > + return -EINVAL;
> > + }
> > +
> > + panic_force_cpu = cpu;
> > + pr_info("panic_force_cpu: panic will execute on CPU %d\n", cpu);
> > + return 0;
> > +}
> > +early_param("panic_force_cpu", panic_force_cpu_setup);
> > +
> > +static void do_panic_on_target_cpu(void *info) {
> > + panic("%s", (char *)info);
> > +}
> > +
> > +/**
> > + * panic_force_target_cpu - Redirect panic to a specific CPU for
> > +crash kernel
> > + * @fmt: panic message format string
> > + * @args: arguments for format string
> > + *
> > + * Some platforms require panic handling to occur on a specific CPU
> > + * for the crash kernel to function correctly. This function
> > +redirects
> > + * panic handling to the CPU specified via the panic_redirect_cpu= boot parameter.
> > + *
> > + * Returns true if panic should proceed on current CPU.
> > + * Returns false (never returns) if panic was redirected.
> > + */
> > +__printf(1, 0)
> > +static bool panic_force_target_cpu(const char *fmt, va_list args) {
> > + static char panic_redirect_msg[1024];
> > + int cpu = raw_smp_processor_id();
> > + int target_cpu = panic_force_cpu;
> > +
> > + /* Feature not enabled via boot parameter */
> > + if (target_cpu < 0)
> > + return true;
> > +
> > + /* Already on target CPU - proceed normally */
> > + if (cpu == target_cpu)
> > + return true;
> > +
> > + /* Target CPU is offline, can't redirect */
> > + if (!cpu_online(target_cpu)) {
> > + pr_warn("panic: target CPU %d is offline, proceeding on CPU %d.\n"
> > + "Crash kernel interrupts may be unavailable.\n", target_cpu, cpu);
> > + return true;
> > + }
> > +
> > + /* Another panic already in progress */
> > + if (panic_in_progress()) {
> > + pr_warn("panic: Another panic in progress on CPU %d, cannot redirect to CPU %d.\n"
>
> IMHO, this message does not add much value and should be omitted.
> vpanic() does not print any message when panic_try_start() fails.
Agreed, removed in v4.
> > + "Crash kernel interrupts may be unavailable.\n",
>
> I am confused by this message.
Removed along with other verbose warnings.
> > + atomic_read(&panic_cpu), target_cpu);
> > + return true;
> > + }
> > +
> > + pr_info("panic: Redirecting from CPU %d to CPU %d for crash kernel\n",
> > + cpu, target_cpu);
> > +
> > + vsnprintf(panic_redirect_msg, sizeof(panic_redirect_msg), fmt,
> > +args);
>
> The "redirect" in "panic_redirect_msg" is confusing. It has nothing to do with redirection. It is just a buffer for the formatted panic message. I would call it "buf" or "panic_msg".
I'm now reusing vpanic()'s existing static buf[1024], passing it as a
parameter. This also addresses Andrew's concern about the extra 1k
memory allocation.
> > + smp_call_function_single(target_cpu, do_panic_on_target_cpu,
> > +panic_redirect_msg, false);
>
> I doubt that this is safe and reliable in panic() context.
> For a start, panic() might be called in NMI and this function takes csd_lock().
>
> panic() code should avoid locks. Or is should use trylock and handle a failure gracefully.
Changed to smp_call_function_single_async() which returns error instead
of blocking if the CSD is busy. On failure, panic continues locally.
> BTW: The commit message says that this is needed for crash-dump
> to work reliably. So, it does not make sense to do this
> when crash-dump is not configured.
Good point. I've added CONFIG_CRASH_DUMP to the #ifdef in v4.
> Maybe, crash-dump should get fixed instead?
> What are the exact problems with the crash-dump, please?
On some platforms, the interrupt controller configuration for the crash
kernel depends on which CPU executes the panic path. By the time the
crash kernel starts, all secondary CPUs have already been stopped via
smp_send_stop(), so it's too late to fix the routing at that point.
Thanks for the review!
Pnina
Powered by blists - more mailing lists