lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
 <MRWPR09MB8022ABA4B4D7D6805138789E8F84A@MRWPR09MB8022.eurprd09.prod.outlook.com>
Date: Wed, 7 Jan 2026 21:27:05 +0000
From: Pnina Feder <PNINA.FEDER@...ileye.com>
To: Petr Mladek <pmladek@...e.com>
CC: "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"senozhatsky@...omium.org" <senozhatsky@...omium.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, kernel test
 robot <lkp@...el.com>, Thomas Gleixner <tglx@...utronix.de>, Peter Zijlstra
	<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, Steven Rostedt
	<rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>, Baoquan He
	<bhe@...hat.com>, Vladimir Kondratiev <Vladimir.Kondratiev@...ileye.com>
Subject: RE: [PATCH v3] panic: add panic_force_cpu= parameter to redirect
 panic to a specific CPU

Hi Petr,

> Adding more people into Cc. I doubt that smp_call_function_single() is reliable in panic().

You're right. I've switched to smp_call_function_single_async() which 
uses trylock semantics and won't block if the CSD is busy.

> On Mon 2026-01-05 10:18:08, Pnina Feder wrote:
> > Some platforms require panic handling to execute on a specific CPU for 
> > crash dump to work reliably. This can be due to firmware limitations, 
> > interrupt routing constraints, or platform-specific requirements where 
> > only a single CPU is able to safely enter the crash kernel.
> > 
> > Add support for redirecting panic execution to a designated CPU via a 
> > kernel command-line parameter. When the parameter is provided, the CPU 
> > that initially triggers panic forwards the panic context to the target 
> > CPU, which then proceeds with the normal panic and kexec flow.
> > 
> > If the specified CPU is invalid, offline, or a panic is already in 
> > progress on another CPU, the redirection is skipped and panic 
> > continues on the current CPU.
> > 
> > Changes since v2:
> >  - Make panic redirection warnings generic and platform-agnostic
> > 
> > Reported-by: kernel test robot <lkp@...el.com>
> > Closes: 
> > https://lore.kernel.org/oe-kbuild-all/202601041820.6M8cIq2e-lkp@intel.
> > com/
> > Signed-off-by: Pnina Feder <pnina.feder@...ileye.com>
> > ---
> >  kernel/panic.c | 93 
> > ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 93 insertions(+)
> > 
> > diff --git a/kernel/panic.c b/kernel/panic.c index 
> > 0d52210a9e2b..6239bcdc2463 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -300,6 +300,92 @@ void __weak crash_smp_send_stop(void)
> >  
> >  atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
> >  
> > +#ifdef CONFIG_SMP
> > +/* CPU to redirect panic to, -1 means feature is disabled */ static 
> > +int panic_force_cpu = -1;
> > +
> > +static int __init panic_force_cpu_setup(char *str) {
> > +	int cpu;
> > +
> > +	if (!str)
> > +		return -EINVAL;
> > +
> > +	if (kstrtoint(str, 0, &cpu) || cpu < 0) {
> > +		pr_warn("panic_force_cpu: invalid value '%s'\n", str);
> > +		return -EINVAL;
> > +	}
> > +
> > +	panic_force_cpu = cpu;
> > +	pr_info("panic_force_cpu: panic will execute on CPU %d\n", cpu);
> > +	return 0;
> > +}
> > +early_param("panic_force_cpu", panic_force_cpu_setup);
> > +
> > +static void do_panic_on_target_cpu(void *info) {
> > +	panic("%s", (char *)info);
> > +}
> > +
> > +/**
> > + * panic_force_target_cpu - Redirect panic to a specific CPU for 
> > +crash kernel
> > + * @fmt: panic message format string
> > + * @args: arguments for format string
> > + *
> > + * Some platforms require panic handling to occur on a specific CPU
> > + * for the crash kernel to function correctly. This function 
> > +redirects
> > + * panic handling to the CPU specified via the panic_redirect_cpu= boot parameter.
> > + *
> > + * Returns true if panic should proceed on current CPU.
> > + * Returns false (never returns) if panic was redirected.
> > + */
> > +__printf(1, 0)
> > +static bool panic_force_target_cpu(const char *fmt, va_list args) {
> > +	static char panic_redirect_msg[1024];
> > +	int cpu = raw_smp_processor_id();
> > +	int target_cpu = panic_force_cpu;
> > +
> > +	/* Feature not enabled via boot parameter */
> > +	if (target_cpu < 0)
> > +		return true;
> > +
> > +	/* Already on target CPU - proceed normally */
> > +	if (cpu == target_cpu)
> > +		return true;
> > +
> > +	/* Target CPU is offline, can't redirect */
> > +	if (!cpu_online(target_cpu)) {
> > +		pr_warn("panic: target CPU %d is offline, proceeding on CPU %d.\n"
> > +			"Crash kernel interrupts may be unavailable.\n", target_cpu, cpu);
> > +		return true;
> > +	}
> > +
> > +	/* Another panic already in progress */
> > +	if (panic_in_progress()) {
> > +		pr_warn("panic: Another panic in progress on CPU %d, cannot redirect to CPU %d.\n"
> 
> IMHO, this message does not add much value and should be omitted.
> vpanic() does not print any message when panic_try_start() fails.

Agreed, removed in v4.

> > +			"Crash kernel interrupts may be unavailable.\n",
> 
> I am confused by this message.

Removed along with other verbose warnings.

> > +			atomic_read(&panic_cpu), target_cpu);
> > +		return true;
> > +	}
> > +
> > +	pr_info("panic: Redirecting from CPU %d to CPU %d for crash kernel\n",
> > +		cpu, target_cpu);
> > +
> > +	vsnprintf(panic_redirect_msg, sizeof(panic_redirect_msg), fmt, 
> > +args);
> 
> The "redirect" in "panic_redirect_msg" is confusing. It has nothing to do with redirection. It is just a buffer for the formatted panic message. I would call it "buf" or "panic_msg".

I'm now reusing vpanic()'s existing static buf[1024], passing it as a 
parameter. This also addresses Andrew's concern about the extra 1k 
memory allocation.

> > +	smp_call_function_single(target_cpu, do_panic_on_target_cpu, 
> > +panic_redirect_msg, false);
> 
> I doubt that this is safe and reliable in panic() context.
> For a start, panic() might be called in NMI and this function takes csd_lock().
> 
> panic() code should avoid locks. Or is should use trylock and handle a failure gracefully.

Changed to smp_call_function_single_async() which returns error instead 
of blocking if the CSD is busy. On failure, panic continues locally.

> BTW: The commit message says that this is needed for crash-dump
>      to work reliably. So, it does not make sense to do this
>      when crash-dump is not configured.

Good point. I've added CONFIG_CRASH_DUMP to the #ifdef in v4.

>      Maybe, crash-dump should get fixed instead?
>      What are the exact problems with the crash-dump, please?

On some platforms, the interrupt controller configuration for the crash 
kernel depends on which CPU executes the panic path. By the time the 
crash kernel starts, all secondary CPUs have already been stopped via 
smp_send_stop(), so it's too late to fix the routing at that point.

Thanks for the review!

Pnina

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ