linux-kernel - Re: [PATCH v6] panic: add panic_force_cpu= parameter to redirect panic to a specific CPU

Message-ID: <aWUgPIr3-d5v3km-@pathway.suse.cz>
Date: Mon, 12 Jan 2026 17:24:28 +0100
From: Petr Mladek <pmladek@...e.com>
To: Pnina Feder <pnina.feder@...ileye.com>
Cc: akpm@...ux-foundation.org, bhe@...hat.com, linux-kernel@...r.kernel.org,
	lkp@...el.com, mgorman@...e.de, mingo@...hat.com,
	peterz@...radead.org, rostedt@...dmis.org, senozhatsky@...omium.org,
	tglx@...utronix.de, vkondra@...ileye.com
Subject: Re: [PATCH v6] panic: add panic_force_cpu= parameter to redirect
 panic to a specific CPU

On Sun 2026-01-11 14:36:56, Pnina Feder wrote:
> Some platforms require panic handling to execute on a specific CPU for
> crash dump to work reliably. This can be due to firmware limitations,
> interrupt routing constraints, or platform-specific requirements where
> only a single CPU is able to safely enter the crash kernel.
> 
> Add the panic_force_cpu= kernel command-line parameter to redirect panic
> execution to a designated CPU. When the parameter is provided, the CPU
> that initially triggers panic forwards the panic context to the target
> CPU via IPI, which then proceeds with the normal panic and kexec flow.
> 
> The IPI delivery is implemented as a weak function (panic_smp_redirect_cpu)
> so architectures with NMI support can override it for more reliable delivery.
> 
> If the specified CPU is invalid, offline, or a panic is already in
> progress on another CPU, the redirection is skipped and panic continues
> on the current CPU.
> 
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -300,6 +300,121 @@ void __weak crash_smp_send_stop(void)
>  
>  atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
>  
> +#if defined(CONFIG_SMP) && defined(CONFIG_CRASH_DUMP)
> +/* CPU to redirect panic to, or -1 if disabled */
> +static int panic_force_cpu = -1;
> +
> +static int __init panic_force_cpu_setup(char *str)
> +{
> +	int cpu;
> +
> +	if (!str)
> +		return -EINVAL;
> +
> +	if (kstrtoint(str, 0, &cpu) || cpu < 0) {
> +		pr_warn("panic_force_cpu: invalid value '%s'\n", str);
> +		return -EINVAL;
> +	}
> +
> +	panic_force_cpu = cpu;
> +	return 0;
> +}
> +early_param("panic_force_cpu", panic_force_cpu_setup);
> +
> +static void do_panic_on_target_cpu(void *info)
> +{
> +	panic("%s", (char *)info);
> +}
> +
> +/**
> + * panic_smp_redirect_cpu - Redirect panic to target CPU
> + * @target_cpu: CPU that should handle the panic
> + * @msg: formatted panic message
> + *
> + * Default implementation uses IPI. Architectures with NMI support
> + * can override this for more reliable delivery.
> + *
> + * Return: 0 on success, negative errno on failure
> + */
> +int __weak panic_smp_redirect_cpu(int target_cpu, void *msg)
> +{
> +	static call_single_data_t panic_csd;
> +
> +	panic_csd.func = do_panic_on_target_cpu;
> +	panic_csd.info = msg;
> +
> +	return smp_call_function_single_async(target_cpu, &panic_csd);
> +}
> +
> +/**
> + * panic_force_target_cpu - Redirect panic to a specific CPU for crash kernel
> + * @buf: buffer to format the panic message into
> + * @buf_size: size of the buffer
> + * @fmt: panic message format string
> + * @args: arguments for format string
> + *
> + * Some platforms require panic handling to occur on a specific CPU
> + * for the crash kernel to function correctly. This function redirects
> + * panic handling to the CPU specified via the panic_force_cpu= boot parameter.
> + *
> + * Returns true if panic should proceed on current CPU.
> + * Returns false (never returns) if panic was redirected.
> + */
> +__printf(3, 0)
> +static bool panic_force_target_cpu(char *buf, int buf_size, const char *fmt, va_list args)
> +{
> +	int cpu = raw_smp_processor_id();
> +	int target_cpu = panic_force_cpu;

What is the reason to read the value into a local variable?
If the reason was to avoid a race then READ_ONCE() should be used.
Otherwise, it looks a bit misleading to use such different names.

Maybe rename the function to panic_try_force_cpu() and use
the global variable directly.

Also, please invert the logic. The function should return
"false" when it was not redirected (logical failure).
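
To illustrate the inverted return value, here is a minimal userspace
sketch, not kernel code. The model_* names and MODEL_NR_CPUS are
assumptions made up for this example; they stand in for cpu_online(),
panic_in_progress() and the result of panic_smp_redirect_cpu().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Stand-in for the boot parameter; -1 means the feature is disabled. */
static int panic_force_cpu = -1;

/* Hypothetical model state replacing cpu_online() and the IPI result. */
static bool model_cpu_is_online[MODEL_NR_CPUS] = { true, true, true, true };
static int model_ipi_result;	/* 0 models a successfully sent IPI */

/*
 * Return true only when the panic was handed over to the forced CPU.
 * Every "could not redirect" case returns false (logical failure).
 */
static bool panic_try_force_cpu(int this_cpu, bool panic_in_progress)
{
	assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);

	/* Feature not enabled via boot parameter. */
	if (panic_force_cpu < 0)
		return false;

	/* Already on the forced CPU, proceed normally. */
	if (panic_force_cpu == this_cpu)
		return false;

	/* Forced CPU does not exist or is offline. */
	if (panic_force_cpu >= MODEL_NR_CPUS ||
	    !model_cpu_is_online[panic_force_cpu])
		return false;

	/* Another panic is already in progress. */
	if (panic_in_progress)
		return false;

	/* Redirected only if the IPI could be sent. */
	return model_ipi_result == 0;
}
```

With this shape, the caller reads naturally as
"if (panic_try_force_cpu(...)) stop this CPU".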

> +	/* Feature not enabled via boot parameter */
> +	if (target_cpu < 0)
> +		return true;
> +
> +	/* Already on target CPU - proceed normally */
> +	if (cpu == target_cpu)
> +		return true;
> +
> +	/* Target CPU is offline, can't redirect */
> +	if (!cpu_online(target_cpu))
> +		return true;
> +
> +	/* Another panic already in progress */
> +	if (panic_in_progress())
> +		return true;
> +
> +	vsnprintf(buf, buf_size, fmt, args);

This is using a global buffer without any serialization.
Several CPUs might call panic()/panic_force_target_cpu() in parallel.
The buffer might end up garbled as a result.

I am afraid that we need a separate buffer. And only one
CPU can be allowed to use it. We would need similar synchronization
for @panic_redirect_cpu as we have for @panic_cpu.
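
A rough userspace model of what I mean, using C11 atomics instead of
the kernel atomic_t API. The helper name panic_redirect_try_claim(),
the buffer name and its size, and MODEL_CPU_INVALID are assumptions
for illustration only. In the kernel, the compare-and-exchange would
presumably look like the one in panic_try_start().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_CPU_INVALID (-1)	/* models PANIC_CPU_INVALID */

/* CPU which owns the redirect buffer, or MODEL_CPU_INVALID. */
static atomic_int panic_redirect_cpu = MODEL_CPU_INVALID;

/* Dedicated buffer; written only by the CPU that won the claim. */
static char panic_redirect_buf[1024];

/*
 * Hypothetical helper: only the first caller wins the claim and
 * may format its message into the buffer. Later callers get false
 * and must not touch the buffer.
 */
static bool panic_redirect_try_claim(int this_cpu, const char *msg)
{
	int old = MODEL_CPU_INVALID;

	assert(this_cpu >= 0);

	/* Atomically switch INVALID -> this_cpu; fails if already claimed. */
	if (!atomic_compare_exchange_strong(&panic_redirect_cpu, &old, this_cpu))
		return false;

	snprintf(panic_redirect_buf, sizeof(panic_redirect_buf), "%s", msg);
	return true;
}
```

The winner also records which CPU started the redirection, which
might be handy for the message printed on the target CPU.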

> +
> +	console_verbose();
> +	bust_spinlocks(1);
> +
> +	pr_emerg("panic: Redirecting from CPU %d to CPU %d for crash kernel\n",
> +		cpu, target_cpu);
> +
> +	/* Dump original CPU's stack before redirecting */
> +	if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
> +		panic_this_cpu_backtrace_printed = true;
> +	} else if (IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE)) {
> +		dump_stack();
> +		panic_this_cpu_backtrace_printed = true;
> +	}

The "panic_this_cpu_backtrace_printed" variable is checked
in panic_trigger_all_cpu_backtrace() to see whether we
want to print backtrace for this CPU or not.

panic_smp_redirect_cpu() is going to call panic() on another CPU.
Do we want to print backtrace from the other CPU? I guess, not.

We should make the other panic() aware that it was redirected
from here. Maybe, using the @panic_redirect_cpu variable which
I suggested above to synchronize the access to the helper buffer.

And panic() should do something like:

	if (panic_redirect_cpu >= 0 &&
	    panic_force_cpu == raw_smp_processor_id()) {
		/* Backtrace was printed on the original CPU. */
		pr_emerg("panic: Redirected from CPU %d to CPU %d\n",
			 panic_redirect_cpu, panic_force_cpu);
	} else if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
		panic_this_cpu_backtrace_printed = true;
	} else if (IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE)) {
		dump_stack();
		panic_this_cpu_backtrace_printed = true;
	}

Also we might need to check @panic_redirect_cpu in
panic_trigger_all_cpu_backtrace() and skip this particular CPU there.
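
The skipping could be modeled as follows. This is only a userspace
sketch with a plain bitmask instead of struct cpumask. The function
name panic_backtrace_mask() and its parameters are made up for this
example and do not describe how panic_trigger_all_cpu_backtrace()
is implemented.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: compute which CPUs should still print a
 * backtrace. Bit N in the masks stands for CPU N.
 */
static unsigned long panic_backtrace_mask(unsigned long online_mask,
					  int this_cpu,
					  bool this_cpu_printed,
					  int redirect_cpu)
{
	unsigned long mask = online_mask;

	assert(this_cpu >= 0);

	/* This CPU has already printed its own backtrace. */
	if (this_cpu_printed)
		mask &= ~(1UL << this_cpu);

	/* The redirecting CPU printed its backtrace before the handover. */
	if (redirect_cpu >= 0)
		mask &= ~(1UL << redirect_cpu);

	return mask;
}
```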

> +
> +	printk_legacy_allow_panic_sync();
> +	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +
> +	if (panic_smp_redirect_cpu(target_cpu, buf) != 0)
> +		return true;
> +
> +	/* IPI/NMI sent, this CPU should stop */
> +	return false;
> +}
> +#else
> +__printf(3, 0)
> +static inline bool panic_force_target_cpu(char *buf, int buf_size, const char *fmt, va_list args)
> +{
> +	return true;
> +}
> +#endif /* CONFIG_SMP && CONFIG_CRASH_DUMP */
> +
>  bool panic_try_start(void)
>  {
>  	int old_cpu, this_cpu;
> @@ -451,6 +566,13 @@ void vpanic(const char *fmt, va_list args)
>  	local_irq_disable();
>  	preempt_disable_notrace();
>  
> +	/*
> +	 * Redirect panic to target CPU if configured via panic_force_cpu=.
> +	 * Returns false and never returns if panic was redirected.

The 2nd sentence is confusing. IMHO, panic_force_target_cpu() always
returns.

The point is that this CPU should stop itself when panic() was redirected.

But wait!

The panic_cpu will eventually do smp_send_stop(). On x86_64, it would
call native_stop_other_cpus(). It would wait until this CPU
clears the related bit in cpus_stop_mask(). But that would never
happen when this CPU already spins in panic_smp_self_stop().
Or am I missing something?

IMHO, panic_smp_self_stop() can't be used here. Or we need
to make stop_other_cpus() aware that this one is already
stopped.

Sigh, it is getting complicated.

> +	 */
> +	if (!panic_force_target_cpu(buf, sizeof(buf), fmt, args))
> +		panic_smp_self_stop();
> +
>  	/*
>  	 * It's possible to come here directly from a panic-assertion and
>  	 * not have preempt disabled. Some functions called from here want

Best Regards,
Petr