linux-kernel - RE: [PATCH v6] panic: add panic_force_cpu= parameter to redirect panic to a specific CPU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <MRWPR09MB8022CE035AF04F0905DD65808F8CA@MRWPR09MB8022.eurprd09.prod.outlook.com>
Date: Thu, 15 Jan 2026 13:06:57 +0000
From: Pnina Feder <PNINA.FEDER@...ileye.com>
To: Petr Mladek <pmladek@...e.com>
CC: "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "bhe@...hat.com"
	<bhe@...hat.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "lkp@...el.com" <lkp@...el.com>,
	"mgorman@...e.de" <mgorman@...e.de>, "mingo@...hat.com" <mingo@...hat.com>,
	"peterz@...radead.org" <peterz@...radead.org>, "rostedt@...dmis.org"
	<rostedt@...dmis.org>, "senozhatsky@...omium.org" <senozhatsky@...omium.org>,
	"tglx@...utronix.de" <tglx@...utronix.de>, Vladimir Kondratiev
	<Vladimir.Kondratiev@...ileye.com>
Subject: RE: [PATCH v6] panic: add panic_force_cpu= parameter to redirect
 panic to a specific CPU

Hi Petr,

Thank you for the careful review.

> > +	int target_cpu = panic_force_cpu;
> 
> What is the reason to read the value into a local variable?
> If the reason was to avoid a race then READ_ONCE() should be used.
> Otherwise, it looks a bit misleading to use so different names.
> 
> Maybe, rename the function to panic_try_force_cpu() use the global variable.
> 
> Also, please invert the logic. The function should return "false" when it was not redirected (logical failure).
> 

Fixed.

> > +	/* Feature not enabled via boot parameter */
> > +	if (target_cpu < 0)
> > +		return true;
> > +
> > +	/* Already on target CPU - proceed normally */
> > +	if (cpu == target_cpu)
> > +		return true;
> > +
> > +	/* Target CPU is offline, can't redirect */
> > +	if (!cpu_online(target_cpu))
> > +		return true;
> > +
> > +	/* Another panic already in progress */
> > +	if (panic_in_progress())
> > +		return true;
> > +
> > +	vsnprintf(buf, buf_size, fmt, args);
> 
> This is using a global buffer without any serialization.
> More CPUs might call panic()/panic_force_target_cpu() in parallel.
> The buffer might contain a mess as a result.
> 
> I am afraid that we need a separate buffer. And only one CPU can be allowed to use it. We would need similar synchronization as with @panic_cpu for @panic_redirect_cpu.

Thanks, you are right, added this in v7.

> > +
> > +	console_verbose();
> > +	bust_spinlocks(1);
> > +
> > +	pr_emerg("panic: Redirecting from CPU %d to CPU %d for crash kernel\n",
> > +		cpu, target_cpu);
> > +
> > +	/* Dump original CPU's stack before redirecting */
> > +	if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
> > +		panic_this_cpu_backtrace_printed = true;
> > +	} else if (IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE)) {
> > +		dump_stack();
> > +		panic_this_cpu_backtrace_printed = true;
> > +	}
> 
> The "panic_this_cpu_backtrace_printed" variable is checked in panic_trigger_all_cpu_backtrace() to see whether we want to print backtrace for this CPU or not.

> panic_smp_redirect_cpu() is going to call panic() on another CPU.
> Do we want to print backtrace from the other CPU? I guess, not.
> 
> We should make the other panic() aware that it was redirected from here. Maybe, using the @panic_redirect_cpu variable which I suggested above to synchronize the access to the helper buffer.

> And panic() should do something like:
> 
> 	if (panic_redirect_cpu >= 0 &&
> 	    panic_force_cpu == raw_smp_processor_id()) {
> 		/* Backtrace was printed on the original CPU. */
> 		pr_emerg("panic: Redirected from CPU %d to CPU %d\n",
> 			 panic_redirect_cpu, panic_force_cpu);
> 	} else if (test_taint(TAINT_DIE) || oops_in_progress > 1) {
> 		panic_this_cpu_backtrace_printed = true;
> 	} else if (IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE)) {
> 		dump_stack();
> 		panic_this_cpu_backtrace_printed = true;
> 	}

Done, I've implemented this using panic_redirect_cpu (as an atomic, similar to panic_cpu) to track the original CPU.
So we only get the stack dump from the original (crashed) CPU, and if panic_trigger_all_cpu_backtrace() is triggered,
the target cpu's stack will be printed along with the others.

However, I'm wondering if we might also want to print the stack dump of the target CPU in case something goes wrong during the redirection?
It could provide useful debugging information. What do you think?

> Also we might need to check @panic_regirect_cpu in
> panic_trigger_all_cpu_backtrace() and skip this particular CPU there.

Since we now call set_cpu_online(smp_processor_id(), false) on the original CPU before it enters panic_smp_self_stop(), 
panic_trigger_all_cpu_backtrace() will not attempt to print its backtrace.

> > +
> > +	printk_legacy_allow_panic_sync();
> > +	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > +
> > +	if (panic_smp_redirect_cpu(target_cpu, buf) != 0)
> > +		return true;
> > +
> > +	/* IPI/NMI sent, this CPU should stop */
> > +	return false;
> > +}
> > +#else
> > +__printf(3, 0)
> > +static inline bool panic_force_target_cpu(char *buf, int buf_size, 
> > +const char *fmt, va_list args) {
> > +	return true;
> > +}
> > +#endif /* CONFIG_SMP && CONFIG_CRASH_DUMP */
> > +
> >  bool panic_try_start(void)
> >  {
> >  	int old_cpu, this_cpu;
> > @@ -451,6 +566,13 @@ void vpanic(const char *fmt, va_list args)
> >  	local_irq_disable();
> >  	preempt_disable_notrace();
> >  
> > +	/*
> > +	 * Redirect panic to target CPU if configured via panic_force_cpu=.
> > +	 * Returns false and never returns if panic was redirected.
> 
> The 2nd sentence is confusing. IMHO, panic_smp_self_stop() always returns.

Fixed.

> The point is that this CPU should stop itself when panic() was redirected.
> 
> But wait!
> 
> The panic_cpu will eventually do smp_send_stop(). On x86_64, it would call native_stop_other_cpus(). It woult wait until this CPU clears the related bit in cpus_stop_mask(). But it would never when when this CPU already spins in panic_smp_self_stop().
Or do I miss anything, please?

> IMHO, panic_smp_self_stop() can't be used here. Or we need to make stop_other_cpus() aware that this one is already stopped.
> 
> Sigh, it is getting complicated.

I've added set_cpu_online(smp_processor_id(), false) before calling panic_smp_self_stop().
This should handle most architectures, as they check the online CPU mask in their stop implementation.
This is also consistent with what arm64 does in its panic_smp_self_stop() implementation.

For x86_64, you're right that it uses its own cpus_stop_mask rather than the online mask.
However, I noticed that the existing code in vpanic() already calls panic_smp_self_stop() for parallel panic cases:

        if (panic_try_start()) {
                /* go ahead */
        }else if (panic_on_other_cpu())        
                panic_smp_self_stop();

So this situation can already occur today.
Looking at native_stop_other_cpus(), it always runs with wait=0 when called from the panic path,
so it won't deadlock, just timeout after about 1 second.

Do you think we need additional handling for x86_64,
or is the current timeout behavior acceptable given that it already exists in the parallel panic case?

Thanks,
Pnina