linux-kernel - Re: [PATCH v1 1/3] kernel/panic: generalize panic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aCNHzXkz4wfnIDPM@U-2FWC9VHC-2323.local>
Date: Tue, 13 May 2025 21:23:25 +0800
From: Feng Tang <feng.tang@...ux.alibaba.com>
To: Petr Mladek <pmladek@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Lance Yang <lance.yang@...ux.dev>, linux-kernel@...r.kernel.org,
	mhiramat@...nel.org, llong@...hat.com
Subject: Re: [PATCH v1 1/3] kernel/panic: generalize panic_print's function
 to show sys info

Hi Petr,

Thanks for the review!

On Tue, May 13, 2025 at 11:57:19AM +0200, Petr Mladek wrote:
> On Sun 2025-05-11 16:52:52, Feng Tang wrote:
> > panic_print was introduced to help debugging kernel panic by dumping
> > different kinds of system information like tasks' call stack, memory,
> > ftrace buffer etc. Acutually this function could help debugging cases
> > like task-hung, soft/hard lockup too, where user may need the snapshot
> > of system info at that time.
> > 
> > Extract sys_show_info() function out to be used by other kernel parts
> > for debugging.
> > 
> > --- a/include/linux/panic.h
> > +++ b/include/linux/panic.h
> > @@ -16,6 +16,17 @@ extern void oops_enter(void);
> >  extern void oops_exit(void);
> >  extern bool oops_may_print(void);
> >  
> > +#define SYS_PRINT_TASK_INFO		0x00000001
> > +#define SYS_PRINT_MEM_INFO		0x00000002
> > +#define SYS_PRINT_TIMER_INFO		0x00000004
> > +#define SYS_PRINT_LOCK_INFO		0x00000008
> > +#define SYS_PRINT_FTRACE_INFO		0x00000010
> > +#define SYS_PRINT_ALL_PRINTK_MSG	0x00000020
> 
> Please, remove this option from the generic interface. It is
> controversial. In the current form, it makes sense to replay
> the log only in panic() after all the other actions:
> 
> 	console_verbose();
> 	bust_spinlocks(1);
> 	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
> 	printk_legacy_allow_panic_sync();
> 	console_unblank();
> 	debug_locks_off();
> 	console_flush_on_panic(CONSOLE_FLUSH_PENDING);

OK, will do.

In my RFC version, I did reserved it for panic use only :) and
added the support to general case in v1 after some hesitation. 

https://lore.kernel.org/lkml/20250507104322.30700-2-feng.tang@linux.alibaba.com/

> 
> All these operations have some (side-)effect which increases
> the chance to actually see the messages on the console.
> 
> I think that it was primary meant for graphics consoles. But there
> it would need to be used together with printk_delay
> (/proc/sys/kernel/printk_delay, sysctl printk_delay).
> 
> Note that it creates a hairy code. It is the only reason why we need to
> call panic_print_sys_info() twice with "false" and "true"
> parameter.

Yes.

> 
> That said, I could imagine a generic use, for example, after forcing
> ignore_loglevel or so. Otherwise, I do not see any advantage in
> flushing the same messages again, for example, on serial or network
> consoles where they are likely already stored. We could add this
> later when there is a real-life demand.

You are right, one main usage is for some product which has only graphics
console and no serial one. And we also used for serial console sometimes,
when the console have a lot of user space message mixed with kernel ones,
and we'd like to see the full clean kernel messages.

> 
> > +#define SYS_PRINT_ALL_CPU_BT		0x00000040
> > +#define SYS_PRINT_BLOCKED_TASKS		0x00000080
> 
> The generic approach might deserve a separate source file,
> for example:
> 
>     include/linux/sys_info.h
>     lib/sys_info.c

Thanks for the suggestion! I'm really not good at naming. 

As panic.c is always built-in, I'll put sys_info.c as obj-y.

> Also I always considered the bitmask as a horrible user interface.
> It might be used internally. But it would be better to allow a human
> readable parameter, for example:
> 
> 	panic_sys_info=task,mem,timer,lock,ftrace,bt,all_bt,blocked_tasks

Yes, it's convenient for developers, as a cmdline parameter being parsed
at boot time.

But I think bitmask may be easier for runtime changing as a core parameter
under /proc/ or /sys/, or from sysctl interface. There are also some other
modules use debug bitmask controlling printking info for different
sub-components.

And we have similar control knobs for hung, lockup etc.

Or should I change the name from 'xxx_print_mask' to 'xxx_sysinfo_flag'
in patch 2/3 ?

> 
> The console reply might be handled by a separate:
> 
> 	panic_console_reply=1
> 
> And it would obsolete the existing "panic_print" which is an
> ugly name and interface from my POV.

Agree it's ugly :). But besides a kernel parameter,  'panic_print' is
also a sysctl interface, I'm afraid renaming it might break user ABI.

> > +extern void sys_show_info(unsigned long info_mask);
> > +
> >  extern bool panic_triggering_all_cpu_backtrace;
> >  extern int panic_timeout;
> >  extern unsigned long panic_print;
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -208,33 +200,44 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
> >  }
> >  EXPORT_SYMBOL(nmi_panic);
> >  
> > -static void panic_print_sys_info(bool console_flush)
> > +void sys_show_info(unsigned long info_mask)
> >  {
> > -	if (console_flush) {
> > -		if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> > -			console_flush_on_panic(CONSOLE_REPLAY_ALL);
> > -		return;
> > -	}
> > -
> > -	if (panic_print & PANIC_PRINT_TASK_INFO)
> > +	if (info_mask & SYS_PRINT_TASK_INFO)
> >  		show_state();
> >  
> > -	if (panic_print & PANIC_PRINT_MEM_INFO)
> > +	if (info_mask & SYS_PRINT_MEM_INFO)
> >  		show_mem();
> >  
> > -	if (panic_print & PANIC_PRINT_TIMER_INFO)
> > +	if (info_mask & SYS_PRINT_TIMER_INFO)
> >  		sysrq_timer_list_show();
> >  
> > -	if (panic_print & PANIC_PRINT_LOCK_INFO)
> > +	if (info_mask & SYS_PRINT_LOCK_INFO)
> >  		debug_show_all_locks();
> >  
> > -	if (panic_print & PANIC_PRINT_FTRACE_INFO)
> > +	if (info_mask & SYS_PRINT_FTRACE_INFO)
> >  		ftrace_dump(DUMP_ALL);
> >  
> > -	if (panic_print & PANIC_PRINT_BLOCKED_TASKS)
> > +	if (panic_print & SYS_PRINT_ALL_PRINTK_MSG)
> > +		console_flush_on_panic(CONSOLE_REPLAY_ALL);
> 
> I do not see any advantage in replaying the log at this stage.
> It might make sense to replay the messages printed before
> the critical situation. But it makes sense only when they
> might be filtered/blocked before and can be seen now (unblanked
> consoles, forced  ignore_loglevel, or so).
> 
> I would keep this in the bitmap for backward compactibility.
> But I would ignore it my the generic print_sys_info().

OK.

> And I would replace panic_print_sys_info(true) call with:
> 
> static void panic_console_replay(void)
> {
> 	if (panic_print & SYS_PRINT_ALL_PRINTK_MSG)
> 		console_flush_on_panic(CONSOLE_REPLAY_ALL);
> }

Nice cleanup! Will do. I'll fold this change with this patch.

Thanks,
Feng

> 
> > +	if (info_mask & SYS_PRINT_ALL_CPU_BT)
> > +		trigger_all_cpu_backtrace();
> > +
> > +	if (info_mask & SYS_PRINT_BLOCKED_TASKS)
> >  		show_state_filter(TASK_UNINTERRUPTIBLE);
> >  }
> 
> Best Regards,
> Petr