linux-kernel - Re: [PATCH printk v2 21/26] printk: Coordinate direct printing in panic

Message-ID: <ZeHSgZs9I3Ihvpye@alley>
Date: Fri, 1 Mar 2024 14:05:05 +0100
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Josh Poimboeuf <jpoimboe@...nel.org>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	Uros Bizjak <ubizjak@...il.com>,
	"Guilherme G. Piccoli" <gpiccoli@...lia.com>,
	Kefeng Wang <wangkefeng.wang@...wei.com>,
	Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH printk v2 21/26] printk: Coordinate direct printing in
 panic

On Sun 2024-02-18 20:03:21, John Ogness wrote:
> Perform printing by nbcon consoles on the panic CPU from the
> printk() caller context in order to get panic messages printed
> as soon as possible.
> 
> If legacy and nbcon consoles are registered, the legacy consoles
> will no longer perform direct printing on the panic CPU until
> after the backtrace has been stored. This will give the safe
> nbcon consoles a chance to print the panic messages before
> allowing the unsafe legacy consoles to print.
> 
> If no nbcon consoles are registered, there is no change in
> behavior (i.e. legacy consoles will always attempt to print
> from the printk() caller context).

> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -370,6 +370,8 @@ void panic(const char *fmt, ...)
>  	 */
>  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>  
> +	printk_legacy_allow_panic_sync();

I would call this before the panic notifiers. They are known
to cause problems.

That was the reason for introducing the "crash_kexec_post_notifiers"
parameter. There is also a patchset which tries to split the notifiers
by purpose, see
https://lore.kernel.org/all/20220427224924.592546-23-gpiccoli@igalia.com/

This raises another question: whether to try flushing the legacy
consoles before calling the notifiers.
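
The ordering proposed above can be illustrated with a small userspace
model. This is only a sketch: the model_*() functions are hypothetical
stand-ins that record the call order, not the real kernel functions, and
the real panic() does much more around these steps.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Records the order in which the modeled panic() steps run. */
static char call_order[64];

static void record(const char *step)
{
	size_t used = strlen(call_order);

	/* Keep room for the step name, the separator and the NUL. */
	assert(used + strlen(step) + 2 <= sizeof(call_order));
	snprintf(call_order + used, sizeof(call_order) - used, "%s;", step);
}

/* Hypothetical stand-ins for the calls visible in the quoted hunk. */
static void model_legacy_allow_panic_sync(void) { record("allow_legacy"); }
static void model_panic_notifiers(void)         { record("notifiers"); }
static void model_panic_print_sys_info(void)    { record("sys_info"); }
static void model_kmsg_dump(void)               { record("kmsg_dump"); }

/*
 * Modeled tail of panic() with the suggested ordering: legacy consoles
 * are allowed to print before the (potentially problematic) notifiers.
 */
void panic_tail_model(void)
{
	model_legacy_allow_panic_sync();
	model_panic_notifiers();
	model_panic_print_sys_info();
	model_kmsg_dump();
}
```

With this ordering, a hang inside a notifier can no longer prevent the
legacy consoles from being enabled for direct printing.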

>  	panic_print_sys_info(false);
>  
>  	kmsg_dump(KMSG_DUMP_PANIC);
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2329,12 +2329,23 @@ int vprintk_store(int facility, int level,
>  	return ret;
>  }
>  
> +static bool legacy_allow_panic_sync;
> +
> +/*
> + * This acts as a one-way switch to allow legacy consoles to print from
> + * the printk() caller context on a panic CPU.
> + */
> +void printk_legacy_allow_panic_sync(void)
> +{
> +	legacy_allow_panic_sync = true;

I would flush the legacy consoles here. Otherwise the flush might be
triggered by some random printk() in the notifiers or by the even more
unsafe console_flush_on_panic().

I mean something like:

	if (printing_via_unlock && console_trylock())
		console_unlock();
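
For illustration, the suggestion above could look roughly as follows when
modeled in userspace. This is a sketch under assumptions: console_trylock()
and console_unlock() are simplified stubs here (the real ones take a
semaphore and print pending records to the consoles), and pending_records
and flushed_records are hypothetical counters used only to observe the
flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for kernel state; values are illustrative. */
static bool printing_via_unlock = true;
static bool legacy_allow_panic_sync;
static bool console_locked;
static int pending_records = 3;	/* hypothetical: records not yet printed */
static int flushed_records;	/* hypothetical: records printed so far */

/* Stub: returns 1 when the lock was taken, 0 when it is contended. */
static int console_trylock(void)
{
	if (console_locked)
		return 0;
	console_locked = true;
	return 1;
}

/* Stub: "prints" all pending records, then releases the lock. */
static void console_unlock(void)
{
	assert(console_locked);
	flushed_records += pending_records;
	pending_records = 0;
	console_locked = false;
}

/*
 * One-way switch with the suggested flush: once legacy consoles are
 * allowed, flush them right away instead of waiting for a later printk().
 */
void printk_legacy_allow_panic_sync(void)
{
	legacy_allow_panic_sync = true;

	if (printing_via_unlock && console_trylock())
		console_unlock();
}
```

If the trylock fails, nothing is flushed here and the current lock owner
is expected to do the printing, which matches the usual trylock/unlock
pattern in vprintk_emit().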

> +}
> +
>  asmlinkage int vprintk_emit(int facility, int level,
>  			    const struct dev_printk_info *dev_info,
>  			    const char *fmt, va_list args)
>  {
> +	bool do_trylock_unlock = printing_via_unlock;
>  	int printed_len;
> -	bool in_sched = false;
>  
>  	/* Suppress unimportant messages after panic happens */
>  	if (unlikely(suppress_printk))
> @@ -2350,15 +2361,43 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
> -		in_sched = true;
> +		/* If called from the scheduler, we can not call up(). */
> +		do_trylock_unlock = false;
>  	}
>  
>  	printk_delay(level);
>  
>  	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
>  
> -	/* If called from the scheduler, we can not call up(). */
> -	if (!in_sched && printing_via_unlock) {
> +	if (!have_boot_console && have_nbcon_console) {

Nit: The opposite order is more logical ;-)

	if (have_nbcon_console && !have_boot_console) {

> +		bool is_panic_context = this_cpu_in_panic();
> +
> +		/*
> +		 * In panic, the legacy consoles are not allowed to print from
> +		 * the printk calling context unless explicitly allowed. This
> +		 * gives the safe nbcon consoles a chance to print out all the
> +		 * panic messages first. This restriction only applies if
> +		 * there are nbcon consoles registered.
> +		 */
> +		if (is_panic_context)
> +			do_trylock_unlock &= legacy_allow_panic_sync;
> +
> +		/*
> +		 * There are situations where nbcon atomic printing should
> +		 * happen in the printk() caller context:
> +		 *
> +		 * - When this CPU is in panic.
> +		 *
> +		 * Note that if boot consoles are registered, the
> +		 * console_lock/console_unlock dance must be relied upon
> +		 * instead because nbcon consoles cannot print simultaneously
> +		 * with boot consoles.
> +		 */
> +		if (is_panic_context)
> +			nbcon_atomic_flush_all();
> +	}
> +
> +	if (do_trylock_unlock) {
>  		/*
>  		 * The caller may be holding system-critical or
>  		 * timing-sensitive locks. Disable preemption during
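
The decision logic in the hunk above can be summarized as a pure function.
This is a userspace sketch, not kernel code: struct printk_state and
struct printk_decision are hypothetical types introduced only to make the
inputs and outputs explicit, and the flags simply mirror the names used in
the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the flags consulted by vprintk_emit(). */
struct printk_state {
	bool have_nbcon_console;
	bool have_boot_console;
	bool printing_via_unlock;	/* assumed: legacy or boot console present */
	bool legacy_allow_panic_sync;
	bool cpu_in_panic;		/* result of this_cpu_in_panic() */
	bool from_scheduler;		/* level == LOGLEVEL_SCHED */
};

/* Hypothetical result: which printing paths the caller context takes. */
struct printk_decision {
	bool do_trylock_unlock;		/* legacy console_trylock()/unlock() */
	bool nbcon_atomic_flush;	/* nbcon_atomic_flush_all() */
};

struct printk_decision decide_direct_printing(const struct printk_state *s)
{
	struct printk_decision d = { .do_trylock_unlock = false };

	assert(s);
	d.do_trylock_unlock = s->printing_via_unlock;

	/* If called from the scheduler, up() must not be called. */
	if (s->from_scheduler)
		d.do_trylock_unlock = false;

	/* nbcon consoles cannot print simultaneously with boot consoles. */
	if (s->have_nbcon_console && !s->have_boot_console && s->cpu_in_panic) {
		/* Legacy consoles wait until they are explicitly allowed. */
		d.do_trylock_unlock = d.do_trylock_unlock &&
				      s->legacy_allow_panic_sync;
		/* Safe nbcon consoles print from the caller context. */
		d.nbcon_atomic_flush = true;
	}

	return d;
}
```

For example, on a panic CPU with an nbcon console and no boot console,
the legacy path stays disabled until legacy_allow_panic_sync is set,
while the nbcon flush is always requested.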

Otherwise, it looks good.

Best Regards,
Petr