lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZqPF-GjhCUlR1fQv@pathway.suse.cz>
Date: Fri, 26 Jul 2024 17:51:20 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH printk v3 07/19] printk: Add helpers for flush type logic

On Mon 2024-07-22 19:25:27, John Ogness wrote:
> There are many call sites where console flushing occur.
> Depending on the system state and types of consoles, the flush
> methods to use are different. A flush call site generally must
> consider:
> 
> 	@have_boot_console
> 	@have_nbcon_console
> 	@have_legacy_console
> 	@legacy_allow_panic_sync
> 	is_printk_preferred()
> 
> and take into account the current CPU state:
> 
> 	NBCON_PRIO_NORMAL
> 	NBCON_PRIO_EMERGENCY
> 	NBCON_PRIO_PANIC
> 
> in order to decide if it should:
> 
> 	flush nbcon directly via atomic_write() callback
> 	flush legacy directly via console_unlock
> 	flush legacy via offload to irq_work
> 
> All of these call sites use their own logic to make this
> decision, which is complicated and error prone. Especially
> later when two more flush methods will be introduced:
> 
> 	flush nbcon via offload to kthread
> 	flush legacy via offload to kthread
> 
> Introduce a new internal struct console_flush_type that
> specifies the flush method(s) that are available for a
> particular call site to use.
> 
> Introduce helper functions to fill out console_flush_type to
> be used for emergency and non-emergency call sites.
> 
> In many system states it is acceptable to flush legacy directly
> via console_unlock or via offload to irq_work. For this reason
> the non-emergency helper provides an argument @prefer_offload
> for the caller to specify which method it is interested in
> performing. (The helper functions will never allow both.)
> 
> Replace the logic of all flushing call sites to use the new
> helpers. Note that this cleans up various corner cases where
> is_printk_preferred() and @have_boot_console were not being
> considerered before.
> 
> Signed-off-by: John Ogness <john.ogness@...utronix.de>

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2330,9 +2331,13 @@ static bool legacy_allow_panic_sync;
>   */
>  void printk_legacy_allow_panic_sync(void)
>  {
> +	struct console_flush_type ft;
> +
>  	legacy_allow_panic_sync = true;
>  
> -	if (printing_via_unlock && !in_nmi()) {
> +	printk_get_console_flush_type(&ft, false);
> +
> +	if (ft.legacy_direct && !in_nmi()) {

in_nmi() check is not needed any longer. It is already done in
is_printk_deferred() in printk_get_console_flush_type().

>  		if (console_trylock())
>  			console_unlock();
>  	}
> @@ -2342,7 +2347,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  			    const struct dev_printk_info *dev_info,
>  			    const char *fmt, va_list args)
>  {
> -	bool do_trylock_unlock = printing_via_unlock;
> +	struct console_flush_type ft;
> +	bool defer_legacy = false;
>  	int printed_len;
>  
>  	/* Suppress unimportant messages after panic happens */
> @@ -2360,41 +2366,19 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
>  		/* If called from the scheduler, we can not call up(). */
> -		do_trylock_unlock = false;
> +		defer_legacy = true;
>  	}
>  
>  	printk_delay(level);
>  
>  	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
>  
> -	if (have_nbcon_console && !have_boot_console) {
> -		bool is_panic_context = this_cpu_in_panic();
> +	printk_get_console_flush_type(&ft, false);

We could pass "defer_legacy" here. It won't be needed to check it
later then.

> -		/*
> -		 * In panic, the legacy consoles are not allowed to print from
> -		 * the printk calling context unless explicitly allowed. This
> -		 * gives the safe nbcon consoles a chance to print out all the
> -		 * panic messages first. This restriction only applies if
> -		 * there are nbcon consoles registered.
> -		 */
> -		if (is_panic_context)
> -			do_trylock_unlock &= legacy_allow_panic_sync;
> +	if (ft.nbcon_atomic)
> +		nbcon_atomic_flush_pending();
>  
> -		/*
> -		 * There are situations where nbcon atomic printing should
> -		 * happen in the printk() caller context:
> -		 *
> -		 * - When this CPU is in panic.
> -		 *
> -		 * Note that if boot consoles are registered, the console
> -		 * lock/unlock dance must be relied upon instead because nbcon
> -		 * consoles cannot print simultaneously with boot consoles.
> -		 */
> -		if (is_panic_context)
> -			nbcon_atomic_flush_pending();

Just for record. If I get it correctly than this actually fixes a bug.
The original code called nbcon_atomic_flush_pending() only in panic().
The nbcon consoles were not flushed when there were only nbcon consoles
(printing_via_unlock == false).

I think that we did not notice it because it probably got fixed by
later patches adding the printk kthreads.

> -	}
> -
> -	if (do_trylock_unlock) {
> +	if (!defer_legacy && ft.legacy_direct) {

@defer_legacy should not be needed if we passed it to
printk_get_console_flush_type().

>  		/*
>  		 * The caller may be holding system-critical or
>  		 * timing-sensitive locks. Disable preemption during
> @@ -2409,22 +2393,17 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		 * semaphore. The release will print out buffers. With the
>  		 * spinning variant, this context tries to take over the
>  		 * printing from another printing context.
> -		 *
> -		 * Skip it in EMERGENCY priority. The console will be
> -		 * explicitly flushed when exiting the emergency section.
>  		 */
> -		if (nbcon_get_default_prio() != NBCON_PRIO_EMERGENCY) {
> -			if (console_trylock_spinning())
> -				console_unlock();
> -		}
> +		if (console_trylock_spinning())
> +			console_unlock();
>  
>  		preempt_enable();
>  	}
>  
> -	if (do_trylock_unlock)
> -		wake_up_klogd();
> -	else
> +	if ((defer_legacy && ft.legacy_direct) || ft.legacy_offload)

ditto.

>  		defer_console_output();
> +	else
> +		wake_up_klogd();
>  
>  	return printed_len;
>  }
> @@ -2728,10 +2707,15 @@ void resume_console(void)
>   */
>  static int console_cpu_notify(unsigned int cpu)
>  {
> -	if (!cpuhp_tasks_frozen && printing_via_unlock) {
> -		/* If trylock fails, someone else is doing the printing */
> -		if (console_trylock())
> -			console_unlock();
> +	struct console_flush_type ft;
> +
> +	if (!cpuhp_tasks_frozen) {
> +		printk_get_console_flush_type(&ft, false);
> +
> +		if (ft.legacy_direct) {
> +			if (console_trylock())
> +				console_unlock();
> +		}

One might be curious why we do not flush nbcon consoles here.
I guess that it will be less important after introducing
the kthreads.

Could it be called before the kthreads are started?

Anyway, this looks like a common pattern. Maybe, we could hide
it into some helper and be in the safe side:

/* Try to flush consoles directly when needed. */
void try_console_flush()
{
	struct console_flush_type ft;

	printk_get_console_flush_type(&ft, false);

	if (ft.nbcon_atomic)
		nbcon_atomic_flush_pending();

	if (ft.legacy_direct)
		console_flush_all(false, &next_seq, &handover);
}

>  	}
>  	return 0;
>  }

Otherwise, it looks good.

Heh, I wondered several times if it was worth it. The struct
console_flush_type and printk_get_console_flush_type()
sometimes looked like an overkill. But I see many advantages:

  + As the commit message says, the decision how to flush the messages
    depend on many variables. And it is nice to have it in one place.

  + The logic will get even more complicated after adding the
    kthreads.

  + printk_get_console_flush_type() allows to change the behavior
    in many situations in one place.

  + printk_get_console_flush_type() allows to easily find all
    locations where we decide how to flush the messages. It helps
    to check affected code paths.

So, I think that it is worth it in the end.

Note that I did not check the emergency code paths because
they are going to be reworked as discussed in the printk pull
request for 6.11, see
https://lore.kernel.org/r/ZqJKbcLgTeYRkDd6@pathway.suse.cz

Best Regards,
Petr

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ