linux-kernel - Re: [PATCH printk v7 30/35] printk: Add helper for flush type logic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZrNcr5-uZoQnSHii@pathway.suse.cz>
Date: Wed, 7 Aug 2024 13:39:06 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH printk v7 30/35] printk: Add helper for flush type logic

Hi,

I have spent a lot of time on this patch. I have got lost and rewrote
the comments several times. I hope that they make sense after all.

On Sun 2024-08-04 02:57:33, John Ogness wrote:
> There are many call sites where console flushing occur.
> Depending on the system state and types of consoles, the flush
> methods to use are different. A flush call site generally must
> consider:
> 
> Introduce a new internal struct console_flush_type that
> specifies the flush method(s) that are available for a
> particular call site to use.
> 
> Introduce a helper function to fill out console_flush_type to
> be used for flushing call sites.
> 
> In many system states it is acceptable to flush legacy directly
> via console_unlock or via offload to irq_work. The caller must
> decide which of these methods it prefers.
> 
> Replace the logic of all flushing call sites to use the new
> helper.
> 
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -156,6 +157,70 @@ static inline bool console_is_usable(struct console *con, short flags) { return
>  #endif /* CONFIG_PRINTK */
>  
>  extern bool have_boot_console;
> +extern bool have_nbcon_console;
> +extern bool have_legacy_console;
> +
> +/**
> + * struct console_flush_type - Define available console flush methods
> + * @nbcon_atomic:	Flush directly using nbcon_atomic() callback
> + * @legacy_direct:	Call the legacy loop in this context
> + * @legacy_offload:	Offload the legacy loop into IRQ
> + *
> + * Note that the legacy loop also flushes the nbcon consoles.
> + */
> +struct console_flush_type {
> +	bool	nbcon_atomic;
> +	bool	legacy_direct;
> +	bool	legacy_offload;
> +};
> +
> +/*
> + * Identify which console flushing methods are available to the context of
> + * the caller. The caller can then decide which of the available flushing
> + * methods it will use.
> + */
> +static inline void printk_get_console_flush_type(struct console_flush_type *ft)
> +{
> +	memset(ft, 0, sizeof(*ft));
> +
> +	switch (nbcon_get_default_prio()) {
> +	case NBCON_PRIO_NORMAL:
> +		if (have_nbcon_console && !have_boot_console)
> +			ft->nbcon_atomic = true;
> +
> +		if (have_legacy_console || have_boot_console) {
> +			ft->legacy_offload = true;
> +			if (!is_printk_legacy_deferred())
> +				ft->legacy_direct = true;
> +		}

I would suggest to change the semantic and set the _preferred_
flush method instead of an _available_ one. Something like:

		/* Legacy consoles are flushed directly when possible. */
		if (have_legacy_console || have_boot_console) {
			if (!is_printk_legacy_deferred()) {
				ft->legacy_direct = true;
			} else {
				ft->legacy_offload = true;
			}
		}


It would simplify vprintk_emit() in this patch. Also it might keep
nbcon_atomic_flush_pending_con() simple after adding the printk
thread. See below.

IMHO, it will not make any difference in other situations.

I personally prefer the "set preferred flush method" semantic. It
helps to make sure that all callers behave consistently. I mean that
this function would define the flush rules in "all" situations.

That said, I do not resist on it. The code looks reasonable also
with the current semantic. And it is rather easy to review
all callers. Also I might miss something. I still do not see the full
picture (patches adding the kthread).

> +		break;
> +
> +	case NBCON_PRIO_PANIC:
> +		/*
> +		 * In panic, the nbcon consoles will directly print. But
> +		 * only allowed if there are no boot consoles.
> +		 */
> +		if (have_nbcon_console && !have_boot_console)
> +			ft->nbcon_atomic = true;
> +
> +		/*
> +		 * In panic, if nbcon atomic printing occurs, the legacy
> +		 * consoles must remain silent until explicitly allowed.
> +		 * Also, legacy consoles cannot print when deferred. However,
> +		 * console_flush_on_panic() will flush them anyway, even if
> +		 * unsafe.
> +		 */
> +		if ((legacy_allow_panic_sync || !ft->nbcon_atomic) &&
> +		    !is_printk_legacy_deferred()) {
> +			ft->legacy_direct = true;

We should check here whether the legacy loop is needed at all.
I would write something like:

		if (have_legacy_console || have_boot_console) {
			/*
			 * Same decision as in PRIO_NORMAL. But we are interested
			 * only in wheter we could flush directly.
			 */
			if (!is_printk_legacy_deferred()) {
				ft->legacy_direct = true;
			}
			/* PRIO_PANIC specific part: Legacy consoles remain silent... */
			if (ft->nbcon_atomic && !legacy_allow_panic_sync)
				ft->legacy_direct = false;

> +		}
> +		break;
> +
> +	default:
> +		WARN_ON_ONCE(1);
> +		break;
> +	}
> +}
>  
>  extern struct printk_buffers printk_shared_pbufs;
>  
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1186,8 +1187,16 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
>  	 * other context that will do it.
>  	 */
>  	if (prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
> -		stop_seq = prb_next_reserve_seq(prb);
> -		goto again;
> +		printk_get_console_flush_type(&ft);
> +		/*
> +		 * If nbcon_atomic flushing is not available, this printing
> +		 * must be occurring via the legacy loop. Let that loop be
> +		 * responsible for flushing the new records.
> +		 */

The comment is a bit confusing. It is not clear who is responsible for
calling the legacy loop.

My understanding is that the caller of this function is responsible
for calling the legacy loop. At least this is the case now. The
function is called only from vprintk_emit() and
console_flush_on_panic() and they both call the legacy loop when
needed.

I would rather remove this comment. Instead, I would make it clear
in the function description that: "The caller is responsible for
flushing the console via the legacy loop when needed."

> +		if (ft.nbcon_atomic) {
> +			stop_seq = prb_next_reserve_seq(prb);
> +			goto again;
> +		}

BTW: I wonder how this code would look like after adding the printk
     threads. We should do "goto again" only when ft.nbcon_atomic
     is the preferred (only available) flush type for nbcon consoles.

     IMHO, it is another reason to change the semantic.

>  	}
>  }
>  
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2369,34 +2364,23 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	if (other_cpu_in_panic())
>  		return 0;
>  
> +	printk_get_console_flush_type(&ft);

It is a nice trick to call printk_get_console_flush_type() this early.
I allows to hack the result when processing the hacky LOGLEVEL_SCHED ;-)

> +
>  	/* If called from the scheduler, we can not call up(). */
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
>  		do_trylock_unlock = false;
> -		defer_legacy = true;
> +	} else {
> +		do_trylock_unlock = ft.legacy_direct;
>  	}

We could hack the @ft structure directly here:

	if (level == LOGLEVEL_SCHED) {
		level = LOGLEVEL_DEFAULT;
		ft.legacy_offload |= ft.legacy_direct;
		ft.legacy_direct = false;
	}
>  
>  	printk_delay(level);
>  
>  	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
>  
> -	if (have_nbcon_console && !have_boot_console) {
> +	if (ft.nbcon_atomic)
>  		nbcon_atomic_flush_pending();
>  
> -		/*
> -		 * In panic, the legacy consoles are not allowed to print from
> -		 * the printk calling context unless explicitly allowed. This
> -		 * gives the safe nbcon consoles a chance to print out all the
> -		 * panic messages first. This restriction only applies if
> -		 * there are nbcon consoles registered and they are allowed to
> -		 * flush.
> -		 */
> -		if (this_cpu_in_panic() && !legacy_allow_panic_sync) {
> -			do_trylock_unlock = false;
> -			defer_legacy = false;
> -		}
> -	}
> -
>  	if (do_trylock_unlock) {

Then, with the "preferred flush method" semantic, we could use here:

	if (ft.legacy_direct) {

>  		/*
>  		 * The caller may be holding system-critical or
> @@ -2417,7 +2401,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		preempt_enable();
>  	}
>  
> -	if (defer_legacy)
> +	if (!do_trylock_unlock && ft.legacy_offload)

and here:

	if (ft.legacy_offload)

>  		defer_console_output();
>  	else
>  		wake_up_klogd();
> @@ -2776,10 +2760,14 @@ void resume_console(void)
>   */
>  static int console_cpu_notify(unsigned int cpu)
>  {
> -	if (!cpuhp_tasks_frozen && printing_via_unlock) {
> -		/* If trylock fails, someone else is doing the printing */
> -		if (console_trylock())
> -			console_unlock();
> +	struct console_flush_type ft;
> +
> +	if (!cpuhp_tasks_frozen) {
> +		printk_get_console_flush_type(&ft);
> +		if (ft.legacy_direct) {
> +			if (console_trylock())
> +				console_unlock();

Why do we actually call only the legacy loop here?
IMHO, we should also do

	if (ft.nbcon_atomic)
 		nbcon_atomic_flush_pending();


> +		}
>  	}
>  	return 0;
>  }
> @@ -3327,7 +3316,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
>  	if (mode == CONSOLE_REPLAY_ALL)
>  		__console_rewind_all();
>  
> -	if (!have_boot_console)
> +	printk_get_console_flush_type(&ft);
> +	if (ft.nbcon_atomic)
>  		nbcon_atomic_flush_pending();

I would use "ft.legacy_direct" also below for the decision about
the legacy loop:

-	if (legacy_allow_panic_sync)
+	if (ft.legacy_direct)
		console_flush_all(false, &next_seq, &handover);

>  
>  	/* Flush legacy consoles once allowed, even when dangerous. */

Best Regards,
Petr