lists.openwall.net mailing list archives
Date: Wed, 10 Apr 2024 16:56:03 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH printk v4 15/27] printk: nbcon: Provide function to flush
 using write_atomic()

On Wed 2024-04-03 00:17:17, John Ogness wrote:
> From: Thomas Gleixner <tglx@...utronix.de>
> 
> Provide nbcon_atomic_flush_pending() to perform flushing of all
> registered nbcon consoles using their write_atomic() callback.
> 
> Unlike console_flush_all(), nbcon_atomic_flush_pending() will
> only flush up through the newest record at the time of the
> call. This prevents a CPU from printing unbounded when other
> CPUs are adding records.
> 
> Also unlike console_flush_all(), nbcon_atomic_flush_pending()
> will fully flush one console before flushing the next. This
> helps to guarantee that a block of pending records (such as
> a stack trace in an emergency situation) can be printed
> atomically at once before releasing console ownership.
> 
> nbcon_atomic_flush_pending() is safe in any context because it
> uses write_atomic() and acquires with unsafe_takeover disabled.
> 
> Use it in console_flush_on_panic() before flushing legacy
> consoles. The legacy write() callbacks are not fully safe when
> oops_in_progress is set.
> 
> Co-developed-by: John Ogness <john.ogness@...utronix.de>
> Signed-off-by: John Ogness <john.ogness@...utronix.de>
> Signed-off-by: Thomas Gleixner (Intel) <tglx@...utronix.de>

Reviewed-by: Petr Mladek <pmladek@...e.com>

A few nits below.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -937,6 +935,108 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>  	return nbcon_context_exit_unsafe(ctxt);
>  }
>  
> +/**
> + * __nbcon_atomic_flush_pending_con - Flush specified nbcon console using its
> + *					write_atomic() callback
> + * @con:			The nbcon console to flush
> + * @stop_seq:			Flush up until this record
> + *
> + * Return:	True if taken over while printing. Otherwise false.
> + *
> + * If flushing up to @stop_seq was not successful, it only makes sense for the
> + * caller to try again when true was returned. When false is returned, either
> + * there are no more records available to read or this context is not allowed
> + * to acquire the console.
> + */
> +static bool __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
> +{
> +	struct nbcon_write_context wctxt = { };
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> +
> +	ctxt->console			= con;
> +	ctxt->spinwait_max_us		= 2000;
> +	ctxt->prio			= NBCON_PRIO_NORMAL;

Nit: It looks strange to hardcode NBCON_PRIO_NORMAL and then call this
     from console_flush_on_panic() in the same patch.

     I see. It will get replaced by nbcon_get_default_prio() later.
     I guess that it is just a relic of several reworks and
     shuffling. I know that it is hard.

     It might have been better to either add the call in
     console_flush_on_panic() later, or to introduce nbcon_get_default_prio()
     earlier so that we could use it here.


> +
> +	if (!nbcon_context_try_acquire(ctxt))
> +		return false;
> +
> +	while (nbcon_seq_read(con) < stop_seq) {
> +		/*
> +		 * nbcon_emit_next_record() returns false when the console was
> +		 * handed over or taken over. In both cases the context is no
> +		 * longer valid.
> +		 */
> +		if (!nbcon_emit_next_record(&wctxt))
> +			return true;
> +
> +		if (!ctxt->backlog)
> +			break;
> +	}
> +
> +	nbcon_context_release(ctxt);
> +
> +	return false;
> +}
> +
> +/**
> + * __nbcon_atomic_flush_pending - Flush all nbcon consoles using their
> + *					write_atomic() callback
> + * @stop_seq:			Flush up until this record
> + */
> +static void __nbcon_atomic_flush_pending(u64 stop_seq)
> +{
> +	struct console *con;
> +	bool should_retry;
> +	int cookie;
> +
> +	do {
> +		should_retry = false;
> +
> +		cookie = console_srcu_read_lock();
> +		for_each_console_srcu(con) {
> +			short flags = console_srcu_read_flags(con);
> +			unsigned long irq_flags;
> +
> +			if (!(flags & CON_NBCON))
> +				continue;
> +
> +			if (!console_is_usable(con, flags))
> +				continue;
> +
> +			if (nbcon_seq_read(con) >= stop_seq)
> +				continue;
> +
> +			/*
> +			 * Atomic flushing does not use console driver
> +			 * synchronization (i.e. it does not hold the port
> +			 * lock for uart consoles). Therefore IRQs must be
> +			 * disabled to avoid being interrupted and then
> +			 * calling into a driver that will deadlock trying
> +			 * to acquire console ownership.
> +			 */
> +			local_irq_save(irq_flags);
> +
> +			should_retry |= __nbcon_atomic_flush_pending_con(con, stop_seq);

Nit: I have to say that this is quite cryptic. A "true" return value
     usually means success, but here it sets "should_retry".

     It would mean slightly more code, but it would be much clearer
     if __nbcon_atomic_flush_pending_con() returned:

	+ 0 on success
	+ -EAGAIN when it lost the ownership
	+ -EPERM when it can't get the ownership, like
	  nbcon_context_try_acquire_direct()

     and we made the decision here.

> +
> +			local_irq_restore(irq_flags);
> +		}
> +		console_srcu_read_unlock(cookie);
> +	} while (should_retry);
> +}

Neither of the nits is a blocker. They are basically just about potential
complications for future code archaeologists.

Best Regards,
Petr
