linux-kernel - Re: [PATCH printk v2 11/18] printk: nbcon: Show replay message on takeover

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZnQobCf7kk5LZOHL@pathway.suse.cz>
Date: Thu, 20 Jun 2024 15:02:36 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH printk v2 11/18] printk: nbcon: Show replay message on
 takeover

On Tue 2024-06-04 01:30:46, John Ogness wrote:
> An emergency or panic context can takeover console ownership
> while the current owner was printing a printk message. The
> atomic printer will re-print the message that the previous
> owner was printing. However, this can look confusing to the
> user and may even seem as though a message was lost.
> 
>   [3430014.1
>   [3430014.181123] usb 1-2: Product: USB Audio
> 
> Add a new field @nbcon_prev_seq to struct console to track
> the sequence number to print that was assigned to the previous
> console owner. If this matches the sequence number to print
> that the current owner is assigned, then a takeover must have
> occurred. In this case, print an additional message to inform
> the user that the previous message is being printed again.
> 
>   [3430014.1
>   ** replaying previous printk message **
>   [3430014.181123] usb 1-2: Product: USB Audio
> 
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -891,6 +892,28 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
>  	if (dropped && !is_extended)
>  		console_prepend_dropped(&pmsg, dropped);
>  
> +	/*
> +	 * If the previous owner was assigned the same record, this context
> +	 * has taken over ownership and is replaying the record. Prepend a
> +	 * message to let the user know the record is replayed.
> +	 */
> +	ulseq = atomic_long_read(&ACCESS_PRIVATE(con, nbcon_prev_seq));
> +	if (__ulseq_to_u64seq(prb, ulseq) == pmsg.seq) {
> +		console_prepend_replay(&pmsg);
> +	} else {
> +		/*
> +		 * Ensure this context is still the owner before trying to
> +		 * update @nbcon_prev_seq. Otherwise the value in @ulseq may
> +		 * not be from the previous owner.
> +		 */
> +		nbcon_state_read(con, &cur);
> +		if (!nbcon_context_can_proceed(ctxt, &cur))
> +			return false;

I would like to better understand the motivation here.
I guess that this pattern might be repeated in the future.

For example, it might bring the question why we do not do
the same in nbcon_seq_try_update().

OK, this is called in context where the takeover is unsafe.
It could race only with nbcon_atomic_flush_unsafe()
which is called at the very end of panic().

So we have:

   + The race is a corner case.

   + It could happen only once beause non-panic() CPUs should
     not be able to acquire the console context any more.

   + The check reduces the race window but it does not prevent
     the race.

Do we care about these races?


BTW: In this particular case, the race looks innocent. It might
     happen only once. And the cmpxchg would fail in this case.


I would replace the check with a comment:

		/*
		 * Only nbcon_atomic_flush_unsafe() could take over this
		 * context. It would be the only context which could move
		 * forward from now on. The try_cmpxchg() would fail in
		 * this case. It should succeed and be safe in all other
		 * scenarios.
		 */
> +
> +		atomic_long_try_cmpxchg(&ACCESS_PRIVATE(con, nbcon_prev_seq), &ulseq,
> +					__u64seq_to_ulseq(pmsg.seq));
> +	}
> +
>  	if (!nbcon_context_exit_unsafe(ctxt))
>  		return false;
>  

Othrewise, it looks good.

Best Regards,
Petr