[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZnQobCf7kk5LZOHL@pathway.suse.cz>
Date: Thu, 20 Jun 2024 15:02:36 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH printk v2 11/18] printk: nbcon: Show replay message on
takeover
On Tue 2024-06-04 01:30:46, John Ogness wrote:
> An emergency or panic context can takeover console ownership
> while the current owner was printing a printk message. The
> atomic printer will re-print the message that the previous
> owner was printing. However, this can look confusing to the
> user and may even seem as though a message was lost.
>
> [3430014.1
> [3430014.181123] usb 1-2: Product: USB Audio
>
> Add a new field @nbcon_prev_seq to struct console to track
> the sequence number to print that was assigned to the previous
> console owner. If this matches the sequence number to print
> that the current owner is assigned, then a takeover must have
> occurred. In this case, print an additional message to inform
> the user that the previous message is being printed again.
>
> [3430014.1
> ** replaying previous printk message **
> [3430014.181123] usb 1-2: Product: USB Audio
>
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -891,6 +892,28 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
> if (dropped && !is_extended)
> console_prepend_dropped(&pmsg, dropped);
>
> + /*
> + * If the previous owner was assigned the same record, this context
> + * has taken over ownership and is replaying the record. Prepend a
> + * message to let the user know the record is replayed.
> + */
> + ulseq = atomic_long_read(&ACCESS_PRIVATE(con, nbcon_prev_seq));
> + if (__ulseq_to_u64seq(prb, ulseq) == pmsg.seq) {
> + console_prepend_replay(&pmsg);
> + } else {
> + /*
> + * Ensure this context is still the owner before trying to
> + * update @nbcon_prev_seq. Otherwise the value in @ulseq may
> + * not be from the previous owner.
> + */
> + nbcon_state_read(con, &cur);
> + if (!nbcon_context_can_proceed(ctxt, &cur))
> + return false;
I would like to better understand the motivation here.
I guess that this pattern might be repeated in the future.
For example, it might bring the question why we do not do
the same in nbcon_seq_try_update().
OK, this is called in context where the takeover is unsafe.
It could race only with nbcon_atomic_flush_unsafe()
which is called at the very end of panic().
So we have:
+ The race is a corner case.
+ It could happen only once beause non-panic() CPUs should
not be able to acquire the console context any more.
+ The check reduces the race window but it does not prevent
the race.
Do we care about these races?
BTW: In this particular case, the race looks innocent. It might
happen only once. And the cmpxchg would fail in this case.
I would replace the check with a comment:
/*
* Only nbcon_atomic_flush_unsafe() could take over this
* context. It would be the only context which could move
* forward from now on. The try_cmpxchg() would fail in
* this case. It should succeed and be safe in all other
* scenarios.
*/
> +
> + atomic_long_try_cmpxchg(&ACCESS_PRIVATE(con, nbcon_prev_seq), &ulseq,
> + __u64seq_to_ulseq(pmsg.seq));
> + }
> +
> if (!nbcon_context_exit_unsafe(ctxt))
> return false;
>
Othrewise, it looks good.
Best Regards,
Petr
Powered by blists - more mailing lists