Message-ID: <aLWHmY9_I4rbV0wG@pathway.suse.cz>
Date: Mon, 1 Sep 2025 13:46:33 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Marcos Paulo de Souza <mpdesouza@...e.com>,
Daniel Thompson <daniel@...cstar.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Steven Rostedt <rostedt@...dmis.org>,
Sergey Senozhatsky <senozhatsky@...omium.org>,
Jason Wessel <jason.wessel@...driver.com>,
Daniel Thompson <danielt@...nel.org>,
Douglas Anderson <dianders@...omium.org>,
linux-kernel@...r.kernel.org, kgdb-bugreport@...ts.sourceforge.net
Subject: Re: [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON
consoles
On Fri 2025-08-29 16:18:28, John Ogness wrote:
> On 2025-08-29, Petr Mladek <pmladek@...e.com> wrote:
> > c) kdb_msg_write() also writes the message on all other consoles
> > registered by printk. I guess that this is what John meant
> > by mirroring.
>
> Yes.
>
> >> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> >> index 79d8c74378061..2c168eaf378ed 100644
> >> --- a/kernel/printk/nbcon.c
> >> +++ b/kernel/printk/nbcon.c
> >> @@ -10,6 +10,7 @@
> >> #include <linux/export.h>
> >> #include <linux/init.h>
> >> #include <linux/irqflags.h>
> >> +#include <linux/kgdb.h>
> >> #include <linux/kthread.h>
> >> #include <linux/minmax.h>
> >> #include <linux/percpu.h>
> >> @@ -247,6 +248,8 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> >> * Panic does not imply that the console is owned. However,
> >> * since all non-panic CPUs are stopped during panic(), it
> >> * is safer to have them avoid gaining console ownership.
> >> + * The only exception is if kgdb is active, which may print
> >> + * from multiple CPUs during a panic.
> >> *
> >> * If this acquire is a reacquire (and an unsafe takeover
> >> * has not previously occurred) then it is allowed to attempt
> >> @@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> >> * interrupted by the panic CPU while printing.
> >> */
> >> if (other_cpu_in_panic() &&
> >> + atomic_read(&kgdb_active) == -1 &&
> >
> > This would likely work for most kgdb_printk() calls. But what about
> > the one called from kgdb_panic()?
>
> Nice catch.
>
> > Alternative solution would be to allow it only for the CPU locked
> > by kdb, something like:
> >
> > READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() &&
>
> Yes, I like this.
>
> > Note that I used READ_ONCE() to guarantee an atomic read. The
> > condition will fail only when we are inside a code locked by
> > the kdb_printf_cpu().
>
> Neither the READ_ONCE() nor any memory barriers are needed because the
> only interesting case is when the CPU sees that it is the one stored in
> @kdb_printf_cpu. In which case it was the one that did the storing and
> the value is always correctly loaded.

Let me play devil's advocate for a bit.

What about the following race?

  kdb_printf_cpu = -1 (0xffffffff)

  CPU 0xff                                  CPU 0x1

  panic()
    printk()
      nbcon_atomic_flush_pending()
        nbcon_context_try_acquire_direct()
          # load low byte of kdb_printf_cpu
          val = 0xff

                                            vkdb_printf()
                                              cmpxchg(&kdb_printf_cpu, ...)
                                              kdb_printf_cpu == 0x1

          # load higher byte of kdb_printf_cpu
          val = 0xff

Result: CPU 0xff would be allowed to acquire the nbcon context
because it thinks that vkdb_printf() got locked on this CPU.
It is not fully artificial, see
https://lwn.net/Articles/793253/#Load%20Tearing
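
To make the arithmetic of the tearing concrete, here is a small userspace sketch. It is not kernel code and it does not reproduce a real race. The helper name torn_load() is made up for illustration. It only models what happens if a plain load is split into a low-byte read before the cmpxchg() and a read of the remaining bytes after it.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): model a 32-bit load that is
 * split in two. The low byte is taken from the value seen before the
 * concurrent store, the upper three bytes from the value seen after it.
 */
static uint32_t torn_load(uint32_t before, uint32_t after)
{
	return (before & 0x000000ffu) | (after & 0xffffff00u);
}

/*
 * Scenario from the race above:
 *   before = 0xffffffff  (kdb_printf_cpu == -1, nobody holds the lock)
 *   after  = 0x00000001  (CPU 0x1 won the cmpxchg())
 * The torn result is 0x000000ff, which equals the ID of CPU 0xff.
 */
static int torn_value_matches_cpu(uint32_t before, uint32_t after,
				  uint32_t this_cpu)
{
	return torn_load(before, after) == this_cpu;
}
```

With these inputs, torn_value_matches_cpu(0xffffffff, 0x1, 0xff) is true even though CPU 0xff never stored its ID. An untorn load of either the old or the new value would not match.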
The above race is not critical. CPU 0x1 still could wait for CPU 0xff
and acquire the nbcon context later.
But it is something unexpected. I would feel more comfortable if
we used READ_ONCE() to be on the safe side.
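
A rough userspace sketch of the check I have in mind follows. The names READ_ONCE_INT() and must_refuse_acquire() are hypothetical stand-ins, and kdb_printf_cpu here is a local variable, not the real one. The sketch assumes that a volatile access to a naturally aligned int is done as a single load, which is what the kernel's READ_ONCE() relies on. Any remaining parts of the real condition in nbcon_context_try_acquire_direct() are left out.

```c
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's READ_ONCE(). Assumption: the
 * volatile access keeps the compiler from splitting or repeating the load.
 */
#define READ_ONCE_INT(x) (*(const volatile int *)&(x))

/* Stand-in for the kdb variable; -1 means no CPU holds the kdb printf lock. */
static int kdb_printf_cpu = -1;

/*
 * Hypothetical helper mirroring the proposed check: while another CPU
 * is in panic, refuse the acquire unless this CPU is the one locked
 * by kdb.
 */
static bool must_refuse_acquire(bool other_cpu_in_panic, int this_cpu)
{
	return other_cpu_in_panic &&
	       READ_ONCE_INT(kdb_printf_cpu) != this_cpu;
}
```

The point of the marked access is only that the comparison sees either the old or the new value of kdb_printf_cpu, never a mix of both.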
> >> [0] https://lore.kernel.org/lkml/20210803131301.5588-4-john.ogness@linutronix.de
> >
> > Sigh, I have already forgotten that we discussed this in the past.
>
> After so many years, I do not think there is a printk scenario we have
> not discussed. ;-)
;-)
Best Regards,
Petr