linux-kernel - Re: [PATCH v2 3/3] kdb: Adapt kdb_msg

Message-ID: <aLWHmY9_I4rbV0wG@pathway.suse.cz>
Date: Mon, 1 Sep 2025 13:46:33 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Marcos Paulo de Souza <mpdesouza@...e.com>,
	Daniel Thompson <daniel@...cstar.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Sergey Senozhatsky <senozhatsky@...omium.org>,
	Jason Wessel <jason.wessel@...driver.com>,
	Daniel Thompson <danielt@...nel.org>,
	Douglas Anderson <dianders@...omium.org>,
	linux-kernel@...r.kernel.org, kgdb-bugreport@...ts.sourceforge.net
Subject: Re: [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON
 consoles

On Fri 2025-08-29 16:18:28, John Ogness wrote:
> On 2025-08-29, Petr Mladek <pmladek@...e.com> wrote:
> >      c) kdb_msg_write() also writes the message on all other consoles
> > 	registered by printk. I guess that this is what John meant
> > 	by mirroring.
> 
> Yes.
> 
> >> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> >> index 79d8c74378061..2c168eaf378ed 100644
> >> --- a/kernel/printk/nbcon.c
> >> +++ b/kernel/printk/nbcon.c
> >> @@ -10,6 +10,7 @@
> >>  #include <linux/export.h>
> >>  #include <linux/init.h>
> >>  #include <linux/irqflags.h>
> >> +#include <linux/kgdb.h>
> >>  #include <linux/kthread.h>
> >>  #include <linux/minmax.h>
> >>  #include <linux/percpu.h>
> >> @@ -247,6 +248,8 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> >>  		 * Panic does not imply that the console is owned. However,
> >>  		 * since all non-panic CPUs are stopped during panic(), it
> >>  		 * is safer to have them avoid gaining console ownership.
> >> +		 * The only exception is if kgdb is active, which may print
> >> +		 * from multiple CPUs during a panic.
> >>  		 *
> >>  		 * If this acquire is a reacquire (and an unsafe takeover
> >>  		 * has not previously occurred) then it is allowed to attempt
> >> @@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> >>  		 * interrupted by the panic CPU while printing.
> >>  		 */
> >>  		if (other_cpu_in_panic() &&
> >> +		    atomic_read(&kgdb_active) == -1 &&
> >
> > This would likely work for most kgdb_printk() calls. But what about
> > the one called from kgdb_panic()?
> 
> Nice catch.
> 
> > Alternative solution would be to allow it only for the CPU locked
> > by kdb, something like:
> >
> > 		    READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() &&
> 
> Yes, I like this.
>
> > Note that I used READ_ONCE() to guarantee an atomic read. The
> > condition will fail only when we are inside a code locked by
> > the kdb_printf_cpu().
> 
> Neither the READ_ONCE() nor any memory barriers are needed because the
> only interesting case is when the CPU sees that it is the one stored in
> @kdb_printf_cpu. In which case it was the one that did the storing and
> the value is always correctly loaded.

Let me play the devil's advocate for a bit.
What about the following race?

kdb_printf_cpu = -1  (0xffffffff)

CPU 0xff				CPU 0x1

					panic()

printk()
  nbcon_atomic_flush_pending()
     nbcon_context_try_acquire_direct()
	# load low byte of kdb_printf_cpu
	val = 0xff

					vkdb_printf()
					  cmpxchg(&kdb_printf_cpu, ...)
					  kdb_printf_cpu == 0x1

	# load higher bytes of kdb_printf_cpu (already 0x000000)
	val = 0x000000ff

Result: CPU 0xff would be allowed to acquire the nbcon context
	because it thinks that vkdb_printf() got locked on this CPU.

	It is not fully artificial, see
	https://lwn.net/Articles/793253/#Load%20Tearing
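
The arithmetic behind the race can be sketched in plain userspace C.
This is only a model, not kernel code: torn_load() and looks_like_owner()
are hypothetical helpers, and the split into "low byte" and "upper three
bytes" is the assumed tearing pattern from the diagram above.

```c
#include <stdint.h>

/*
 * Hypothetical model of a torn 32-bit load: the low byte is read
 * before a concurrent store lands, the upper three bytes after it.
 */
static uint32_t torn_load(uint32_t old_val, uint32_t new_val)
{
	uint32_t low  = old_val & 0x000000ffu;	/* read before the store */
	uint32_t high = new_val & 0xffffff00u;	/* read after the store */

	return high | low;
}

/* Models the racy ownership check "kdb_printf_cpu == this CPU". */
static int looks_like_owner(uint32_t observed, uint32_t this_cpu)
{
	return observed == this_cpu;
}
```

With old_val = 0xffffffff (-1) and new_val = 0x1, torn_load() yields
0xff, so the check passes on CPU 0xff even though kdb_printf_cpu never
held that value.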

The above race is not critical. CPU 0x1 could still wait for CPU 0xff
and acquire the nbcon context later.

But it is something unexpected. I would feel more comfortable if
we used READ_ONCE() to be on the safe side.
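
For illustration, here is a minimal userspace sketch of what I have in
mind. READ_ONCE_SKETCH() is a stand-in for the kernel's READ_ONCE() and
assumes GNU C (__typeof__); kdb_printf_cpu_sketch and must_back_off()
are hypothetical names, and the remaining conditions of the real check
in nbcon_context_try_acquire_direct() are omitted.

```c
/*
 * Stand-in for the kernel's READ_ONCE(): the volatile access asks the
 * compiler for exactly one load, which in practice is not split for an
 * aligned int. Assumes GNU C for __typeof__.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical mirror of kdb_printf_cpu; -1 means "not locked". */
static int kdb_printf_cpu_sketch = -1;

/*
 * Returns nonzero when this CPU must not take console ownership:
 * another CPU is in panic and this CPU does not hold the kdb printf
 * lock. Only the part of the condition discussed here is modelled.
 */
static int must_back_off(int other_cpu_in_panic, int this_cpu)
{
	return other_cpu_in_panic &&
	       READ_ONCE_SKETCH(kdb_printf_cpu_sketch) != this_cpu;
}
```

The only caller-visible difference from a plain read is that the value
compared against this_cpu comes from a single load.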

> >> [0] https://lore.kernel.org/lkml/20210803131301.5588-4-john.ogness@linutronix.de
> >
> > Sigh, I have already forgotten that we discussed this in the past.
> 
> After so many years, I do not think there is a printk scenario we have
> not discussed. ;-)

;-)

Best Regards,
Petr