lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <84h5xukti3.fsf@jogness.linutronix.de>
Date: Tue, 26 Aug 2025 17:17:32 +0206
From: John Ogness <john.ogness@...utronix.de>
To: Marcos Paulo de Souza <mpdesouza@...e.com>, Daniel Thompson
 <daniel@...cstar.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Petr Mladek
 <pmladek@...e.com>, Steven Rostedt <rostedt@...dmis.org>, Sergey
 Senozhatsky <senozhatsky@...omium.org>, Jason Wessel
 <jason.wessel@...driver.com>, Daniel Thompson <danielt@...nel.org>,
 Douglas Anderson <dianders@...omium.org>, linux-kernel@...r.kernel.org,
 kgdb-bugreport@...ts.sourceforge.net
Subject: Re: [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON
 consoles

On 2025-08-11, Marcos Paulo de Souza <mpdesouza@...e.com> wrote:
> On Mon, 2025-08-11 at 15:38 +0100, Daniel Thompson wrote:
>> On Mon, Aug 11, 2025 at 10:32:47AM -0300, Marcos Paulo de Souza
>> wrote:
>> > Function kdb_msg_write was calling con->write for any found
>> > console, but it won't work on NBCON ones. In this case we should
>> > acquire the ownership of the console using NBCON_PRIO_EMERGENCY,
>> > since printing kdb messages should only be interrupted by a
>> > panic. This is done by the nbcon_kdb_{acquire,release} functions.
>> 
>> Just wanted to check what it means to be "interrupted by a panic"?
>> 
>> kdb is called from the panic handler but, assuming the serial port is
>> run syncrhonously when "bad things are happening", the serial port
>> should be quiet when we enter kdb meaning we can still acquire
>> ownership of the console?
>
> TBH I haven't thought about that. I talked with Petr Mladek about it,
> and we agreed to have something similar to nbcon_device_try_acquire,
> but with a higher priority, to make sure that we would get the context
> at this point. But, when thinking about it, since KDB runs on the panic
> handler, so we I'm not sure even if we need the _enter_unsafe() call at
> this point, since nobody is going to interrupt either way.

Well, kdb can also run when not in panic. In that case, if a panic
occurs while using kdb, those panic messages should be printed directly
on the consoles.

Also be aware that the kdb interface is using dbg_io_ops->write_char()
for printing and it will ignore a console that matches
dbg_io_ops->cons. So there is no concern about the kdb interface
conflicting with the nbcon console. This is just about the mirrored kdb
output.

> About the try_acquire part, I haven't checked about the state of the
> console devices when the panic happens, if they drop the ownership of
> the console on non-panic CPUs or not, so I'm not sure if this is needed
> or not. I'll wait for Petr and/or maybe John to add some info.

If any context owned the console and is in an unsafe section while being
interrupted by kdb (regardless if panic or not), then
nbcon_kdb_try_acquire() will fail and the mirrored kdb output will not
be visible on that console.

I am not sure how important it is that the output is mirrored in this
case. A hostile takeover is not acceptable. And implementing some sort
of delay so that the current owner has a chance to release the ownership
(similar to what was attempted here [0]) is not only complicated, but
has its own set of problems.

Currently there is no mirrored output for nbcon consoles. With this
series it is at least possible.

BTW, in order for CPU switching during panic to be able to mirror output
on nbcon consoles, an additional exception must be added to
nbcon_context_try_acquire_direct(). It would look like this:

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 79d8c74378061..2c168eaf378ed 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -10,6 +10,7 @@
 #include <linux/export.h>
 #include <linux/init.h>
 #include <linux/irqflags.h>
+#include <linux/kgdb.h>
 #include <linux/kthread.h>
 #include <linux/minmax.h>
 #include <linux/percpu.h>
@@ -247,6 +248,8 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
 		 * Panic does not imply that the console is owned. However,
 		 * since all non-panic CPUs are stopped during panic(), it
 		 * is safer to have them avoid gaining console ownership.
+		 * The only exception is if kgdb is active, which may print
+		 * from multiple CPUs during a panic.
 		 *
 		 * If this acquire is a reacquire (and an unsafe takeover
 		 * has not previously occurred) then it is allowed to attempt
@@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
 		 * interrupted by the panic CPU while printing.
 		 */
 		if (other_cpu_in_panic() &&
+		    atomic_read(&kgdb_active) == -1 &&
 		    (!is_reacquire || cur->unsafe_takeover)) {
 			return -EPERM;
 		}

>> > @@ -605,7 +616,14 @@ static void kdb_msg_write(const char *msg, int msg_len)
>> > 		 * for this calling context.
>> > 		 */
>> > 		++oops_in_progress;
>> > -		c->write(c, msg, msg_len);
>> > +		if (flags & CON_NBCON) {
>> > +			wctxt.outbuf = (char *)msg;
>> > +			wctxt.len = msg_len;
>> > +			c->write_atomic(c, &wctxt);
>> > +			nbcon_kdb_release(&wctxt);
>> > +		} else {
>> > +			c->write(c, msg, msg_len);
>> > +		}
>> > 		--oops_in_progress;
>> > 		touch_nmi_watchdog();
>> > 	}
>> 
>> Dumb question, but is the oops_in_progress bump still useful with
>> atomic consoles? Will the console have any spinlocks to disregard or
>> will the console ownership code already handled any mutual exclusion
>> issues meaning there should be no spinlocks taking by the atomic
>> write handler?
>
> IIUC, since we can have multiple consoles, and some of them are NB and
> others aren't, I believe that this oops_in_progress is still useful.

Correct, but only for legacy consoles. Please move the @oops_in_progress
increment/decrement to only be around the c->write() call. This makes it
clear that the hack is only "useful" for the legacy consoles.

John

[0] https://lore.kernel.org/lkml/20210803131301.5588-4-john.ogness@linutronix.de

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ