linux-kernel - Re: [PATCH v3 4/4] kdb: Adapt kdb_msg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <d73e8cc0259c140a0a49f670c6c165bb662281ed.camel@suse.com>
Date: Mon, 08 Sep 2025 16:27:17 -0300
From: Marcos Paulo de Souza <mpdesouza@...e.com>
To: Petr Mladek <pmladek@...e.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Steven Rostedt	
 <rostedt@...dmis.org>, John Ogness <john.ogness@...utronix.de>, Sergey
 Senozhatsky <senozhatsky@...omium.org>, Jason Wessel
 <jason.wessel@...driver.com>, Daniel Thompson	 <danielt@...nel.org>,
 Douglas Anderson <dianders@...omium.org>, 	linux-kernel@...r.kernel.org,
 kgdb-bugreport@...ts.sourceforge.net
Subject: Re: [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON
 consoles

On Mon, 2025-09-08 at 17:51 +0200, Petr Mladek wrote:
> On Tue 2025-09-02 15:33:55, Marcos Paulo de Souza wrote:
> > Function kdb_msg_write was calling con->write for any found
> > console,
> > but it won't work on NBCON ones. In this case we should acquire the
> > ownership of the console using NBCON_PRIO_EMERGENCY, since printing
> > kdb messages should only be interrupted by a panic
> 
> I would end the paragraph here.
> 
> > in the case it was
> > triggered by sysrq debug option.
> 
> This is not important. Also there are more ways how to trigger
> panic(). For example, it might happen by an error in the kdb code.
> Or I heard rumors that some HW even had a physical "trigger NMI"
> button.
> 
> > This is done by the
> > nbcon_kdb_{acquire,release} functions.
> 
> I think that this is more or less obvious.

I'll adjust the commit message per you suggestion.

> 
> > At this point, the console is required to use the atomic callback.
> > The
> > console is skipped if the write_atomic callback is not set or if
> > the
> > context could not be acquired. The validation of NBCON is done by
> > the
> > console_is_usable helper. The context is released right after
> > write_atomic finishes.
> > 
> > The oops_in_progress handling is only needed in the legacy
> > consoles,
> > so it was moved around the con->write callback.
> 
> > --- a/kernel/debug/kdb/kdb_io.c
> > +++ b/kernel/debug/kdb/kdb_io.c
> > @@ -589,24 +589,40 @@ static void kdb_msg_write(const char *msg,
> > int msg_len)
> >  	 */
> >  	cookie = console_srcu_read_lock();
> >  	for_each_console_srcu(c) {
> > -		if (!(console_srcu_read_flags(c) & CON_ENABLED))
> > +		struct nbcon_write_context wctxt = { };
> > +		short flags = console_srcu_read_flags(c);
> > +
> > +		if (!console_is_usable(c, flags, true))
> >  			continue;
> >  		if (c == dbg_io_ops->cons)
> >  			continue;
> > -		if (!c->write)
> > -			continue;
> > -		/*
> > -		 * Set oops_in_progress to encourage the console
> > drivers to
> > -		 * disregard their internal spin locks: in the
> > current calling
> > -		 * context the risk of deadlock is a bigger
> > problem than risks
> > -		 * due to re-entering the console driver. We
> > operate directly on
> > -		 * oops_in_progress rather than using
> > bust_spinlocks() because
> > -		 * the calls bust_spinlocks() makes on exit are
> > not appropriate
> > -		 * for this calling context.
> > -		 */
> > -		++oops_in_progress;
> > -		c->write(c, msg, msg_len);
> > -		--oops_in_progress;
> > +
> > +		if (flags & CON_NBCON) {
> > +			/*
> > +			 * Do not continue if the console is NBCON
> > and the context
> > +			 * can't be acquired.
> > +			 */
> > +			if (!nbcon_kdb_try_acquire(c, &wctxt))
> > +				continue;
> > +
> > +			wctxt.outbuf = (char *)msg;
> > +			wctxt.len = msg_len;
> 
> I double checked whether we initialized all members of the structure
> correctly. And I found that we didn't. We should call here:
> 
> 			nbcon_write_context_set_buf(&wctxt,
> 						    &pmsg.pbufs-
> >outbuf[0],
> 						   
> pmsg.outbuf_len);
> 
> Sigh, this is easy to miss. I remember that I was not super happy
> about this design. But the original code initialized the structure
> on a single place...

Ok, so I'll need to also export nbcon_write_context_set_buf, since it's
currently static inside kernel/printk/nbcon.c. I'll do it for the next
version.

> 
> > +			c->write_atomic(c, &wctxt);
> > +			nbcon_kdb_release(&wctxt);
> > +		} else {
> > +			/*
> > +			 * Set oops_in_progress to encourage the
> > console drivers to
> > +			 * disregard their internal spin locks: in
> > the current calling
> > +			 * context the risk of deadlock is a
> > bigger problem than risks
> > +			 * due to re-entering the console driver.
> > We operate directly on
> > +			 * oops_in_progress rather than using
> > bust_spinlocks() because
> > +			 * the calls bust_spinlocks() makes on
> > exit are not appropriate
> > +			 * for this calling context.
> > +			 */
> > +			++oops_in_progress;
> > +			c->write(c, msg, msg_len);
> > +			--oops_in_progress;
> > +		}
> >  		touch_nmi_watchdog();
> >  	}
> >  	console_srcu_read_unlock(cookie);
> 
> Best Regards,
> Petr