[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d73e8cc0259c140a0a49f670c6c165bb662281ed.camel@suse.com>
Date: Mon, 08 Sep 2025 16:27:17 -0300
From: Marcos Paulo de Souza <mpdesouza@...e.com>
To: Petr Mladek <pmladek@...e.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Steven Rostedt
<rostedt@...dmis.org>, John Ogness <john.ogness@...utronix.de>, Sergey
Senozhatsky <senozhatsky@...omium.org>, Jason Wessel
<jason.wessel@...driver.com>, Daniel Thompson <danielt@...nel.org>,
Douglas Anderson <dianders@...omium.org>, linux-kernel@...r.kernel.org,
kgdb-bugreport@...ts.sourceforge.net
Subject: Re: [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON
consoles
On Mon, 2025-09-08 at 17:51 +0200, Petr Mladek wrote:
> On Tue 2025-09-02 15:33:55, Marcos Paulo de Souza wrote:
> > Function kdb_msg_write was calling con->write for any found
> > console,
> > but it won't work on NBCON ones. In this case we should acquire the
> > ownership of the console using NBCON_PRIO_EMERGENCY, since printing
> > kdb messages should only be interrupted by a panic
>
> I would end the paragraph here.
>
> > in the case it was
> > triggered by sysrq debug option.
>
> This is not important. Also there are more ways how to trigger
> panic(). For example, it might happen by an error in the kdb code.
> Or I heard rumors that some HW even had a physical "trigger NMI"
> button.
>
> > This is done by the
> > nbcon_kdb_{acquire,release} functions.
>
> I think that this is more or less obvious.
I'll adjust the commit message per you suggestion.
>
> > At this point, the console is required to use the atomic callback.
> > The
> > console is skipped if the write_atomic callback is not set or if
> > the
> > context could not be acquired. The validation of NBCON is done by
> > the
> > console_is_usable helper. The context is released right after
> > write_atomic finishes.
> >
> > The oops_in_progress handling is only needed in the legacy
> > consoles,
> > so it was moved around the con->write callback.
>
> > --- a/kernel/debug/kdb/kdb_io.c
> > +++ b/kernel/debug/kdb/kdb_io.c
> > @@ -589,24 +589,40 @@ static void kdb_msg_write(const char *msg,
> > int msg_len)
> > */
> > cookie = console_srcu_read_lock();
> > for_each_console_srcu(c) {
> > - if (!(console_srcu_read_flags(c) & CON_ENABLED))
> > + struct nbcon_write_context wctxt = { };
> > + short flags = console_srcu_read_flags(c);
> > +
> > + if (!console_is_usable(c, flags, true))
> > continue;
> > if (c == dbg_io_ops->cons)
> > continue;
> > - if (!c->write)
> > - continue;
> > - /*
> > - * Set oops_in_progress to encourage the console
> > drivers to
> > - * disregard their internal spin locks: in the
> > current calling
> > - * context the risk of deadlock is a bigger
> > problem than risks
> > - * due to re-entering the console driver. We
> > operate directly on
> > - * oops_in_progress rather than using
> > bust_spinlocks() because
> > - * the calls bust_spinlocks() makes on exit are
> > not appropriate
> > - * for this calling context.
> > - */
> > - ++oops_in_progress;
> > - c->write(c, msg, msg_len);
> > - --oops_in_progress;
> > +
> > + if (flags & CON_NBCON) {
> > + /*
> > + * Do not continue if the console is NBCON
> > and the context
> > + * can't be acquired.
> > + */
> > + if (!nbcon_kdb_try_acquire(c, &wctxt))
> > + continue;
> > +
> > + wctxt.outbuf = (char *)msg;
> > + wctxt.len = msg_len;
>
> I double checked whether we initialized all members of the structure
> correctly. And I found that we didn't. We should call here:
>
> nbcon_write_context_set_buf(&wctxt,
> &pmsg.pbufs-
> >outbuf[0],
>
> pmsg.outbuf_len);
>
> Sigh, this is easy to miss. I remember that I was not super happy
> about this design. But the original code initialized the structure
> on a single place...
Ok, so I'll need to also export nbcon_write_context_set_buf, since it's
currently static inside kernel/printk/nbcon.c. I'll do it for the next
version.
>
> > + c->write_atomic(c, &wctxt);
> > + nbcon_kdb_release(&wctxt);
> > + } else {
> > + /*
> > + * Set oops_in_progress to encourage the
> > console drivers to
> > + * disregard their internal spin locks: in
> > the current calling
> > + * context the risk of deadlock is a
> > bigger problem than risks
> > + * due to re-entering the console driver.
> > We operate directly on
> > + * oops_in_progress rather than using
> > bust_spinlocks() because
> > + * the calls bust_spinlocks() makes on
> > exit are not appropriate
> > + * for this calling context.
> > + */
> > + ++oops_in_progress;
> > + c->write(c, msg, msg_len);
> > + --oops_in_progress;
> > + }
> > touch_nmi_watchdog();
> > }
> > console_srcu_read_unlock(cookie);
>
> Best Regards,
> Petr
Powered by blists - more mailing lists