Message-ID: <YzarXlj1NyFGTC08@alley>
Date: Fri, 30 Sep 2022 10:39:58 +0200
From: Petr Mladek <pmladek@...e.com>
To: Doug Anderson <dianders@...omium.org>
Cc: John Ogness <john.ogness@...utronix.de>,
Sergey Senozhatsky <senozhatsky@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Jason Wessel <jason.wessel@...driver.com>,
Daniel Thompson <daniel.thompson@...aro.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jirislaby@...nel.org>,
Aaron Tomlin <atomlin@...hat.com>,
Luis Chamberlain <mcgrof@...nel.org>,
kgdb-bugreport@...ts.sourceforge.net, linux-serial@...r.kernel.org
Subject: Re: [PATCH printk 10/18] kgbd: Pretend that console list walk is safe
On Wed 2022-09-28 16:32:15, Doug Anderson wrote:
> Hi,
>
> On Fri, Sep 23, 2022 at 5:05 PM John Ogness <john.ogness@...utronix.de> wrote:
> >
> > From: Thomas Gleixner <tglx@...utronix.de>
> >
> > Provide a special list iterator macro for KGDB to allow unprotected list
> > walks and add a few comments to explain the hope based approach.
> >
> > Preperatory change for changing the console list to hlist and adding
>
> s/Preperatory/Preparatory
>
> > lockdep asserts to regular list walks.
> >
> > diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> > index af2aa76bae15..57a5fd27dffe 100644
> > --- a/drivers/tty/serial/kgdboc.c
> > +++ b/drivers/tty/serial/kgdboc.c
> > @@ -462,10 +462,13 @@ static void kgdboc_earlycon_pre_exp_handler(void)
> > * we have no other choice so we keep using it. Since not all
> > * serial drivers might be OK with this, print a warning once per
> > * boot if we detect this case.
> > + *
> > + * Pretend that walking the console list is safe...
>
> To be fair, this is not quite as unsafe as your comment makes it
> sound. kgdb is a "stop the world" debugger and when this function is
> executing then all of the other CPUs in the system should have been
> rounded up and idle (or, perhaps, busy looping). Essentially as long
> as console list manipulation is always made in a way that each
> instruction keeps the list in a reasonable state then what kgdb is
> doing is actually "safe". Said another way: we could drop into the
> debugger at any point when a task is manipulating the console list,
> but once we're in the debugger and are executing the "pre_exp_handler"
> then all the other CPUs have been frozen in time.

The code in register_console()/unregister_console() seems to
manipulate the list in the right order. However, correctness
is not guaranteed because the code uses neither compiler
barriers nor memory barriers.

That said, later patches add for_each_console_srcu(). IMHO,
the SRCU walk should be safe here.

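
The ordering concern can be illustrated outside the kernel. What follows is
a minimal userspace sketch in C11, not kernel code. The names
demo_console, demo_register_console() and demo_console_is_registered() are
hypothetical, and the release/acquire pair only roughly stands in for what
rcu_assign_pointer() and rcu_dereference() provide in an RCU-style list
walk. It assumes that writers are serialized by some other means.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct console; not the kernel type. */
struct demo_console {
	const char *name;
	_Atomic(struct demo_console *) next;
};

/* List head, zero-initialized (empty list). */
static _Atomic(struct demo_console *) demo_console_list;

/*
 * Writer side. Assumption: callers are serialized elsewhere, so only
 * one writer runs at a time. The node is fully set up first and only
 * then published with a release store, so a reader that sees the new
 * head pointer also sees the node's initialized fields.
 */
static void demo_register_console(struct demo_console *con)
{
	struct demo_console *head;

	assert(con != NULL);

	head = atomic_load_explicit(&demo_console_list, memory_order_relaxed);
	atomic_store_explicit(&con->next, head, memory_order_relaxed);
	atomic_store_explicit(&demo_console_list, con, memory_order_release);
}

/*
 * Reader side: a walk without any lock. Each pointer is loaded once
 * with acquire ordering, which pairs with the release store above.
 * Returns 1 if target is on the list, 0 otherwise.
 */
static int demo_console_is_registered(const struct demo_console *target)
{
	struct demo_console *c;

	for (c = atomic_load_explicit(&demo_console_list, memory_order_acquire);
	     c != NULL;
	     c = atomic_load_explicit(&c->next, memory_order_acquire)) {
		if (c == target)
			return 1;
	}
	return 0;
}
```

With plain loads and stores instead of the atomic operations, the compiler
or the CPU would be free to reorder the initialization of the node and its
publication, which is the gap described above. The sketch does not cover
removal, where a reader stopped in the middle of a walk also has to be
considered.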
>
> > */
> > - for_each_console(con)
> > + for_each_console_kgdb(con) {
> > if (con == kgdboc_earlycon_io_ops.cons)
> > return;
> > + }
> >
> > already_warned = true;
> > pr_warn("kgdboc_earlycon is still using bootconsole\n");
> > --- a/kernel/debug/kdb/kdb_io.c
> > +++ b/kernel/debug/kdb/kdb_io.c
> > @@ -558,7 +558,12 @@ static void kdb_msg_write(const char *msg, int msg_len)
> > cp++;
> > }
> >
> > - for_each_console(c) {
> > + /*
> > + * This is a completely unprotected list walk designed by the
> > + * wishful thinking department. See the oops_in_progress comment
> > + * below - especially the encourage section...
>
> The reality is also a little less dire here than the comment suggests.
> IMO this is actually not the same as the "oops_in_progress" case that
> the comment refers to.
>
> Specifically, the "oops_in_progress" is referring to the fact that
> it's not uncommon to drop into the debugger when a serial driver (the
> same one you're using for kgdb) is holding its lock. Possibly it's
> printing something to the tty running on the UART dumping stuff out
> from the kernel's console. That's not great and I won't pretend that
> the kgdb design is amazing here, but...
>
> Just like above, I don't feel like iterating through the console list
> here without holding the lock is necessarily unsafe. Just like above,
> all the rest of the CPUs in the system are in a holding pattern and
> aren't actively executing any code. While we may have interrupted them
> at any given instruction, they won't execute any more instruction
> until we leave kgdb and resume running.

The atomic consoles might improve the situation. Well, the handshake
will not really work because the current owner might be stopped.
But we would at least know that the port is not in a safe state.

Anyway, what about using the SRCU walk added by the later patches here?
After all, this is exactly what RCU is for, isn't it?

Best Regards,
Petr