[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZQrfGho-M9ZHQe9F@alley>
Date: Wed, 20 Sep 2023 14:01:30 +0200
From: Petr Mladek <pmladek@...e.com>
To: Daniel Thompson <daniel.thompson@...aro.org>
Cc: Doug Anderson <dianders@...omium.org>,
Jason Wessel <jason.wessel@...driver.com>,
kgdb-bugreport@...ts.sourceforge.net, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] kgdb: Flush console before entering kgdb on panic
On Wed 2023-08-30 10:53:59, Daniel Thompson wrote:
> On Fri, Aug 25, 2023 at 07:18:44AM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
> > <daniel.thompson@...aro.org> wrote:
> > >
> > > On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > > > When entering kdb/kgdb on a kernel panic, it was be observed that the
> > > > console isn't flushed before the `kdb` prompt came up. Specifically,
> > > > when using the buddy lockup detector on arm64 and running:
> > > > echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> > > >
> > > > I could see:
> > > > [ 26.161099] lkdtm: Performing direct entry HARDLOCKUP
> > > > [ 32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > > > [ 32.552865] Sending NMI from CPU 5 to CPUs 6:
> > > > [ 32.557359] NMI backtrace for cpu 6
> > > > ... [backtrace for cpu 6] ...
> > > > [ 32.558353] NMI backtrace for cpu 5
> > > > ... [backtrace for cpu 5] ...
> > > > [ 32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> > > > [ 32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> > > >
> > > > Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> > > > [5]kdb>
> > > >
> > > > As you can see, backtraces for the other CPUs start printing and get
> > > > interleaved with the kdb PANIC print.
> > > >
> > > > Let's replicate the commands to flush the console in the kdb panic
> > > > entry point to avoid this.
> > > >
> > > > Signed-off-by: Douglas Anderson <dianders@...omium.org>
> > > > ---
> > > >
> > > > kernel/debug/debug_core.c | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > > > index d5e9ccde3ab8..3a904d8697c8 100644
> > > > --- a/kernel/debug/debug_core.c
> > > > +++ b/kernel/debug/debug_core.c
> > > > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> > > > if (panic_timeout)
> > > > return;
> > > >
> > > > + debug_locks_off();
> > > > + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > > > +
> > > > if (dbg_kdb_mode)
> > > > kdb_printf("PANIC: %s\n", msg);
> > >
> > > I'm somewhat included to say *this* (calling kdb_printf() when not
> > > actually in the debugger) is the cause of the problem. kdb_printf()
> > > does some pretty horid things to the console and isn't intended to
> > > run while the system is active.
> > >
> > > I'd therefore be more tempted to defer the print to the b.p. trap
> > > handler itself and make this part of kgdb_panic() look more like:
> > >
> > > kgdb_panic_msg = msg;
> > > kgdb_breakpoint();
> > > kgdb_panic_msg = NULL;
> >
> > Unfortunately I think that only solves half the problem. As a quick
> > test, I tried simply commenting out the "kdb_printf" line in
> > kgdb_panic(). While that avoids the interleaved panic message and
> > backtrace, it does nothing to actually get the backtraces printed out
> > before you end up in kdb. As an example, this is what happened when I
> > used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
> > had the "kdb_printf" in kgdb_panic() commented out:
> >
> > [ 72.658424] lkdtm: Performing direct entry HARDLOCKUP
> > [ 82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > ...
> > [ 82.234801] Sending NMI from CPU 5 to CPUs 6:
> > [ 82.239296] NMI backtrace for cpu 6
> > ... [ stack trace for CPU 6 ] ...
> > [ 82.240294] NMI backtrace for cpu 5
> > ... [ stack trace for CPU 5 ] ...
> > [ 82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
> > [ 82.581291] NMI backtrace
> > Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due
> > to Keyboard Entry
> > [5]kdb>
> >
> > As you can see, I don't see the traces for CPUs 0-4 and 7. Those do
> > show up if I use the "dmesg" command but it's a bit of a hassle to run
> > "dmesg" to look for any extra debug messages every time I drop in kdb.
> >
> > I guess perhaps that part isn't obvious from the commit message?
>
> I figured it was a risk.
>
> In fact it's an area where my instinct to honour console messages and my
> instinct to get into the kernel as soon as possible after the decision
> to invoke it has been made come into conflict.
>
> In other words does it matter that the console buffers are not flushed
> before entering kgdb? However having thought about it for a little while
> (and knowing the console code tends to be written to be decently robust)
> I can come to the view the flushing is best.
Just for record. I wondered if there was bug in printk() that the
backtraces were not flushed. And I think that it is because they
are printed in NMI context where the conosle handling is deferred.
>From this POV, the explicit flush makes perfect sense.
Best Regards,
Petr
Powered by blists - more mailing lists