Message-ID: <CAD=FV=VHa1arysMgqZcGFGFi2N8i0BeKWD6BM8dSsg0Xq2LUFQ@mail.gmail.com>
Date:   Fri, 25 Aug 2023 07:18:44 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Daniel Thompson <daniel.thompson@...aro.org>
Cc:     Petr Mladek <pmladek@...e.com>,
        Jason Wessel <jason.wessel@...driver.com>,
        kgdb-bugreport@...ts.sourceforge.net, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] kgdb: Flush console before entering kgdb on panic

Hi,

On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
<daniel.thompson@...aro.org> wrote:
>
> On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > When entering kdb/kgdb on a kernel panic, it was observed that the
> > console isn't flushed before the `kdb` prompt came up. Specifically,
> > when using the buddy lockup detector on arm64 and running:
> >   echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> >
> > I could see:
> >   [   26.161099] lkdtm: Performing direct entry HARDLOCKUP
> >   [   32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> >   [   32.552865] Sending NMI from CPU 5 to CPUs 6:
> >   [   32.557359] NMI backtrace for cpu 6
> >   ... [backtrace for cpu 6] ...
> >   [   32.558353] NMI backtrace for cpu 5
> >   ... [backtrace for cpu 5] ...
> >   [   32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> >   [   32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> >
> >   Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> >   [5]kdb>
> >
> > As you can see, backtraces for the other CPUs start printing and get
> > interleaved with the kdb PANIC print.
> >
> > Let's replicate the commands to flush the console in the kdb panic
> > entry point to avoid this.
> >
> > Signed-off-by: Douglas Anderson <dianders@...omium.org>
> > ---
> >
> >  kernel/debug/debug_core.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > index d5e9ccde3ab8..3a904d8697c8 100644
> > --- a/kernel/debug/debug_core.c
> > +++ b/kernel/debug/debug_core.c
> > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> >       if (panic_timeout)
> >               return;
> >
> > +     debug_locks_off();
> > +     console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > +
> >       if (dbg_kdb_mode)
> >               kdb_printf("PANIC: %s\n", msg);
>
> I'm somewhat inclined to say *this* (calling kdb_printf() when not
> actually in the debugger) is the cause of the problem. kdb_printf()
> does some pretty horrid things to the console and isn't intended to
> run while the system is active.
>
> I'd therefore be more tempted to defer the print to the b.p. trap
> handler itself and make this part of kgdb_panic() look more like:
>
>         kgdb_panic_msg = msg;
>         kgdb_breakpoint();
>         kgdb_panic_msg = NULL;

Unfortunately I think that only solves half the problem. As a quick
test, I tried simply commenting out the "kdb_printf" line in
kgdb_panic(). While that avoids the interleaved panic message and
backtrace, it does nothing to actually get the backtraces printed out
before you end up in kdb. As an example, this is what happened when I
used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
had the "kdb_printf" in kgdb_panic() commented out:

[   72.658424] lkdtm: Performing direct entry HARDLOCKUP
[   82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
...
[   82.234801] Sending NMI from CPU 5 to CPUs 6:
[   82.239296] NMI backtrace for cpu 6
... [ stack trace for CPU 6 ] ...
[   82.240294] NMI backtrace for cpu 5
... [ stack trace for CPU 5 ] ...
[   82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
[   82.581291] NMI backtrace
Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due to Keyboard Entry
[5]kdb>

As you can see, the traces for CPUs 0-4 and 7 never reach the console.
They do show up if I run the "dmesg" command, but it's a bit of a hassle
to run "dmesg" to look for extra debug messages every time I drop into
kdb.
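
To make the difference concrete, here is a rough userspace toy model
(not kernel code). The names log_model, model_log(), model_flush() and
model_enter_debugger() are made up for illustration; the real change is
the console_flush_on_panic() call added to kgdb_panic(). The model
assumes messages are stored in a log first and only reach the console
when something flushes them, so entering the debugger without a flush
leaves the remaining records visible only via "dmesg":

```c
#include <assert.h>
#include <stdio.h>

#define MAX_RECORDS 16
#define MAX_LEN 64

/*
 * Hypothetical stand-in for the kernel log: records are stored first and
 * only reach the console when someone flushes them.
 */
struct log_model {
	char records[MAX_RECORDS][MAX_LEN];
	int stored;	/* records added to the log */
	int printed;	/* records already written to the console */
};

/* Store a message in the log without printing it. */
static void model_log(struct log_model *log, const char *msg)
{
	assert(log->stored < MAX_RECORDS);
	snprintf(log->records[log->stored], MAX_LEN, "%s", msg);
	log->stored++;
}

/* Write every pending record to the console; return how many were written. */
static int model_flush(struct log_model *log, FILE *console)
{
	int written = 0;

	while (log->printed < log->stored) {
		fprintf(console, "%s\n", log->records[log->printed]);
		log->printed++;
		written++;
	}
	return written;
}

/*
 * Model dropping into the debugger prompt. Returns the number of records
 * that are still only in the log (i.e. need "dmesg") when the prompt shows.
 */
static int model_enter_debugger(struct log_model *log, FILE *console,
				int flush_first)
{
	if (flush_first)
		model_flush(log, console);
	fprintf(console, "[5]kdb>\n");
	return log->stored - log->printed;
}
```

With two backtraces logged, model_enter_debugger(&log, stdout, 0)
reports two records still pending at the prompt, while passing
flush_first = 1 prints both before the prompt and reports none.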

I guess perhaps that part isn't obvious from the commit message?
Should I send a new version with an updated commit message indicating
that it's not just the jumbled text that's a problem but also the lack
of stack traces?

Thanks!

-Doug
