[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20240803075127.217428-1-takakura@valinux.co.jp>
Date: Sat, 3 Aug 2024 16:51:27 +0900
From: takakura@...inux.co.jp
To: pmladek@...e.com
Cc: john.ogness@...utronix.de,
rostedt@...dmis.org,
senozhatsky@...omium.org,
akpm@...ux-foundation.org,
bhe@...hat.com,
lukas@...ner.de,
wangkefeng.wang@...wei.com,
ubizjak@...il.com,
feng.tang@...el.com,
j.granados@...sung.com,
stephen.s.brennan@...cle.com,
linux-kernel@...r.kernel.org,
nishimura@...inux.co.jp,
taka@...inux.co.jp
Subject: Re: [PATCH] printk: CPU backtrace not printing on panic
Hi Petr,
On 2024-08-01, Petr Mladek <pmladek@...e.com> wrote:
>On Thu 2024-08-01 17:27:21, takakura@...inux.co.jp wrote:
>> Hi Petr and John,
>>
>> On 2024-07-30, Petr Mladek <pmladek@...e.com> wrote:
>> >On Fri 2024-07-26 16:02:45, John Ogness wrote:
>> >> On 2024-07-26, Petr Mladek <pmladek@...e.com> wrote:
>> >> > I would do it the other way and enable printing from other CPUs only
>> >> > when triggring the backtrace. We could do it because
>> >> > trigger_all_cpu_backtrace() waits until all backtraces are
>> >> > printed.
>> >> >
>> >> > Something like:
>> >> >
>> >> > diff --git a/include/linux/panic.h b/include/linux/panic.h
>> >> > index 3130e0b5116b..980bacbdfcfc 100644
>> >> > --- a/include/linux/panic.h
>> >> > +++ b/include/linux/panic.h
>> >> > @@ -16,6 +16,7 @@ extern void oops_enter(void);
>> >> > extern void oops_exit(void);
>> >> > extern bool oops_may_print(void);
>> >> >
>> >> > +extern int panic_triggering_all_cpu_backtrace;
>> >> > extern int panic_timeout;
>> >> > extern unsigned long panic_print;
>> >> > extern int panic_on_oops;
>> >> > diff --git a/kernel/panic.c b/kernel/panic.c
>> >> > index f861bedc1925..7e9e97d59b1e 100644
>> >> > --- a/kernel/panic.c
>> >> > +++ b/kernel/panic.c
>> >> > @@ -64,6 +64,8 @@ unsigned long panic_on_taint;
>> >> > bool panic_on_taint_nousertaint = false;
>> >> > static unsigned int warn_limit __read_mostly;
>> >> >
>> >> > +int panic_triggering_all_cpu_backtrace;
>> >> > +
>> >> > int panic_timeout = CONFIG_PANIC_TIMEOUT;
>> >> > EXPORT_SYMBOL_GPL(panic_timeout);
>> >> >
>> >> > @@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
>> >> > */
>> >> > static void panic_other_cpus_shutdown(bool crash_kexec)
>> >> > {
>> >> > - if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> >> > + if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>> >> > + /* Temporary allow printing messages on non-panic CPUs. */
>> >> > + panic_triggering_all_cpu_backtrace = true;
>> >> > trigger_all_cpu_backtrace();
>> >> > + panic_triggering_all_cpu_backtrace = false;
>> >>
>> >> Note, here we should also add
>> >>
>> >> nbcon_atomic_flush_pending();
>> >>
>> >> Your suggestion allows the other CPUs to dump their backtrace into the
>> >> ringbuffer, but they are still forbidden from acquiring the nbcon
>> >> console contexts for printing. That is a necessary requirement of
>> >> nbcon_waiter_matches().
>> >
>> >Great catch!
>> >
>> >I would prefer to solve this in a separate patch. This problem existed
>> >even before the commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
>> >to ringbuffer"). In fact, the problem existed very long time even for
>> >the legacy consoles.
>> >
>>
>> Good point! I guess the problem existed since the commit 51a1d258e50e
>> ("printk: Keep non-panic-CPUs out of console lock") as it forbade the
>> acquisition of console lock for non-panic cpus?
>
>It most likely existed since the commit 7acac3445acde1c94
>("printk: always use deferred printk when flush printk_safe lines")
>
>These were times when printk() serialized access to the log buffer
>using a spin lock. The backtraces from other CPUs were stored in
>temporary per-CPU buffers and later copied to the main log buffer.
>The above mentioned commit caused that printk_safe_flush_line()
>did not longer try to flush the messages to the console after
>copying the temporary stored messages.
>
>Well, the above commit was in Jan 2017. It was before panic
>allowed to show the backtraces.
>
>In practice, the problem with flushing bracktraces in panic has
>existed since the option to print the backtraces was added by
>the commit 60c958d8df9cfc40b ("panic: add sysctl to dump
>all CPUs backtraces on oops event") in Jun 2020.
>
>Best Regards,
>Petr
That is quite interesting how flushing of backtraces in panic
has been affected differently over time until now, even going back
to days before CPU backtrace was introduced in panic.
Thanks for sharing this!
Sincerely,
Ryo Takakura
Powered by blists - more mailing lists