lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20240803075127.217428-1-takakura@valinux.co.jp>
Date: Sat,  3 Aug 2024 16:51:27 +0900
From: takakura@...inux.co.jp
To: pmladek@...e.com
Cc: john.ogness@...utronix.de,
	rostedt@...dmis.org,
	senozhatsky@...omium.org,
	akpm@...ux-foundation.org,
	bhe@...hat.com,
	lukas@...ner.de,
	wangkefeng.wang@...wei.com,
	ubizjak@...il.com,
	feng.tang@...el.com,
	j.granados@...sung.com,
	stephen.s.brennan@...cle.com,
	linux-kernel@...r.kernel.org,
	nishimura@...inux.co.jp,
	taka@...inux.co.jp
Subject: Re: [PATCH] printk: CPU backtrace not printing on panic

Hi Petr,

On 2024-08-01, Petr Mladek <pmladek@...e.com> wrote:
>On Thu 2024-08-01 17:27:21, takakura@...inux.co.jp wrote:
>> Hi Petr and John,
>>
>> On 2024-07-30, Petr Mladek <pmladek@...e.com> wrote:
>> >On Fri 2024-07-26 16:02:45, John Ogness wrote:
>> >> On 2024-07-26, Petr Mladek <pmladek@...e.com> wrote:
>> >> > I would do it the other way and enable printing from other CPUs only
>> >> > when triggring the backtrace. We could do it because
>> >> > trigger_all_cpu_backtrace() waits until all backtraces are
>> >> > printed.
>> >> >
>> >> > Something like:
>> >> >
>> >> > diff --git a/include/linux/panic.h b/include/linux/panic.h
>> >> > index 3130e0b5116b..980bacbdfcfc 100644
>> >> > --- a/include/linux/panic.h
>> >> > +++ b/include/linux/panic.h
>> >> > @@ -16,6 +16,7 @@ extern void oops_enter(void);
>> >> >  extern void oops_exit(void);
>> >> >  extern bool oops_may_print(void);
>> >> >
>> >> > +extern int panic_triggering_all_cpu_backtrace;
>> >> >  extern int panic_timeout;
>> >> >  extern unsigned long panic_print;
>> >> >  extern int panic_on_oops;
>> >> > diff --git a/kernel/panic.c b/kernel/panic.c
>> >> > index f861bedc1925..7e9e97d59b1e 100644
>> >> > --- a/kernel/panic.c
>> >> > +++ b/kernel/panic.c
>> >> > @@ -64,6 +64,8 @@ unsigned long panic_on_taint;
>> >> >  bool panic_on_taint_nousertaint = false;
>> >> >  static unsigned int warn_limit __read_mostly;
>> >> >
>> >> > +int panic_triggering_all_cpu_backtrace;
>> >> > +
>> >> >  int panic_timeout = CONFIG_PANIC_TIMEOUT;
>> >> >  EXPORT_SYMBOL_GPL(panic_timeout);
>> >> >
>> >> > @@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
>> >> >   */
>> >> >  static void panic_other_cpus_shutdown(bool crash_kexec)
>> >> >  {
>> >> > -       if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> >> > +       if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>> >> > +               /* Temporary allow printing messages on non-panic CPUs. */
>> >> > +               panic_triggering_all_cpu_backtrace = true;
>> >> >                 trigger_all_cpu_backtrace();
>> >> > +               panic_triggering_all_cpu_backtrace = false;
>> >>
>> >> Note, here we should also add
>> >>
>> >>           nbcon_atomic_flush_pending();
>> >>
>> >> Your suggestion allows the other CPUs to dump their backtrace into the
>> >> ringbuffer, but they are still forbidden from acquiring the nbcon
>> >> console contexts for printing. That is a necessary requirement of
>> >> nbcon_waiter_matches().
>> >
>> >Great catch!
>> >
>> >I would prefer to solve this in a separate patch. This problem existed
>> >even before the commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
>> >to ringbuffer"). In fact, the problem existed very long time even for
>> >the legacy consoles.
>> >
>>
>> Good point! I guess the problem existed since the commit 51a1d258e50e
>> ("printk: Keep non-panic-CPUs out of console lock") as it forbade the
>> acquisition of console lock for non-panic cpus?
>
>It most likely existed since the commit 7acac3445acde1c94
>("printk: always use deferred printk when flush printk_safe lines")
>
>These were times when printk() serialized access to the log buffer
>using a spin lock. The backtraces from other CPUs were stored in
>temporary per-CPU buffers and later copied to the main log buffer.
>The above mentioned commit caused that printk_safe_flush_line()
>did not longer try to flush the messages to the console after
>copying the temporary stored messages.
>
>Well, the above commit was in Jan 2017. It was before panic
>allowed to show the backtraces.
>
>In practice, the problem with flushing bracktraces in panic has
>existed since the option to print the backtraces was added by
>the commit 60c958d8df9cfc40b ("panic: add sysctl to dump
>all CPUs backtraces on oops event") in Jun 2020.
>
>Best Regards,
>Petr

That is quite interesting how flushing of backtraces in panic 
has been affected differently over time until now, even going back 
to days before CPU backtrace was introduced in panic.

Thanks for sharing this!

Sincerely,
Ryo Takakura

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ