lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <af84eb57-2da2-461c-82bb-36312bbb49b0@paulmck-laptop>
Date: Wed, 5 Feb 2025 12:31:15 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, rcu@...r.kernel.org,
	linux-kernel@...r.kernel.org, kernel-team@...a.com,
	rostedt@...dmis.org, Frederic Weisbecker <frederic@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Alexei Starovoitov <ast@...nel.org>,
	Andrii Nakryiko <andrii@...nel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	linux-trace-kernel@...r.kernel.org, Petr Mladek <pmladek@...e.com>
Subject: Re: [PATCH rcu v2] 4/5] rcu-tasks: Move RCU Tasks self-tests to
 core_initcall()

On Wed, Feb 05, 2025 at 12:10:53PM -0800, Paul E. McKenney wrote:
> On Wed, Feb 05, 2025 at 09:00:07PM +0106, John Ogness wrote:
> > On 2025-02-05, "Paul E. McKenney" <paulmck@...nel.org> wrote:
> > >> This is caused by RCU falling behind a callback-flooding kthread that
> > >> invokes call_rcu() in a semi-tight loop.  Setting rcutree.kthread_prio=40
> > >> avoids the splat, but still gets the shutdown-time hang.  Retrying with
> > >> the default rcutree.kthread_prio=2 failed to reproduce the splat, but
> > >> it did reproduce the shutdown-time hang.
> > >> 
> > >> OK, maybe printk buffers are not being flushed?  A 100-millisecond sleep
> > >> at the end of of rcu_torture_cleanup() got all of rcutorture's output
> > >> flushed, but lost the subsequent shutdown-time console traffic.  The
> > >> pr_flush(HZ/10,1) seems more sensible, but this is private to printk().
> > >>
> > >> I would like to log the shutdown-time console traffic because RCU can
> > >> sometimes break things on that path.
> > 
> > pr_flush() was changed to private because there were no users. It would
> > not be a problem to make it available. Adding a pr_flush() to
> > rcu_torture_cleanup() would be an appropriate workaround for now (more
> > on this at the end).
> 
> Ah, got it.
> 
> > > There is a call to kmsg_dump(KMSG_DUMP_SHUTDOWN) in kernel_power_off()
> > > that appears to be intended to dump out the printk() buffers,
> > 
> > It only dumps the buffers to the registered kmsg_dumpers. It is not
> > responsible for flushing console backlogs.
> 
> That would explain its not doing much for me.  ;-)
> 
> > > but it
> > > does not seem to do so in kernels built with CONFIG_PREEMPT_RT=y.
> > > Does there need to be a pr_flush() call prior to the call to
> > > migrate_to_reboot_cpu()?  Or maybe even to do_kernel_power_off_prepare()
> > > or kernel_shutdown_prepare()?
> > 
> > With CONFIG_PREEMPT_RT=y, legacy consoles only print via a dedicated
> > kthread. Without a pr_flush() somewhere, there is basically no chance
> > that they will get backlogs flushed because noone is waitig for them.
> > 
> > The new console API (NBCON) provides support for "atomic consoles",
> > which _do_ flush by transitioning to synchronous printing during
> > shutdown/reboot. Unfortunately we still don't have any NBCON atomic
> > console implemented in the kernel. The 8250 UART will be our first
> > driver, most likely available in 6.15. (With the current PREEMPT_RT
> > patch applied, the 8250 NBCON atomic driver is used.)
> > 
> > Since only CONFIG_PREEMPT_RT=y has this issue, I am not sure if we want
> > to sprinkle pr_flush() calls on all sleepable shutdown/reboot paths,
> > although that is certainly one way to handle it. For your case, adding a
> > pr_flush() to rcu_torture_cleanup() and making pr_flush() non-private
> > would be an easy solution to avoid your problem.
> 
> Would it make sense to put a call to pr_flush() at the beginning of the
> kernel_power_off() function?  I suspect that would take care of much
> more than just rcutorture.
> 
> Or maybe after the pr_emerg("Power down\n"), since that is normally
> the last thing that shows up on rcutorture's console logs.
> 
> I will give that a shot and see what happens.  ;-)

And that does the trick!  I specified 500ms, but maybe -1 would be
better?  You tell me!  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit 172b692a79ec5801eab57cb55d2f9c398f244c62
Author: Paul E. McKenney <paulmck@...nel.org>
Date:   Wed Feb 5 12:27:23 2025 -0800

    printk: Flush console log from kernel_power_off()
    
    Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output
    and shutdown time, which hides shutdown-time RCU issues from rcutorture.
    Therefore, make pr_flush() public and invoke it after then last print
    in kernel_power_off().
    
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
    Cc: Petr Mladek <pmladek@...e.com>
    Cc: Steven Rostedt <rostedt@...dmis.org>
    Cc: John Ogness <john.ogness@...utronix.de>
    Cc: Sergey Senozhatsky <senozhatsky@...omium.org>

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 4217a9f412b2..b3e88ff1ecc3 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -806,4 +806,6 @@ static inline void print_hex_dump_debug(const char *prefix_str, int prefix_type,
 #define print_hex_dump_bytes(prefix_str, prefix_type, buf, len)	\
 	print_hex_dump_debug(prefix_str, prefix_type, 16, 1, buf, len, true)
 
+
+bool pr_flush(int timeout_ms, bool reset_on_progress);
 #endif
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 80910bc3470c..e0c6e5a99897 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2461,7 +2461,7 @@ asmlinkage __visible int _printk(const char *fmt, ...)
 }
 EXPORT_SYMBOL(_printk);
 
-static bool pr_flush(int timeout_ms, bool reset_on_progress);
+bool pr_flush(int timeout_ms, bool reset_on_progress);
 static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress);
 
 #else /* CONFIG_PRINTK */
@@ -4466,7 +4466,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
  * Context: Process context. May sleep while acquiring console lock.
  * Return: true if all usable printers are caught up.
  */
-static bool pr_flush(int timeout_ms, bool reset_on_progress)
+bool pr_flush(int timeout_ms, bool reset_on_progress)
 {
 	return __pr_flush(NULL, timeout_ms, reset_on_progress);
 }
diff --git a/kernel/reboot.c b/kernel/reboot.c
index a701000bab34..3448e6ae3556 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -704,6 +704,7 @@ void kernel_power_off(void)
 	migrate_to_reboot_cpu();
 	syscore_shutdown();
 	pr_emerg("Power down\n");
+	pr_flush(500, 1);
 	kmsg_dump(KMSG_DUMP_SHUTDOWN);
 	machine_power_off();
 }

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ