linux-kernel - [PATCH 1/3] printk/nbcon: Block printk kthreads when any CPU is in an emergency context

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250926124912.243464-2-pmladek@suse.com>
Date: Fri, 26 Sep 2025 14:49:10 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Jiri Slaby <jirislaby@...nel.org>,
	Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Esben Haabendal <esben@...nix.com>,
	linux-serial@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
	Arnd Bergmann <arnd@...db.de>,
	Tony Lindgren <tony@...mide.com>,
	Niklas Schnelle <schnelle@...ux.ibm.com>,
	Serge Semin <fancer.lancer@...il.com>,
	Andrew Murray <amurray@...goodpenguin.co.uk>,
	Petr Mladek <pmladek@...e.com>
Subject: [PATCH 1/3] printk/nbcon: Block printk kthreads when any CPU is in an emergency context

In emergency contexts, printk() tries to flush messages directly even
on nbcon consoles. And it is allowed to takeover the console ownership
and interrupt the printk kthread in the middle of a message.

Only one takeover and one repeated message should be enough in most
situations. The first emergency message flushes the backlog and printk
kthreads get to sleep. Next emergency messages are flushed directly
and printk() does not wake up the kthreads.

However, the one takeover is not guaranteed. Any printk() in normal
context on another CPU could wake up the kthreads. Or a new emergency
message might be added before the kthreads get to sleep. Note that
the interrupted .write_kthread() callbacks usually have to call
nbcon_reacquire_nobuf() and restore the original device setting
before checking for pending messages.

The risk of the repeated takeovers will be even bigger because
__nbcon_atomic_flush_pending_con is going to release the console
ownership after each emitted record. It will be needed to prevent
hardlockup reports on other CPUs which are busy waiting for
the context ownership, for example, by nbcon_reacquire_nobuf() or
__uart_port_nbcon_acquire().

The repeated takeovers break the output, for example:

    [ 5042.650211][ T2220] Call Trace:
    [ 5042.6511
    ** replaying previous printk message **
    [ 5042.651192][ T2220]  <TASK>
    [ 5042.652160][ T2220]  kunit_run_
    ** replaying previous printk message **
    [ 5042.652160][ T2220]  kunit_run_tests+0x72/0x90
    [ 5042.653340][ T22
    ** replaying previous printk message **
    [ 5042.653340][ T2220]  ? srso_alias_return_thunk+0x5/0xfbef5
    [ 5042.654628][ T2220]  ? stack_trace_save+0x4d/0x70
    [ 5042.6553
    ** replaying previous printk message **
    [ 5042.655394][ T2220]  ? srso_alias_return_thunk+0x5/0xfbef5
    [ 5042.656713][ T2220]  ? save_trace+0x5b/0x180

A more robust solution is to block the printk kthread entirely whenever
*any* CPU enters an emergency context. This ensures that critical messages
can be flushed without contention from the normal, non-atomic printing
path.

Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz
Signed-off-by: Petr Mladek <pmladek@...e.com>
---
 kernel/printk/nbcon.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index d5d8c8c657e0..08b196e898cd 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -117,6 +117,9 @@
  * from scratch.
  */
 
+/* Counter of active nbcon emergency contexts. */
+atomic_t nbcon_cpu_emergency_cnt;
+
 /**
  * nbcon_state_set - Helper function to set the console state
  * @con:	Console to update
@@ -1168,6 +1171,16 @@ static bool nbcon_kthread_should_wakeup(struct console *con, struct nbcon_contex
 	if (kthread_should_stop())
 		return true;
 
+	/*
+	 * Block the kthread when the system is in an emergency or panic mode.
+	 * It increases the chance that these contexts would be able to show
+	 * the messages directly. And it reduces the risk of interrupted writes
+	 * where the context with a higher priority takes over the nbcon console
+	 * ownership in the middle of a message.
+	 */
+	if (unlikely(atomic_read(&nbcon_cpu_emergency_cnt)))
+		return false;
+
 	cookie = console_srcu_read_lock();
 
 	flags = console_srcu_read_flags(con);
@@ -1219,6 +1232,13 @@ static int nbcon_kthread_func(void *__console)
 		if (kthread_should_stop())
 			return 0;
 
+		/*
+		 * Block the kthread when the system is in an emergency or panic
+		 * mode. See nbcon_kthread_should_wakeup() for more details.
+		 */
+		if (unlikely(atomic_read(&nbcon_cpu_emergency_cnt)))
+			goto wait_for_event;
+
 		backlog = false;
 
 		/*
@@ -1660,6 +1680,8 @@ void nbcon_cpu_emergency_enter(void)
 
 	preempt_disable();
 
+	atomic_inc(&nbcon_cpu_emergency_cnt);
+
 	cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
 	(*cpu_emergency_nesting)++;
 }
@@ -1674,10 +1696,18 @@ void nbcon_cpu_emergency_exit(void)
 	unsigned int *cpu_emergency_nesting;
 
 	cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
-
 	if (!WARN_ON_ONCE(*cpu_emergency_nesting == 0))
 		(*cpu_emergency_nesting)--;
 
+	/*
+	 * Wake up kthreads because there might be some pending messages
+	 * added by other CPUs with normal priority since the last flush
+	 * in the emergency context.
+	 */
+	if (!WARN_ON_ONCE(atomic_read(&nbcon_cpu_emergency_cnt) == 0))
+		if (atomic_dec_return(&nbcon_cpu_emergency_cnt) == 0)
+			nbcon_kthreads_wake();
+
 	preempt_enable();
 }
 
-- 
2.51.0