linux-kernel - Re: [RFC PATCH] nmi,printk: fix ABBA deadlock between nmi_backtrace and dump_stack

Message-ID: <87r0brkvqd.fsf@jogness.linutronix.de>
Date: Thu, 18 Jul 2024 09:31:14 +0206
From: John Ogness <john.ogness@...utronix.de>
To: Rik van Riel <riel@...riel.com>, Andrew Morton <akpm@...ux-foundation.org>
Cc: Omar Sandoval <osandov@...a.com>, linux-kernel@...r.kernel.org, Petr
 Mladek <pmladek@...e.com>, Steven Rostedt <rostedt@...dmis.org>, Sergey
 Senozhatsky <senozhatsky@...omium.org>, kernel-team <kernel-team@...a.com>
Subject: Re: [RFC PATCH] nmi,printk: fix ABBA deadlock between nmi_backtrace
 and dump_stack_lvl

On 2024-07-17, Rik van Riel <riel@...riel.com> wrote:
> I think that would do the trick. The nmi_backtrace() printk is already
> deferred, because of the check for in_nmi() in vprintk(), and this
> change would put all the other users of printk_cpu_sync_get_irqsave()
> on the exact same footing as nmi_backtrace().
>
> Combing through the code a little, it looks like that would remove
> the potential for this deadlock to happen again.

Let's see what Petr has to say. (He'll be back on Monday.) He might
prefer a solution that does not defer printing in all cases, i.e. one
that still takes the console_lock if it is available, but avoids
spinning if it is not. Below is a patch that would achieve this.

John
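
As an illustration of the idea described above (separate from the patch
that follows), here is a minimal userspace sketch of "take the lock if it
is free, but only spin for it when this CPU does not hold the cpu sync".
Every name in it (sync_owner, console_busy, may_spin, try_lock_once,
try_lock_spinning, emit, MAX_SPINS, attempts) is a hypothetical stand-in
built on C11 atomics, not a kernel API, and the bounded retry loop only
approximates what a spinning trylock does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel objects. */
#define MAX_SPINS 4

static atomic_int sync_owner = -1;                  /* CPU id holding the "cpu sync", -1 if none */
static atomic_flag console_busy = ATOMIC_FLAG_INIT; /* set while the "console lock" is held */
static int attempts;                                /* lock attempts made, for inspection only */

/* Same idea as vprintk_emit_may_spin(): the sync holder must not spin. */
static bool may_spin(int cpu)
{
	assert(cpu >= 0);
	return atomic_load(&sync_owner) != cpu;
}

/* One attempt: true if the lock was free and now belongs to the caller. */
static bool try_lock_once(void)
{
	attempts++;
	return !atomic_flag_test_and_set(&console_busy);
}

/* Bounded retry loop standing in for a spinning trylock. */
static bool try_lock_spinning(void)
{
	for (int i = 0; i < MAX_SPINS; i++) {
		if (try_lock_once())
			return true;
	}
	return false;
}

static void unlock(void)
{
	atomic_flag_clear(&console_busy);
}

/* Pick the lock variant based on whether this "CPU" may spin. */
static bool emit(int cpu)
{
	bool got = may_spin(cpu) ? try_lock_spinning() : try_lock_once();

	if (got)
		unlock();	/* stands in for flushing and releasing the console */
	return got;
}
```

With the lock already held elsewhere and sync_owner set to 0, emit(0)
makes a single attempt and gives up, while emit(1) retries MAX_SPINS
times before giving up.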

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index dddb15f48d59..36f40db0bf93 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1060,6 +1060,8 @@ static int __init log_buf_len_setup(char *str)
 early_param("log_buf_len", log_buf_len_setup);
 
 #ifdef CONFIG_SMP
+static bool vprintk_emit_may_spin(void);
+
 #define __LOG_CPU_MAX_BUF_LEN (1 << CONFIG_LOG_CPU_MAX_BUF_SHIFT)
 
 static void __init log_buf_add_cpu(void)
@@ -1090,6 +1092,7 @@ static void __init log_buf_add_cpu(void)
 }
 #else /* !CONFIG_SMP */
 static inline void log_buf_add_cpu(void) {}
+static inline bool vprintk_emit_may_spin(void) { return true; }
 #endif /* CONFIG_SMP */
 
 static void __init set_percpu_data_ready(void)
@@ -2330,6 +2333,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
+		int ret;
+
 		/*
 		 * The caller may be holding system-critical or
 		 * timing-sensitive locks. Disable preemption during
@@ -2344,7 +2349,11 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * spinning variant, this context tries to take over the
 		 * printing from another printing context.
 		 */
-		if (console_trylock_spinning())
+		if (vprintk_emit_may_spin())
+			ret = console_trylock_spinning();
+		else
+			ret = console_trylock();
+		if (ret)
 			console_unlock();
 		preempt_enable();
 	}
@@ -4321,6 +4330,15 @@ void console_replay_all(void)
 static atomic_t printk_cpu_sync_owner = ATOMIC_INIT(-1);
 static atomic_t printk_cpu_sync_nested = ATOMIC_INIT(0);
 
+/*
+ * As documented in printk_cpu_sync_get_irqsave(), a context holding the
+ * printk_cpu_sync must not spin waiting for another CPU.
+ */
+static bool vprintk_emit_may_spin(void)
+{
+	return (atomic_read(&printk_cpu_sync_owner) != smp_processor_id());
+}
+
 /**
  * __printk_cpu_sync_wait() - Busy wait until the printk cpu-reentrant
  *                            spinning lock is not owned by any CPU.