linux-kernel - Re: [RFC PATCH] nmi,printk: fix ABBA deadlock between nmi_backtrace and dump_stack

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZqD54dcZUAirxTYg@pathway.suse.cz>
Date: Wed, 24 Jul 2024 14:56:01 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Rik van Riel <riel@...riel.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Omar Sandoval <osandov@...a.com>, linux-kernel@...r.kernel.org,
	Steven Rostedt <rostedt@...dmis.org>,
	Sergey Senozhatsky <senozhatsky@...omium.org>,
	kernel-team <kernel-team@...a.com>
Subject: Re: [RFC PATCH] nmi,printk: fix ABBA deadlock between nmi_backtrace
 and dump_stack_lvl

On Thu 2024-07-18 16:15:43, John Ogness wrote:
> On 2024-07-18, Rik van Riel <riel@...riel.com> wrote:
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index dddb15f48d59..36f40db0bf93 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -1060,6 +1060,8 @@ static int __init log_buf_len_setup(char *str)
> >>  early_param("log_buf_len", log_buf_len_setup);
> >>  
> >>  #ifdef CONFIG_SMP
> >> +static bool vprintk_emit_may_spin(void);
> >> +
> >>  #define __LOG_CPU_MAX_BUF_LEN (1 << CONFIG_LOG_CPU_MAX_BUF_SHIFT)
> >>  
> >>  static void __init log_buf_add_cpu(void)
> >> @@ -1090,6 +1092,7 @@ static void __init log_buf_add_cpu(void)
> >>  }
> >>  #else /* !CONFIG_SMP */
> >>  static inline void log_buf_add_cpu(void) {}
> >> +static inline bool vprintk_emit_may_spin(void) { return true };
> >>  #endif /* CONFIG_SMP */
> >>  
> >>  static void __init set_percpu_data_ready(void)
> >> @@ -2330,6 +2333,8 @@ asmlinkage int vprintk_emit(int facility, int
> >> level,
> >>  
> >>  	/* If called from the scheduler, we can not call up(). */
> >>  	if (!in_sched) {
> >> +		int ret;
> >> +
> >>  		/*
> >>  		 * The caller may be holding system-critical or
> >>  		 * timing-sensitive locks. Disable preemption during
> >> @@ -2344,7 +2349,11 @@ asmlinkage int vprintk_emit(int facility, int
> >> level,
> >>  		 * spinning variant, this context tries to take over
> >> the
> >>  		 * printing from another printing context.
> >>  		 */
> >> -		if (console_trylock_spinning())
> >> +		if (vprintk_emit_may_spin())

I would either open code the check or change the function to
is_printk_cpu_sync_owner().

> >> +			ret = console_trylock_spinning();
> >> +		else
> >> +			ret = console_trylock();
> >> +		if (ret)
> >>  			console_unlock();
> >>  		preempt_enable();
> >>  	}
> >> @@ -4321,6 +4330,15 @@ void console_replay_all(void)
> >>  static atomic_t printk_cpu_sync_owner = ATOMIC_INIT(-1);
> >>  static atomic_t printk_cpu_sync_nested = ATOMIC_INIT(0);
> >>  
> >> +/*
> >> + * As documented in printk_cpu_sync_get_irqsave(), a context holding
> >> the
> >> + * printk_cpu_sync must not spin waiting for another CPU.
> >> + */
> >> +static bool vprintk_emit_may_spin(void)
> >> +{
> >> +	return (atomic_read(&printk_cpu_sync_owner) !=
> >> smp_processor_id());
> >> +}
> >
> > I think what the code would have to do is only trylock, and never
> > spin after taking the printk_cpu_sync_get_irqsave lock.
> 
> That is what the code does. If @printk_cpu_sync_owner is set to the
> current CPU, the context is holding the cpu_sync and will call the
> non-spinning variant, console_trylock().
> 
> However, my first suggestion to defer whenever the cpu_sync is held
> really is the only option because console_unlock() will spin on the uart
> port lock, and that is also not allowed when holding the cpu_sync.

It would have helped if Rick added backtraces from the crash dumps.
He just wrote:

> > CPU A: printk_cpu_sync_get lock -> console_lock
> > CPU B: console_lock -> (nmi) printk_cpu_sync_get lock

My quess is that it looked like:

CPU A				CPU B

				printk()
				  console_try_lock_spinning()
				  console_unlock()
				    console_emit_next_record()
				      console_lock_spinning_enable();
					con->write()
					  spin_lock(port->lock);

printk_cpu_sync_get()
  printk()
    console_try_lock_spinning()
      # spinning and wating for CPU B

				NMI:

				  printk_cpu_sync_get()
				    # waiting for CPU A

=> DEADLOCK


The deadlock is caused under/by printk_cpu_sync_get() but only because
console_try_lock_spinning() is blocked. It is not a true "try_lock"
operation which should never get blocked.

=> The above patch should solve the problem as well. It will cause
   that console_try_lock_spinning() would fail immediately on CPU A.

Note that port->lock can't cause any deadlock in this scenario.
console_try_lock_spinning() will always fail on CPU A until
the NMI gets handled on CPU B.

By other words, printk_cpu_sync_get() will behave as a tail lock
on CPU A because of the failing trylock.

Best Regards,
Petr