Message-ID: <84h68vnr90.fsf@jogness.linutronix.de>
Date: Tue, 29 Oct 2024 11:28:03 +0106
From: John Ogness <john.ogness@...utronix.de>
To: Boqun Feng <boqun.feng@...il.com>, John Stultz <jstultz@...gle.com>
Cc: Petr Mladek <pmladek@...e.com>, Greg Kroah-Hartman
 <gregkh@...uxfoundation.org>, jirislaby@...nel.org, LKML
 <linux-kernel@...r.kernel.org>, kernel-team@...roid.com
Subject: Re: Deadlock?: console_waiter/serial8250_ports/low_water_lock with
 6.12-rc

On 2024-10-28, Boqun Feng <boqun.feng@...il.com> wrote:
> I think the cause of the issue is:
>
> 	CPU X					CPU Y
> 	=====					=====
> 	uart_write():				console_unlock(): // console lock is held by Y.
> 	  uart_port_lock();			  __console_flush_and_unlock():
> 	  __uart_start():			    __console_flush_all():
> 	    pm_runtime_get():			      console_emit_next_record():
> 	      __pm_runtime_resume():		        con->write(); <- serial8250_console_write() // will try to acquire uart_port_lock();
> 	        spin_lock_irqsave(&dev->power.lock, flags):
> 		  <this triggers the lockdep splats, probably because
> 		   PM has done some print under "&dev->power.lock">
> 		  lock_acquire():
> 		    printk():

It is a known problem that calling printk() while holding the
uart_port_lock for non-printing purposes (such as pm) will deadlock the
system. You don't even need CPU-Y to be involved: once CPU-X acquires
the console_lock, its own console flush calls con->write(), which tries
to take the uart_port_lock that CPU-X already holds, so CPU-X deadlocks
itself.

One possible solution would be to enable deferred_printk if the
uart_port_lock of a console is taken for non-printing purposes. The
correct solution is to convert the console driver to the new nbcon
model.

The reasons why nbcon avoids this issue:

1. It does not use the BKL-like console lock.

2. It is aware that something else is using the driver and will instead
just write to the lockless ringbuffer rather than endlessly spinning on
the lock (that it itself is already holding).
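
As a rough illustration of point 2, here is a minimal userspace sketch of
"store the record, and only print if nobody else is using the port". It
is not the nbcon code and not the 8250 driver. Every name in it (struct
ringbuf, struct port, port_try_acquire(), sketch_printk()) is
hypothetical, and it models a single CPU with a plain ownership flag
instead of real locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define RB_SLOTS 8   /* power of two, so unsigned index wraparound stays consistent */
#define MSG_LEN  64

/* Hypothetical stand-in for the printk ringbuffer: records are always stored first. */
struct ringbuf {
	char msg[RB_SLOTS][MSG_LEN];
	unsigned int head;	/* next slot to write */
	unsigned int tail;	/* next slot to print */
};

/* Hypothetical stand-in for a console port with ownership tracking. */
struct port {
	bool owned;		/* true while some context is using the hardware */
	unsigned int emitted;	/* records written to the "hardware" so far */
};

/* Take the port only if it is free; never spin. */
static bool port_try_acquire(struct port *p)
{
	if (p->owned)
		return false;
	p->owned = true;
	return true;
}

static void port_release(struct port *p)
{
	assert(p->owned);
	p->owned = false;
}

/* Print every pending record. The caller must own the port. */
static void flush_pending(struct ringbuf *rb, struct port *p)
{
	assert(p->owned);
	/* If the writer lapped the reader, skip the overwritten records. */
	if (rb->head - rb->tail > RB_SLOTS)
		rb->tail = rb->head - RB_SLOTS;
	while (rb->tail != rb->head) {
		printf("console: %s\n", rb->msg[rb->tail % RB_SLOTS]);
		rb->tail++;
		p->emitted++;
	}
}

/*
 * Store the record, then print only if the port is free.
 * Returns true if the record reached the console now, false if it was
 * left in the ringbuffer for a later flush.
 */
static bool sketch_printk(struct ringbuf *rb, struct port *p, const char *text)
{
	snprintf(rb->msg[rb->head % RB_SLOTS], MSG_LEN, "%s", text);
	rb->head++;

	if (!port_try_acquire(p))
		return false;	/* port busy (possibly held by us): defer, do not spin */

	flush_pending(rb, p);
	port_release(p);
	return true;
}
```

In this sketch, a context that already owns the port (think of the pm
path in the diagram above) can call sketch_printk() and gets false back
instead of spinning. The record stays in the ringbuffer and is printed
by the next sketch_printk() call that finds the port free.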

@jstultz: Is it possible that you could run your tests using the latest
version [0] of the proposed nbcon-based 8250 driver? This will not have
the issue and should cleanly apply to any recent kernel.

John Ogness

[0] https://lore.kernel.org/lkml/20241025105728.602310-1-john.ogness@linutronix.de
