linux-kernel - Re: [PATCH printk v7 24/35] printk: nbcon: Flush new records on device


Message-ID: <ZrNtCWvRK2ASWovm@pathway.suse.cz>
Date: Wed, 7 Aug 2024 14:48:09 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH printk v7 24/35] printk: nbcon: Flush new records on
 device_release()

On Wed 2024-08-07 03:21:57, John Ogness wrote:
> On 2024-08-05, Petr Mladek <pmladek@...e.com> wrote:
> >> +	/*
> >> +	 * This context must flush any new records added while the console
> >> +	 * was locked. The console_srcu_read_lock must be taken to ensure
> >> +	 * the console is usable throughout flushing.
> >> +	 */
> >> +	cookie = console_srcu_read_lock();
> >> +	if (console_is_usable(con, console_srcu_read_flags(con)) &&
> >> +	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
> >> +		if (!have_boot_console) {
> >> +			__nbcon_atomic_flush_pending_con(con, prb_next_reserve_seq(prb));
> >> +		} else if (!is_printk_legacy_deferred()) {
> >> +			if (console_trylock())
> >> +				console_unlock();
> >
> > nbcon_device_release() is going to be called in uart_port_unlock*()
> > still under the port->lock.
> >
> > => It smells with a potential deadlock. The console_flush_all() in
> >    console_unlock() might want to flush this console under the
> >    port->lock as well.
> >
> >    And it almost happens because nbcon_legacy_emit_next_record()
> >    might eventually take con->device_lock() when called in
> >    a task context.
> >
> >    It will not happen here because this code uses console_trylock()
> >    which would set @console_may_schedule to false.
> 
> Exactly. That is an important point. We must never try to invoke the
> write_thread() callback while holding a spinlock.
> 
> > Anyway, it would look more safe when the flush was done after releasing
> > the port->lock.
> 
> Even then we could never invoke the write_thread() callback because the
> caller may be holding other spinlocks. If we cannot safely call
> console_lock(), we cannot take the device lock. The atomic callback must
> be used and that means the port lock cannot be involved in the
> console_trylock().

It makes sense. But it is not obvious.

I actually thought about taking con->device_lock() around
nbcon_legacy_emit_next_record(). It would help to synchronize
the legacy loop against nbcon_device_lock()/nbcon_device_release(),
and it would then no longer be necessary to call the legacy loop in
nbcon_device_release().

But it was a bad idea. There are more reasons to avoid taking
con->device_lock() after console_trylock():

  1. con->device_lock() might be a sleeping lock in the future. [*]
     It actually already is a sleeping lock in RT. And
     console_trylock() might be used in printk() in any context.

  2. The fewer locks we take, the safer printk() will be in
     various contexts and situations, especially in panic().
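
To make the rule above concrete, here is a minimal userspace sketch of it,
not kernel code. Every model_* name is hypothetical and exists only for
illustration. It assumes a single-threaded model in which the trylock
variant marks the owner as "may not schedule", and the legacy emit step
takes the device lock (and uses the thread callback) only when scheduling
is allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel symbols. */

enum model_path { MODEL_WRITE_ATOMIC, MODEL_WRITE_THREAD };

struct model_console {
	bool locked;           /* models ownership of the console lock */
	bool may_schedule;     /* models @console_may_schedule */
	int device_lock_taken; /* counts how often the device lock was taken */
};

/* Models console_lock(): the caller may sleep, so sleeping locks are OK. */
static void model_console_lock(struct model_console *c)
{
	assert(!c->locked);
	c->locked = true;
	c->may_schedule = true;
}

/* Models console_trylock(): usable in any context, so never allow sleeping. */
static bool model_console_trylock(struct model_console *c)
{
	if (c->locked)
		return false;
	c->locked = true;
	c->may_schedule = false;
	return true;
}

/*
 * Models the decision made for each record in the legacy loop:
 * take the device lock and use the thread callback only when the
 * console lock owner is allowed to schedule; otherwise stay atomic.
 */
static enum model_path model_emit_next_record(struct model_console *c)
{
	assert(c->locked);
	if (c->may_schedule) {
		c->device_lock_taken++;
		return MODEL_WRITE_THREAD;
	}
	return MODEL_WRITE_ATOMIC;
}

/* Models console_unlock(): drops ownership and resets the scheduling hint. */
static void model_console_unlock(struct model_console *c)
{
	assert(c->locked);
	c->locked = false;
	c->may_schedule = false;
}
```

In this model, a caller that gets the lock via model_console_trylock()
always ends up on the atomic path and never touches the device lock,
while a caller that used model_console_lock() takes the device lock once
per emitted record.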


[*] I still have to wire this into my mental model. I keep forgetting it.

    It would be nice to have documentation summarizing the main
    ideas and describing the printk() design. I hope that we will
    write it one day.

Best Regards,
Petr