Message-ID: <ZrNtCWvRK2ASWovm@pathway.suse.cz>
Date: Wed, 7 Aug 2024 14:48:09 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH printk v7 24/35] printk: nbcon: Flush new records on device_release()

On Wed 2024-08-07 03:21:57, John Ogness wrote:
> On 2024-08-05, Petr Mladek <pmladek@...e.com> wrote:
> >> +	/*
> >> +	 * This context must flush any new records added while the console
> >> +	 * was locked. The console_srcu_read_lock must be taken to ensure
> >> +	 * the console is usable throughout flushing.
> >> +	 */
> >> +	cookie = console_srcu_read_lock();
> >> +	if (console_is_usable(con, console_srcu_read_flags(con)) &&
> >> +	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
> >> +		if (!have_boot_console) {
> >> +			__nbcon_atomic_flush_pending_con(con, prb_next_reserve_seq(prb));
> >> +		} else if (!is_printk_legacy_deferred()) {
> >> +			if (console_trylock())
> >> +				console_unlock();
> >
> > nbcon_device_release() is going to be called in uart_port_unlock*()
> > still under the port->lock.
> >
> > => It smells with a potential deadlock. The console_flush_all() in
> > console_unlock() might want to flush this console under the
> > port->lock as well.
> >
> > And it almost happens because nbcon_legacy_emit_next_record()
> > might eventually take con->device_lock() when called in
> > a task context.
> >
> > It will not happen here because this code uses console_trylock()
> > which would set @console_may_schedule to false.
>
> Exactly. That is an important point. We must never try to invoke the
> write_thread() callback while holding a spinlock.
>
> > Anyway, it would look more safe when the flush was done after releasing
> > the port->lock.
>
> Even then we could never invoke the write_thread() callback because the
> caller may be holding other spinlocks. If we cannot safely call
> console_lock(), we cannot take the device lock. The atomic callback must
> be used and that means the port lock cannot be involved in the
> console_trylock().

It makes sense. But it is not obvious.

I actually thought about taking con->device_lock() around
nbcon_legacy_emit_next_record(). It would help to synchronize
the legacy loop against nbcon_device_lock()/nbcon_device_release().
It would then no longer be necessary to call the legacy loop in
nbcon_device_release().

But it was a bad idea. There are more reasons to avoid taking
con->device_lock() after console_trylock():

  1. con->device_lock() might be a sleeping lock in the future. [*]
     It actually already is a sleeping lock in RT. And
     console_trylock() might be used in printk() in any context.

  2. The fewer locks we take, the safer printk() will be in
     various contexts and situations, especially in panic().
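
To make the rule above easier to keep in mind, here is a rough userspace
model of it. It is only a sketch, not kernel code: every model_* name is
hypothetical, pthread mutexes stand in for console_lock and for the
port->lock behind con->device_lock(), and the emit function only mimics
the choice that nbcon_legacy_emit_next_record() is described as making
above (device lock only in a context that may schedule, otherwise the
atomic path).

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Userspace model only. The model_* names are hypothetical and merely
 * mirror the roles of console_lock(), console_trylock(),
 * con->device_lock() and nbcon_legacy_emit_next_record().
 */
static pthread_mutex_t model_console_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t model_port_lock = PTHREAD_MUTEX_INITIALIZER;
static bool model_may_schedule;

enum model_emit_path { MODEL_EMIT_NONE, MODEL_EMIT_ATOMIC, MODEL_EMIT_THREAD };

/* Blocking acquire: only valid where sleeping is allowed. */
static void model_console_lock(void)
{
	pthread_mutex_lock(&model_console_mutex);
	model_may_schedule = true;
}

/* Non-blocking acquire: the caller context is unknown, so never sleep. */
static bool model_console_trylock(void)
{
	if (pthread_mutex_trylock(&model_console_mutex) != 0)
		return false;
	model_may_schedule = false;
	return true;
}

static void model_console_unlock(void)
{
	pthread_mutex_unlock(&model_console_mutex);
}

/* Must be called with the console mutex held. */
static enum model_emit_path model_emit_next_record(void)
{
	if (model_may_schedule) {
		/* Task context: taking the device lock is allowed. */
		pthread_mutex_lock(&model_port_lock);
		/* ... the write_thread() callback would run here ... */
		pthread_mutex_unlock(&model_port_lock);
		return MODEL_EMIT_THREAD;
	}
	/* Unknown context: no extra lock, write_atomic() would run here. */
	return MODEL_EMIT_ATOMIC;
}

/* Models the flush on release; the caller already holds model_port_lock. */
static enum model_emit_path model_device_release_flush(void)
{
	enum model_emit_path path = MODEL_EMIT_NONE;

	if (model_console_trylock()) {
		path = model_emit_next_record();
		model_console_unlock();
	}
	return path;
}
```

In this model, calling model_device_release_flush() with model_port_lock
held returns MODEL_EMIT_ATOMIC and never touches the port lock again.
If the emit path took model_port_lock after the trylock, the same call
would deadlock on the non-recursive mutex.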

[*] I still have to wire this into my mental model. I keep forgetting it.
    It would be nice to have documentation summarizing the main
    ideas and describing the printk() design. I hope that we will
    write it one day.

Best Regards,
Petr