Message-ID: <cbvfyefqdyy6py2fswqp3licm3ynrsmc3jclgnbubp72elmai7@kwvks5yhkybc>
Date: Wed, 10 Sep 2025 11:23:08 -0700
From: Breno Leitao <leitao@...ian.org>
To: John Ogness <john.ogness@...utronix.de>
Cc: Mike Galbraith <efault@....de>, Simon Horman <horms@...nel.org>,
kuba@...nel.org, calvin@...nvd.org, Pavel Begunkov <asml.silence@...il.com>,
Johannes Berg <johannes@...solutions.net>, paulmck@...nel.org, LKML <linux-kernel@...r.kernel.org>,
netdev@...r.kernel.org, boqun.feng@...il.com, Petr Mladek <pmladek@...e.com>,
Sergey Senozhatsky <senozhatsky@...omium.org>, Steven Rostedt <rostedt@...dmis.org>
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
On Wed, Sep 10, 2025 at 02:28:40PM +0206, John Ogness wrote:
> On 2025-09-09, Breno Leitao <leitao@...ian.org> wrote:
> > b) Send the message anyway (and hope for the best)
> > Cons: Netpoll will continue to call IRQ unsafe locks from IRQ safe
> > context (lockdep will continue to be unhappy)
> > Pro: This is how it works today already, so, it is not making the problem worse.
> > In fact, it is narrowing the problem to only .write_atomic().
>
> Two concerns here:
>
> 1. ->write_atomic() is also used during normal operation
>
> 2. It is expected that ->write_atomic() callbacks are implemented
> safely. The other nbcon citizens are doing this. Having an nbcon
> driver with an unsafe ->write_atomic() puts all nbcon drivers at risk
> of not functioning during panic.
>
> This could be combined with (a) so that ->write_atomic() implements its
> own deferred queue of messages to print and only when
> @legacy_allow_panic_sync is true, will it try to send immediately and
> hope for the best. @legacy_allow_panic_sync is set after all nbcon
> drivers have had a chance to flush their buffers safely and then the
> kernel starts to allow less safe drivers to flush.
>
> Although I would prefer the NBCON_ATOMIC_UNSAFE approach instead.
Agreed. That seems like a more straightforward solution for drivers, and
it is clearly one that would help the netconsole case.
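
For reference, the combined (a)+(b) fallback described above would
behave roughly like the userspace model below. It is only a sketch, not
kernel code: defer_queue, model_write_atomic() and the allow_panic_sync
parameter (standing in for @legacy_allow_panic_sync) are illustrative
names, and the queue depth is arbitrary.

```c
/* Userspace model only: the names below are illustrative, not kernel API. */
#include <stdbool.h>
#include <stddef.h>

#define DEFER_CAP 16	/* hypothetical queue depth */

struct defer_queue {
	const char *msg[DEFER_CAP];
	size_t head;	/* index of the oldest queued message */
	size_t count;	/* number of queued messages */
	size_t dropped;	/* messages lost because the queue was full */
};

/* Test helper: counts how many messages were "sent". */
static size_t sent_count;

static void count_send(const char *m)
{
	(void)m;
	sent_count++;
}

/* Queue a message so it can be sent later from a safe context. */
static bool defer_push(struct defer_queue *q, const char *m)
{
	if (q->count == DEFER_CAP) {
		q->dropped++;
		return false;
	}
	q->msg[(q->head + q->count) % DEFER_CAP] = m;
	q->count++;
	return true;
}

/* Send everything queued so far, oldest first. Returns the number sent. */
static size_t defer_flush(struct defer_queue *q, void (*send)(const char *m))
{
	size_t n = 0;

	while (q->count) {
		send(q->msg[q->head]);
		q->head = (q->head + 1) % DEFER_CAP;
		q->count--;
		n++;
	}
	return n;
}

/*
 * Model of the atomic write path: defer by default, and send immediately
 * only once the "less safe drivers may flush" flag is set.
 * Returns true if the message was sent immediately.
 */
static bool model_write_atomic(struct defer_queue *q, bool allow_panic_sync,
			       const char *m, void (*send)(const char *m))
{
	if (!allow_panic_sync) {
		defer_push(q, m);
		return false;
	}

	defer_flush(q, send);	/* preserve ordering: backlog goes first */
	send(m);
	return true;
}
```

In this model nothing touches the network until the flag flips, which is
where the real driver would start calling the IRQ-unsafe path and hoping
for the best.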
> > c) Not implementing .write_atomic
> > Cons: we lose the most important messages of the boot.
> >
> > Any other option I am not seeing?
>
> d) Not implementing ->write_atomic() and instead implement a kmsg_dumper
> for netconsole. This registers a callback that is called during
> panic.
>
> Con: The kmsg_dumper interface has nothing to do with consoles, so it
> would require some effort coordinating with the console drivers.
I am looking at the kmsg_dumper interface, and it doesn't provide the
buffers that need to be dumped.
So, if I understand correctly, my kmsg_dumper callback needs to loop
over the message buffer itself and print the remaining messages, right?
In other words, do I need to track which messages were already sent by
netconsole, and then iterate over the kmsg buffer to find the messages
that haven't been sent, and send from there?
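
To make the question concrete, here is a rough userspace model of the
bookkeeping this would imply. It is not kernel code: log_model,
netcon_model, netcon_dump_unsent() and the ring size are all made up for
illustration, and a real dumper would have to use whatever iteration
helpers the kmsg_dump API actually offers.

```c
/* Userspace model only: none of these names are kernel API. */
#include <stddef.h>
#include <stdint.h>

#define LOG_CAP 8	/* hypothetical ring size, in records */

struct log_model {
	uint64_t next_seq;		/* sequence number of the next record */
	const char *text[LOG_CAP];	/* record seq lives in slot seq % LOG_CAP */
};

struct netcon_model {
	uint64_t sent_seq;		/* first sequence number not yet sent */
};

/* Test helper: remembers the last message handed to it. */
static const char *last_emitted;

static void record_emit(const char *msg)
{
	last_emitted = msg;
}

/* Store one record, overwriting the oldest one when the ring is full. */
static void log_store(struct log_model *log, const char *msg)
{
	log->text[log->next_seq % LOG_CAP] = msg;
	log->next_seq++;
}

/* Oldest sequence number still present in the ring. */
static uint64_t log_first_seq(const struct log_model *log)
{
	return log->next_seq > LOG_CAP ? log->next_seq - LOG_CAP : 0;
}

/*
 * Panic-time flush: resume at the first unsent record and emit every
 * remaining one, oldest first. Returns the number of records emitted.
 */
static size_t netcon_dump_unsent(struct netcon_model *nc,
				 const struct log_model *log,
				 void (*emit)(const char *msg))
{
	uint64_t seq = nc->sent_seq;
	size_t n = 0;

	if (seq < log_first_seq(log))
		seq = log_first_seq(log);	/* older records were overwritten */

	for (; seq < log->next_seq; seq++, n++)
		emit(log->text[seq % LOG_CAP]);

	nc->sent_seq = log->next_seq;
	return n;
}
```

The normal send path would advance sent_seq as messages go out, so the
panic-time pass only has to cover the gap between sent_seq and the end
of the buffer.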
> Pro: There is absolute freedom for the dumper to implement its own
> panic-only solution to get messages out.
What about .write_atomic() calls that happen outside of panic? Would
those messages be lost with this approach?
> e) Involve support from the underlying network drivers to implement true
> atomic sending. Thomas Gleixner talked [0] very briefly about how
> this could be implemented for netconsole during the 2022
> proof-of-concept presentation of the nbcon API.
>
> Cons: It most likely requires new API callbacks for the network
> drivers to implement hardware-specific solutions. Many (most?)
> drivers would not be able to support it.
>
> Pro: True reliable atomic printing via network.
That would make more sense, but deciding on it seems to be above my
pay grade. :-)
Thanks for helping us with this issue,
--breno