netdev - Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

Message-ID: <cbvfyefqdyy6py2fswqp3licm3ynrsmc3jclgnbubp72elmai7@kwvks5yhkybc>
Date: Wed, 10 Sep 2025 11:23:08 -0700
From: Breno Leitao <leitao@...ian.org>
To: John Ogness <john.ogness@...utronix.de>
Cc: Mike Galbraith <efault@....de>, Simon Horman <horms@...nel.org>, 
	kuba@...nel.org, calvin@...nvd.org, Pavel Begunkov <asml.silence@...il.com>, 
	Johannes Berg <johannes@...solutions.net>, paulmck@...nel.org, LKML <linux-kernel@...r.kernel.org>, 
	netdev@...r.kernel.org, boqun.feng@...il.com, Petr Mladek <pmladek@...e.com>, 
	Sergey Senozhatsky <senozhatsky@...omium.org>, Steven Rostedt <rostedt@...dmis.org>
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

On Wed, Sep 10, 2025 at 02:28:40PM +0206, John Ogness wrote:
> On 2025-09-09, Breno Leitao <leitao@...ian.org> wrote:
> >   b) Send the message anyway (and hope for the best)
> >     Cons: Netpoll will continue to call IRQ unsafe locks from IRQ safe
> >           context (lockdep will continue to be unhappy)
> >     Pro: This is how it works today already, so, it is not making the problem worse.
> >          In fact, it is narrowing the problem to only .write_atomic().
> 
> Two concerns here:
> 
> 1. ->write_atomic() is also used during normal operation
> 
> 2. It is expected that ->write_atomic() callbacks are implemented
>    safely. The other nbcon citizens are doing this. Having an nbcon
>    driver with an unsafe ->write_atomic() puts all nbcon drivers at risk
>    of not functioning during panic.
> 
> This could be combined with (a) so that ->write_atomic() implements its
> own deferred queue of messages to print and only when
> @legacy_allow_panic_sync is true, will it try to send immediately and
> hope for the best. @legacy_allow_panic_sync is set after all nbcon
> drivers have had a chance to flush their buffers safely and then the
> kernel starts to allow less safe drivers to flush.
> 
> Although I would prefer the NBCON_ATOMIC_UNSAFE approach instead.

Agreed. That seems a more straightforward solution for drivers, and it
is clearly a solution that would help the netconsole case.

> >   c) Not implementing .write_atomic
> >     Cons: we lose the most important messages of the boot.
> >
> >   Any other option I am not seeing?
> 
> d) Not implementing ->write_atomic() and instead implement a kmsg_dumper
>    for netconsole. This registers a callback that is called during
>    panic.
> 
>    Con: The kmsg_dumper interface has nothing to do with consoles, so it
>         would require some effort coordinating with the console drivers.

I am looking at the kmsg_dumper interface, and it doesn't have the
buffers that need to be dumped.

So, if I understand correctly, my kmsg_dumper callback needs to loop
over the message buffer and print the remaining messages, right?

In other words, do I need to track which messages were sent by
netconsole, and then iterate over the kmsg buffer to find the messages
that haven't been sent, and send from there?
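
To make the question concrete, here is a rough userspace model of what I
have in mind. It is not kernel code: struct record, last_sent_seq,
send_record() and dump_unsent() are made-up names, the array stands in
for the kernel log buffer, and a real callback would presumably walk the
buffer with the kmsg_dump iterator helpers rather than index an array.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace model only: one log record, identified by a sequence number.
 * Assumption: sequence numbers start at 1 and increase monotonically. */
struct record {
	uint64_t seq;
	const char *text;
};

/* Hypothetical netconsole state: sequence number of the last record
 * that was handed to the network. 0 means nothing has been sent yet. */
static uint64_t last_sent_seq;

/* Stand-in for the real transmit path; here it only prints. */
static void send_record(const struct record *r)
{
	printf("send seq=%llu: %s\n", (unsigned long long)r->seq, r->text);
	last_sent_seq = r->seq;
}

/* Model of a panic-time dump callback: walk the whole buffer, skip
 * everything already sent, send the rest. Returns the number sent. */
static size_t dump_unsent(const struct record *buf, size_t n)
{
	size_t sent = 0;

	for (size_t i = 0; i < n; i++) {
		if (buf[i].seq <= last_sent_seq)
			continue;	/* already went out before the panic */
		send_record(&buf[i]);
		sent++;
	}
	return sent;
}
```

With four records in the buffer and the first two already sent during
normal operation, dump_unsent() would send only the last two, and a
second call would send nothing.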

>    Pro: There is absolute freedom for the dumper to implement its own
>         panic-only solution to get messages out.

What about .write_atomic() calls that happen outside of panic? Will
those be lost in this approach?

> e) Involve support from the underlying network drivers to implement true
>    atomic sending. Thomas Gleixner talked [0] very briefly about how
>    this could be implemented for netconsole during the 2022
>    proof-of-concept presentation of the nbcon API.
> 
>    Cons: It most likely requires new API callbacks for the network
>          drivers to implement hardware-specific solutions. Many (most?)
>          drivers would not be able to support it.
> 
>    Pro: True reliable atomic printing via network.

That would make more sense, but it seems that deciding on it is above my
pay grade. :-)

Thanks for helping us with this issue,
--breno