netdev - Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <6bc3af45969936668200657112140b78615e3873.camel@gmx.de>
Date: Tue, 26 Aug 2025 15:56:06 +0200
From: Mike Galbraith <efault@....de>
To: Breno Leitao <leitao@...ian.org>, Simon Horman <horms@...nel.org>, 
	kuba@...nel.org, calvin@...nvd.org
Cc: Pavel Begunkov <asml.silence@...il.com>, Johannes Berg
 <johannes@...solutions.net>, paulmck@...nel.org, LKML
 <linux-kernel@...r.kernel.org>, netdev@...r.kernel.org, boqun.feng@...il.com
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

On Tue, 2025-08-26 at 05:43 -0700, Breno Leitao wrote:
> On Fri, Aug 22, 2025 at 05:54:28AM +0200, Mike Galbraith wrote:
> > On Thu, 2025-08-21 at 10:35 -0700, Breno Leitao wrote:
> > > > On Thu, Aug 21, 2025 at 05:51:59AM +0200, Mike Galbraith wrote:
> >  
> > > > > > --- a/drivers/net/netconsole.c
> > > > > > +++ b/drivers/net/netconsole.c
> > > > > > @@ -1952,12 +1952,12 @@ static void netcon_write_thread(struct c
> > > > > >  static void netconsole_device_lock(struct console *con, unsigned long *flags)
> > > > > >  {
> > > > > >  	/* protects all the targets at the same time */
> > > > > > -	spin_lock_irqsave(&target_list_lock, *flags);
> > > > > > +	spin_lock(&target_list_lock);
> > > > 
> > > > I personally think this target_list_lock can be moved to an RCU lock.
> > > > 
> > > > If that is doable, then we probably make netconsole_device_lock()
> > > > to a simple `rcu_read_lock()`, which would solve this problem as well.
> > 
> > The bigger issue for the nbcon patch would seem to be the seemingly
> > required .write_atomic leading to landing here with disabled IRQs.
> 
> In this case, instead of transmitting through netpoll directly in the
> .write_atomic context, we could queue the messages for later delivery.
> 
> With the current implementation, this is not straightforward unless we
> introduce an additional message copy at the start of .write_atomic.
> 
> This is where the interface between netpoll and netconsole becomes
> problematic. Ideally, we would avoid carrying extra data into netconsole
> and instead copy the message into an SKB and queue the SKB for
> transmission.
> 
> The core issue is that netpoll and netconsole are tightly coupled, and
> several pieces of functionality that live in netpoll really belong in
> netconsole. A good example is the SKB pool: that’s a netconsole concept,
> not a netpoll one. None of the other netpoll users send raw char *
> messages. They all work directly with skbs, so, in order to achieve it,
> we need to move the concept of skb pool into netconsole, and give
> netconsole the management of the skb pool.
> 
> > WRT my patch, seeing a hard RT crash on wired box cleanly logged with
> > your nbcon patch applied (plus my twiddle mentioned earlier) tells me
> > my patch has lost its original reason to exist.  It's relevant to this
> > thread only in that those once thought to be RT specific IRQ disable
> > spots turned out to actually be RT agnostic wireless sore spots.
> 
> Thanks. As a follow-up, I would suggest the following steps:
> 
> 1) Decouple the SKB pool from netpoll and move it into netconsole
> 
>   * This makes netconsole behave like any other netpoll user,
>     interacting with netpoll by sending SKBs.
> 	* The SKB population logic would then reside in netconsole, where it
> 	  logically belongs.
> 
>   * Enable NBCONS in netconsole, guarded by NETCONSOLE_NBCON
> 	* In normal .write_atomic() mode, messages should be queued in
> 	  a workqueue.
> 	* If oops_in_progress is set, we bypass the queue and
> 	  transmit the SKB immediately. (Maybe disabling lockdep?!).
> 
> Any concern with this plan?

Nope, sounds like progress to me.

>From an RT perspective, the further netconsole gets from the netpoll's
definition of acceptable IRQ holdoff, the better.  For a death rattle,
it doesn't matter, but for mundane kernel squeaks that's a fail.

	-Mike