[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6bc3af45969936668200657112140b78615e3873.camel@gmx.de>
Date: Tue, 26 Aug 2025 15:56:06 +0200
From: Mike Galbraith <efault@....de>
To: Breno Leitao <leitao@...ian.org>, Simon Horman <horms@...nel.org>,
kuba@...nel.org, calvin@...nvd.org
Cc: Pavel Begunkov <asml.silence@...il.com>, Johannes Berg
<johannes@...solutions.net>, paulmck@...nel.org, LKML
<linux-kernel@...r.kernel.org>, netdev@...r.kernel.org, boqun.feng@...il.com
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
On Tue, 2025-08-26 at 05:43 -0700, Breno Leitao wrote:
> On Fri, Aug 22, 2025 at 05:54:28AM +0200, Mike Galbraith wrote:
> > On Thu, 2025-08-21 at 10:35 -0700, Breno Leitao wrote:
> > > > On Thu, Aug 21, 2025 at 05:51:59AM +0200, Mike Galbraith wrote:
> >
> > > > > > --- a/drivers/net/netconsole.c
> > > > > > +++ b/drivers/net/netconsole.c
> > > > > > @@ -1952,12 +1952,12 @@ static void netcon_write_thread(struct c
> > > > > > static void netconsole_device_lock(struct console *con, unsigned long *flags)
> > > > > > {
> > > > > > /* protects all the targets at the same time */
> > > > > > - spin_lock_irqsave(&target_list_lock, *flags);
> > > > > > + spin_lock(&target_list_lock);
> > > >
> > > > I personally think this target_list_lock can be moved to an RCU lock.
> > > >
> > > > If that is doable, then we probably make netconsole_device_lock()
> > > > to a simple `rcu_read_lock()`, which would solve this problem as well.
> >
> > The bigger issue for the nbcon patch would seem to be the seemingly
> > required .write_atomic leading to landing here with disabled IRQs.
>
> In this case, instead of transmitting through netpoll directly in the
> .write_atomic context, we could queue the messages for later delivery.
>
> With the current implementation, this is not straightforward unless we
> introduce an additional message copy at the start of .write_atomic.
>
> This is where the interface between netpoll and netconsole becomes
> problematic. Ideally, we would avoid carrying extra data into netconsole
> and instead copy the message into an SKB and queue the SKB for
> transmission.
>
> The core issue is that netpoll and netconsole are tightly coupled, and
> several pieces of functionality that live in netpoll really belong in
> netconsole. A good example is the SKB pool: that’s a netconsole concept,
> not a netpoll one. None of the other netpoll users send raw char *
> messages. They all work directly with skbs, so, in order to achieve it,
> we need to move the concept of skb pool into netconsole, and give
> netconsole the management of the skb pool.
>
> > WRT my patch, seeing a hard RT crash on wired box cleanly logged with
> > your nbcon patch applied (plus my twiddle mentioned earlier) tells me
> > my patch has lost its original reason to exist. It's relevant to this
> > thread only in that those once thought to be RT specific IRQ disable
> > spots turned out to actually be RT agnostic wireless sore spots.
>
> Thanks. As a follow-up, I would suggest the following steps:
>
> 1) Decouple the SKB pool from netpoll and move it into netconsole
>
> * This makes netconsole behave like any other netpoll user,
> interacting with netpoll by sending SKBs.
> * The SKB population logic would then reside in netconsole, where it
> logically belongs.
>
> * Enable NBCONS in netconsole, guarded by NETCONSOLE_NBCON
> * In normal .write_atomic() mode, messages should be queued in
> a workqueue.
> * If oops_in_progress is set, we bypass the queue and
> transmit the SKB immediately. (Maybe disabling lockdep?!).
>
> Any concern with this plan?
Nope, sounds like progress to me.
>From an RT perspective, the further netconsole gets from the netpoll's
definition of acceptable IRQ holdoff, the better. For a death rattle,
it doesn't matter, but for mundane kernel squeaks that's a fail.
-Mike
Powered by blists - more mailing lists