Message-ID: <5b509b1370d42fd0cc109fc8914272be6dcfcd54.camel@gmx.de>
Date: Fri, 22 Aug 2025 05:54:28 +0200
From: Mike Galbraith <efault@....de>
To: Breno Leitao <leitao@...ian.org>
Cc: Pavel Begunkov <asml.silence@...il.com>, Jakub Kicinski
<kuba@...nel.org>, Johannes Berg <johannes@...solutions.net>,
paulmck@...nel.org, LKML <linux-kernel@...r.kernel.org>,
netdev@...r.kernel.org, boqun.feng@...il.com
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
On Thu, 2025-08-21 at 10:35 -0700, Breno Leitao wrote:
> > On Thu, Aug 21, 2025 at 05:51:59AM +0200, Mike Galbraith wrote:
> > > > --- a/drivers/net/netconsole.c
> > > > +++ b/drivers/net/netconsole.c
> > > > @@ -1952,12 +1952,12 @@ static void netcon_write_thread(struct c
> > > > static void netconsole_device_lock(struct console *con, unsigned long *flags)
> > > > {
> > > > /* protects all the targets at the same time */
> > > > - spin_lock_irqsave(&target_list_lock, *flags);
> > > > + spin_lock(&target_list_lock);
> >
> > I personally think this target_list_lock can be moved to an RCU lock.
> >
> > If that is doable, then we probably make netconsole_device_lock()
> > to a simple `rcu_read_lock()`, which would solve this problem as well.
The bigger issue for the nbcon patch would seem to be the apparently
required .write_atomic callback, which is what leads to us landing here
with IRQs disabled.

WRT my patch, seeing a hard RT crash on a wired box cleanly logged with
your nbcon patch applied (plus my twiddle mentioned earlier) tells me
my patch has lost its original reason to exist. It's relevant to this
thread only in that the IRQ-disable spots once thought to be
RT-specific turned out to be RT-agnostic wireless sore spots.
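
Back to the RCU idea: an untested sketch of what you describe might
look roughly like the below. It assumes the target list is already
maintained with list_add_rcu()/list_del_rcu(), and the unlock
counterpart name here is only my guess.

static void netconsole_device_lock(struct console *con, unsigned long *flags)
{
	/* readers only traverse the target list, so RCU is enough */
	rcu_read_lock();
	*flags = 0;	/* nothing to save, keep callers happy */
}

static void netconsole_device_unlock(struct console *con, unsigned long flags)
{
	rcu_read_unlock();
}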
> > > > --- a/net/core/netpoll.c
> > > > +++ b/net/core/netpoll.c
> > > > @@ -58,6 +58,29 @@ static void zap_completion_queue(void);
> > > > static unsigned int carrier_timeout = 4;
> > > > module_param(carrier_timeout, uint, 0644);
> > > >
> > > > +DEFINE_PER_CPU(int, _netpoll_tx_running);
> > > > +EXPORT_PER_CPU_SYMBOL(_netpoll_tx_running);
> > > > +
> > > > +#define netpoll_tx_begin(flags)					\
> > > > +	do {								\
> > > > +		if (IS_ENABLED(CONFIG_PREEMPT_RT) ||			\
> > > > +		    IS_ENABLED(CONFIG_NETCONSOLE_NBCON))		\
> > > > +			local_bh_disable();				\
> > > > +		else							\
> > > > +			local_irq_save(flags);				\
> > > > +		this_cpu_write(_netpoll_tx_running, 1);			\
> > > > +	} while (0)
> >
> > Why can't we just use local_bh_disable() in both cases?
Yeah, believe so.
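Something like this untested sketch, i.e. drop the IRQ branch entirely
(flags kept only so existing callers still compile):

#define netpoll_tx_begin(flags)					\
	do {							\
		local_bh_disable();				\
		this_cpu_write(_netpoll_tx_running, 1);		\
		(void)(flags);					\
	} while (0)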
> > >
> > > > @@ -246,7 +269,7 @@ static void refill_skbs(struct netpoll *
> > > > static void zap_completion_queue(void)
> > > > {
> > > > unsigned long flags;
> > > > - struct softnet_data *sd = &get_cpu_var(softnet_data);
> > > > + struct softnet_data *sd = this_cpu_ptr(&softnet_data);
> >
> > How do I check if this is safe to do ?
Too much water under the bridge; I don't recall the path to whatever
conclusion I reached, and it seems to no longer matter.
-Mike