Message-ID: <2f22c98bee6c549205efed3cb03b82805cb54977.camel@sipsolutions.net>
Date: Tue, 26 Aug 2025 16:10:20 +0200
From: Johannes Berg <johannes@...solutions.net>
To: Jakub Kicinski <kuba@...nel.org>, Pavel Begunkov <asml.silence@...il.com>
Cc: Breno Leitao <leitao@...ian.org>, Mike Galbraith <efault@....de>, 
	paulmck@...nel.org, LKML <linux-kernel@...r.kernel.org>,
 netdev@...r.kernel.org, 	boqun.feng@...il.com
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

On Fri, 2025-08-15 at 09:42 -0700, Jakub Kicinski wrote:
> On Fri, 15 Aug 2025 11:44:45 +0100 Pavel Begunkov wrote:
> > On 8/15/25 01:23, Jakub Kicinski wrote:
> > > On Thu, 14 Aug 2025 03:16:11 -0700 Breno Leitao wrote:  
> > > >   2.2) netpoll 				// net poll will call the network subsystem to send the packet
> > > >   2.3) lock(&fq->lock);			// Try to get the lock while the lock was already held  
> > 
> > The report for reference:
> > 
> > https://lore.kernel.org/all/fb38cfe5153fd67f540e6e8aff814c60b7129480.camel@gmx.de/
> >
> > > Where does netpoll take fq->lock ?  
> > 
> > the dependencies between the lock to be acquired
> > [  107.985514]  and HARDIRQ-irq-unsafe lock:
> > [  107.985531] -> (&fq->lock){+.-.}-{3:3} {
> > ...
> > [  107.988053]  ... acquired at:
> > [  107.988054]    check_prev_add+0xfb/0xca0
> > [  107.988058]    validate_chain+0x48c/0x530
> > [  107.988061]    __lock_acquire+0x550/0xbc0
> > [  107.988064]    lock_acquire.part.0+0xa1/0x210
> > [  107.988068]    _raw_spin_lock_bh+0x38/0x50
> > [  107.988070]    ieee80211_queue_skb+0xfd/0x350 [mac80211]
> > [  107.988198]    __ieee80211_xmit_fast+0x202/0x360 [mac80211]
> > [  107.988314]    ieee80211_xmit_fast+0xfb/0x1f0 [mac80211]
> > [  107.988424]    __ieee80211_subif_start_xmit+0x14e/0x3d0 [mac80211]
> > [  107.988530]    ieee80211_subif_start_xmit+0x46/0x230 [mac80211]
> 
> Ah, that's WiFi's stack queuing. Dunno whether we expect netpoll to 
> work over WiFi. I suspect disabling netconsole over WiFi may be the 
> most sensible way out. Johannes, do you expect mac80211 Tx to be IRQ-safe?

I see there's a long thread beyond this, but I just got back from
vacation and haven't read all of it.

As for this question itself, I'd say no. In some cases it probably could
be made safe for mac80211 _itself_ (by adjusting that lock and maybe
another one or two), but that wouldn't extend to the drivers, so it'd be
up to the individual drivers. In most cases mac80211 calls wake_tx_queue
(either driver or its own implementation) and that will pull frame(s),
but either way it's going to go all the way into the driver, with
unknown results.

I guess we could do that async since we queue there anyway, but in this
case (of wanting to get things out of a dying system) that'd probably be
counter-productive...

Maybe if it's an individual driver opt-in, but I don't really see it
working for most drivers.

johannes
