Message-ID: <3d20ce1b-7a9b-4545-a4a9-23822b675e0c@gmail.com>
Date: Fri, 15 Aug 2025 11:44:45 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, Breno Leitao <leitao@...ian.org>
Cc: Mike Galbraith <efault@....de>, paulmck@...nel.org,
LKML <linux-kernel@...r.kernel.org>, netdev@...r.kernel.org,
boqun.feng@...il.com
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
On 8/15/25 01:23, Jakub Kicinski wrote:
> On Thu, 14 Aug 2025 03:16:11 -0700 Breno Leitao wrote:
>> 2.2) netpoll // net poll will call the network subsystem to send the packet
>> 2.3) lock(&fq->lock); // Try to get the lock while the lock was already held
The report for reference:
https://lore.kernel.org/all/fb38cfe5153fd67f540e6e8aff814c60b7129480.camel@gmx.de/
> Where does netpoll take fq->lock ?
the dependencies between the lock to be acquired
[ 107.985514] and HARDIRQ-irq-unsafe lock:
[ 107.985531] -> (&fq->lock){+.-.}-{3:3} {
...
[ 107.988053] ... acquired at:
[ 107.988054] check_prev_add+0xfb/0xca0
[ 107.988058] validate_chain+0x48c/0x530
[ 107.988061] __lock_acquire+0x550/0xbc0
[ 107.988064] lock_acquire.part.0+0xa1/0x210
[ 107.988068] _raw_spin_lock_bh+0x38/0x50
[ 107.988070] ieee80211_queue_skb+0xfd/0x350 [mac80211]
[ 107.988198] __ieee80211_xmit_fast+0x202/0x360 [mac80211]
[ 107.988314] ieee80211_xmit_fast+0xfb/0x1f0 [mac80211]
[ 107.988424] __ieee80211_subif_start_xmit+0x14e/0x3d0 [mac80211]
[ 107.988530] ieee80211_subif_start_xmit+0x46/0x230 [mac80211]
[ 107.988634] netpoll_start_xmit+0x8b/0xd0
[ 107.988638] __netpoll_send_skb+0x329/0x3b0
[ 107.988641] write_msg+0x104/0x120 [netconsole]
[ 107.988647] console_emit_next_record+0x203/0x250
[ 107.988652] console_flush_all+0x24d/0x370
[ 107.988657] console_unlock+0x66/0x130
[ 107.988662] vprintk_emit+0x142/0x360
[ 107.988666] _printk+0x5b/0x80
[ 107.988671] enabled_store.cold+0x7e/0x83 [netconsole]
[ 107.988677] configfs_write_iter+0xbd/0x120 [configfs]
[ 107.988683] vfs_write+0x213/0x520
[ 107.988689] ksys_write+0x69/0xe0
[ 107.988691] do_syscall_64+0x94/0xa10
[ 107.988695] entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
> We started hitting this a lot in the CI as well, lockdep must have
> gotten more sensitive in 6.17. Last I checked lockdep didn't understand
FWIW, I remember there were similar reports last year but with
xmit lock.
> that we manually test for nesting with netif_local_xmit_active().
Looks like Breno tried to simplify it, the original syz report
gave the following scenario:
[ 107.984942] Chain exists of:
console_owner --> target_list_lock --> &fq->lock
[ 107.984947] Possible interrupt unsafe locking scenario:
[ 107.984948] CPU0 CPU1
[ 107.984949] ---- ----
[ 107.984950] lock(&fq->lock);
[ 107.984952] local_irq_disable();
[ 107.984952] lock(console_owner);
[ 107.984954] lock(target_list_lock);
[ 107.984956] <Interrupt>
[ 107.984957] lock(console_owner);
Seems like with the fq->lock trace I pasted above we can get sth like:
CPU0 CPU1
---- ----
lock(&fq->lock);
local_irq_disable();
lock(console_owner);
lock(target_list_lock);
lock(&fq->lock);
<Interrupt>
lock(console_owner);
Nesting checks won't help with this one.
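
For illustration only, here is a rough userspace model of the check that seems to be firing. It is a sketch, not lockdep's actual algorithm. The edges are taken from the "Chain exists of" line above. The split into hardirq-safe and hardirq-unsafe sets is my reading of the report (console_owner is taken from the interrupt in the scenario, fq->lock is the reported irq-unsafe lock). The names reachable() and find_violations() are made up for this example.

```python
from collections import defaultdict

# "A -> B" means B was acquired while A was held.
# Edges copied from the chain in the report:
#   console_owner --> target_list_lock --> &fq->lock
edges = [
    ("console_owner", "target_list_lock"),
    ("target_list_lock", "&fq->lock"),
]

# Assumed classification, read off the report rather than computed:
hardirq_safe = {"console_owner"}   # also acquired from hardirq context
hardirq_unsafe = {"&fq->lock"}     # acquired with hardirqs enabled

def reachable(graph, start):
    """Return every lock that can end up being taken while start is held."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def find_violations(edges, safe, unsafe):
    """List (safe, unsafe) pairs where the unsafe lock is reachable from the safe one."""
    graph = defaultdict(list)
    for held, taken in edges:
        graph[held].append(taken)
    return sorted((s, u) for s in safe for u in reachable(graph, s) if u in unsafe)

violations = find_violations(edges, hardirq_safe, hardirq_unsafe)
for safe_lock, unsafe_lock in violations:
    print(f"{safe_lock} (hardirq-safe) -> {unsafe_lock} (hardirq-unsafe)")
```

The model only looks at the recorded ordering between lock classes, so a per-CPU reentrancy test on the xmit path does not remove the console_owner -> fq->lock edge from the graph.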
--
Pavel Begunkov