netdev - Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

Message-ID: <3d20ce1b-7a9b-4545-a4a9-23822b675e0c@gmail.com>
Date: Fri, 15 Aug 2025 11:44:45 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, Breno Leitao <leitao@...ian.org>
Cc: Mike Galbraith <efault@....de>, paulmck@...nel.org,
 LKML <linux-kernel@...r.kernel.org>, netdev@...r.kernel.org,
 boqun.feng@...il.com
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

On 8/15/25 01:23, Jakub Kicinski wrote:
> On Thu, 14 Aug 2025 03:16:11 -0700 Breno Leitao wrote:
>>   2.2) netpoll 				// net poll will call the network subsystem to send the packet
>>   2.3) lock(&fq->lock);			// Try to get the lock while the lock was already held

The report for reference:

https://lore.kernel.org/all/fb38cfe5153fd67f540e6e8aff814c60b7129480.camel@gmx.de/

>
> Where does netpoll take fq->lock ?

the dependencies between the lock to be acquired
[  107.985514]  and HARDIRQ-irq-unsafe lock:
[  107.985531] -> (&fq->lock){+.-.}-{3:3} {
...
[  107.988053]  ... acquired at:
[  107.988054]    check_prev_add+0xfb/0xca0
[  107.988058]    validate_chain+0x48c/0x530
[  107.988061]    __lock_acquire+0x550/0xbc0
[  107.988064]    lock_acquire.part.0+0xa1/0x210
[  107.988068]    _raw_spin_lock_bh+0x38/0x50
[  107.988070]    ieee80211_queue_skb+0xfd/0x350 [mac80211]
[  107.988198]    __ieee80211_xmit_fast+0x202/0x360 [mac80211]
[  107.988314]    ieee80211_xmit_fast+0xfb/0x1f0 [mac80211]
[  107.988424]    __ieee80211_subif_start_xmit+0x14e/0x3d0 [mac80211]
[  107.988530]    ieee80211_subif_start_xmit+0x46/0x230 [mac80211]
[  107.988634]    netpoll_start_xmit+0x8b/0xd0
[  107.988638]    __netpoll_send_skb+0x329/0x3b0
[  107.988641]    write_msg+0x104/0x120 [netconsole]
[  107.988647]    console_emit_next_record+0x203/0x250
[  107.988652]    console_flush_all+0x24d/0x370
[  107.988657]    console_unlock+0x66/0x130
[  107.988662]    vprintk_emit+0x142/0x360
[  107.988666]    _printk+0x5b/0x80
[  107.988671]    enabled_store.cold+0x7e/0x83 [netconsole]
[  107.988677]    configfs_write_iter+0xbd/0x120 [configfs]
[  107.988683]    vfs_write+0x213/0x520
[  107.988689]    ksys_write+0x69/0xe0
[  107.988691]    do_syscall_64+0x94/0xa10
[  107.988695]    entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> We started hitting this a lot in the CI as well, lockdep must have
> gotten more sensitive in 6.17. Last I checked lockdep didn't understand

FWIW, I remember there were similar reports last year but with
xmit lock.

> that we manually test for nesting with netif_local_xmit_active().

Looks like Breno tried to simplify it; the original syz report
gave the following scenario:

[  107.984942] Chain exists of:
                  console_owner --> target_list_lock --> &fq->lock

[  107.984947]  Possible interrupt unsafe locking scenario:
[  107.984948]        CPU0                    CPU1
[  107.984949]        ----                    ----
[  107.984950]   lock(&fq->lock);
[  107.984952]                                local_irq_disable();
[  107.984952]                                lock(console_owner);
[  107.984954]                                lock(target_list_lock);
[  107.984956]   <Interrupt>
[  107.984957]     lock(console_owner);


Seems like with the fq->lock trace I pasted above we can get something like:

         CPU0                    CPU1
         ----                    ----
    lock(&fq->lock);
                                 local_irq_disable();
                                 lock(console_owner);
                                 lock(target_list_lock);
                                 lock(&fq->lock);
    <Interrupt>
      lock(console_owner);

Nesting checks won't help with this one.
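
As a rough illustration of why (a userspace Python model, not kernel
code and not how lockdep is actually implemented): the complaint is
about the recorded lock order graph, so a runtime nesting check on the
transmitting CPU does not remove the chain from a lock taken in hardirq
context to one that is held with hardirqs enabled. The edges below are
copied from the report. The helper names (find_path, irq_inversions)
and the safe/unsafe sets are assumptions made for this sketch, read off
the scenario above.

```python
from collections import defaultdict

# Lock-order edges from the report: (held, acquired) means
# "acquired" was taken while "held" was already held.
EDGES = [
    ("console_owner", "target_list_lock"),
    ("target_list_lock", "&fq->lock"),
]

# Assumed classification, read off the scenario in the report:
# console_owner is taken from hardirq context (HARDIRQ-safe),
# &fq->lock is held with hardirqs enabled (HARDIRQ-unsafe).
HARDIRQ_SAFE = {"console_owner"}
HARDIRQ_UNSAFE = {"&fq->lock"}


def find_path(edges, src, dst):
    """Return a chain of locks from src to dst following edges, or None."""
    graph = defaultdict(list)
    for held, acquired in edges:
        graph[held].append(acquired)
    stack = [(src, [src])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph[node]:
            stack.append((nxt, path + [nxt]))
    return None


def irq_inversions(edges, safe, unsafe):
    """List every chain from a HARDIRQ-safe lock to a HARDIRQ-unsafe one."""
    chains = []
    for s in sorted(safe):
        for u in sorted(unsafe):
            path = find_path(edges, s, u)
            if path:
                chains.append(path)
    return chains


if __name__ == "__main__":
    for chain in irq_inversions(EDGES, HARDIRQ_SAFE, HARDIRQ_UNSAFE):
        print(" --> ".join(chain))
```

Running it prints the same chain the report shows:
console_owner --> target_list_lock --> &fq->lock.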

-- 
Pavel Begunkov