Message-ID: <aDiEby8WRjJ9Gyfx@mini-arch>
Date: Thu, 29 May 2025 08:59:43 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: syzbot <syzbot+846bb38dc67fe62cc733@...kaller.appspotmail.com>
Cc: davem@...emloft.net, edumazet@...gle.com, horms@...nel.org,
kuba@...nel.org, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, pabeni@...hat.com,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] possible deadlock in rtnl_newlink
On 05/29, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: b1427432d3b6 Merge tag 'iommu-fixes-v6.15-rc7' of git://gi..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=161ef5f4580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=9fd1c9848687d742
> dashboard link: https://syzkaller.appspot.com/bug?extid=846bb38dc67fe62cc733
> compiler: Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12d21170580000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17d9a8e8580000
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-b1427432.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/47b0c66c70d9/vmlinux-b1427432.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/a2df6bfabd3c/bzImage-b1427432.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+846bb38dc67fe62cc733@...kaller.appspotmail.com
>
> ifb0: entered allmulticast mode
> ifb1: entered allmulticast mode
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.15.0-rc7-syzkaller-00144-gb1427432d3b6 #0 Not tainted
> ------------------------------------------------------
> syz-executor216/5313 is trying to acquire lock:
> ffff888033f496f0 ((work_completion)(&adapter->reset_task)){+.+.}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
> ffff888033f496f0 ((work_completion)(&adapter->reset_task)){+.+.}-{0:0}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline]
> ffff888033f496f0 ((work_completion)(&adapter->reset_task)){+.+.}-{0:0}, at: start_flush_work kernel/workqueue.c:4150 [inline]
> ffff888033f496f0 ((work_completion)(&adapter->reset_task)){+.+.}-{0:0}, at: __flush_work+0xd2/0xbc0 kernel/workqueue.c:4208
>
> but task is already holding lock:
> ffffffff8f2fab48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
> ffffffff8f2fab48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
> ffffffff8f2fab48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x8db/0x1c70 net/core/rtnetlink.c:4064
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (rtnl_mutex){+.+.}-{4:4}:
> lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5866
> __mutex_lock_common kernel/locking/mutex.c:601 [inline]
> __mutex_lock+0x182/0xe80 kernel/locking/mutex.c:746
> e1000_reset_task+0x56/0xc0 drivers/net/ethernet/intel/e1000/e1000_main.c:3512
> process_one_work kernel/workqueue.c:3238 [inline]
> process_scheduled_works+0xadb/0x17a0 kernel/workqueue.c:3319
> worker_thread+0x8a0/0xda0 kernel/workqueue.c:3400
> kthread+0x70e/0x8a0 kernel/kthread.c:464
> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:153
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>
> -> #0 ((work_completion)(&adapter->reset_task)){+.+.}-{0:0}:
> check_prev_add kernel/locking/lockdep.c:3166 [inline]
> check_prevs_add kernel/locking/lockdep.c:3285 [inline]
> validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3909
> __lock_acquire+0xaac/0xd20 kernel/locking/lockdep.c:5235
> lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5866
> touch_work_lockdep_map kernel/workqueue.c:3922 [inline]
> start_flush_work kernel/workqueue.c:4176 [inline]
> __flush_work+0x6b8/0xbc0 kernel/workqueue.c:4208
> __cancel_work_sync+0xbe/0x110 kernel/workqueue.c:4364
> e1000_down+0x402/0x6b0 drivers/net/ethernet/intel/e1000/e1000_main.c:526
> e1000_close+0x17b/0xa10 drivers/net/ethernet/intel/e1000/e1000_main.c:1448
> __dev_close_many+0x361/0x6f0 net/core/dev.c:1702
> __dev_close net/core/dev.c:1714 [inline]
> __dev_change_flags+0x2c7/0x6d0 net/core/dev.c:9352
> netif_change_flags+0x88/0x1a0 net/core/dev.c:9417
> do_setlink+0xcb9/0x40d0 net/core/rtnetlink.c:3152
> rtnl_group_changelink net/core/rtnetlink.c:3783 [inline]
> __rtnl_newlink net/core/rtnetlink.c:3937 [inline]
> rtnl_newlink+0x149f/0x1c70 net/core/rtnetlink.c:4065
> rtnetlink_rcv_msg+0x7cc/0xb70 net/core/rtnetlink.c:6955
> netlink_rcv_skb+0x219/0x490 net/netlink/af_netlink.c:2534
> netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
> netlink_unicast+0x75b/0x8d0 net/netlink/af_netlink.c:1339
> netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1883
> sock_sendmsg_nosec net/socket.c:712 [inline]
> __sock_sendmsg+0x21c/0x270 net/socket.c:727
> ____sys_sendmsg+0x505/0x830 net/socket.c:2566
> ___sys_sendmsg+0x21f/0x2a0 net/socket.c:2620
> __sys_sendmsg net/socket.c:2652 [inline]
> __do_sys_sendmsg net/socket.c:2657 [inline]
> __se_sys_sendmsg net/socket.c:2655 [inline]
> __x64_sys_sendmsg+0x19b/0x260 net/socket.c:2655
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(rtnl_mutex);
>                                lock((work_completion)(&adapter->reset_task));
>                                lock(rtnl_mutex);
>   lock((work_completion)(&adapter->reset_task));

So this is the workqueue's internal per-work lockdep map being reordered
against the rtnl lock: e1000_reset_task takes rtnl_lock, while e1000_down,
running under rtnl_lock on the close path, does cancel_work_sync on that
same work. But looking at process_one_work, I don't see actual locks,
mostly lock_map_acquire/lock_map_release calls to enforce some internal WQ
invariants. Not sure what to do with it yet, will try to read more.
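
For reference, a minimal sketch of the two orderings lockdep is pairing up,
assuming a driver shaped like the report above (demo_reset_task/demo_close
are made-up names, not the actual e1000 code):

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <linux/workqueue.h>

static struct work_struct reset_task;	/* stand-in for adapter->reset_task */

/* -> #1 in the report: the worker "holds" the work entry, then takes rtnl_mutex */
static void demo_reset_task(struct work_struct *work)
{
	rtnl_lock();
	/* ... re-initialize the device ... */
	rtnl_unlock();
}

/* -> #0 in the report: rtnl_mutex is held by the rtnl_newlink path, and the
 * close path then waits for the same work entry via cancel_work_sync().
 */
static int demo_close(struct net_device *netdev)
{
	ASSERT_RTNL();			/* __dev_close runs under rtnl */
	cancel_work_sync(&reset_task);	/* __flush_work acquires the work's lockdep map */
	return 0;
}

The pseudo-lock in the report is just that work lockdep map, which
__flush_work acquires (touch_work_lockdep_map in the trace above) to model
that flushing a work item may have to wait for it to run, so lockdep can
report a cycle even though no second mutex is involved.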