Message-ID: <aDiEby8WRjJ9Gyfx@mini-arch>
Date: Thu, 29 May 2025 08:59:43 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: syzbot <syzbot+846bb38dc67fe62cc733@...kaller.appspotmail.com>
Cc: davem@...emloft.net, edumazet@...gle.com, horms@...nel.org,
kuba@...nel.org, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, pabeni@...hat.com,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] possible deadlock in rtnl_newlink
On 05/29, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: b1427432d3b6 Merge tag 'iommu-fixes-v6.15-rc7' of git://gi..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=161ef5f4580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=9fd1c9848687d742
> dashboard link: https://syzkaller.appspot.com/bug?extid=846bb38dc67fe62cc733
> compiler: Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12d21170580000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17d9a8e8580000
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-b1427432.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/47b0c66c70d9/vmlinux-b1427432.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/a2df6bfabd3c/bzImage-b1427432.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+846bb38dc67fe62cc733@...kaller.appspotmail.com
>
> ifb0: entered allmulticast mode
> ifb1: entered allmulticast mode
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.15.0-rc7-syzkaller-00144-gb1427432d3b6 #0 Not tainted
> ------------------------------------------------------
> syz-executor216/5313 is trying to acquire lock:
> ffff888033f496f0 ((work_completion)(&adapter->reset_task)){+.+.}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
> ffff888033f496f0 ((work_completion)(&adapter->reset_task)){+.+.}-{0:0}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline]
> ffff888033f496f0 ((work_completion)(&adapter->reset_task)){+.+.}-{0:0}, at: start_flush_work kernel/workqueue.c:4150 [inline]
> ffff888033f496f0 ((work_completion)(&adapter->reset_task)){+.+.}-{0:0}, at: __flush_work+0xd2/0xbc0 kernel/workqueue.c:4208
>
> but task is already holding lock:
> ffffffff8f2fab48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
> ffffffff8f2fab48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
> ffffffff8f2fab48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x8db/0x1c70 net/core/rtnetlink.c:4064
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (rtnl_mutex){+.+.}-{4:4}:
> lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5866
> __mutex_lock_common kernel/locking/mutex.c:601 [inline]
> __mutex_lock+0x182/0xe80 kernel/locking/mutex.c:746
> e1000_reset_task+0x56/0xc0 drivers/net/ethernet/intel/e1000/e1000_main.c:3512
> process_one_work kernel/workqueue.c:3238 [inline]
> process_scheduled_works+0xadb/0x17a0 kernel/workqueue.c:3319
> worker_thread+0x8a0/0xda0 kernel/workqueue.c:3400
> kthread+0x70e/0x8a0 kernel/kthread.c:464
> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:153
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>
> -> #0 ((work_completion)(&adapter->reset_task)){+.+.}-{0:0}:
> check_prev_add kernel/locking/lockdep.c:3166 [inline]
> check_prevs_add kernel/locking/lockdep.c:3285 [inline]
> validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3909
> __lock_acquire+0xaac/0xd20 kernel/locking/lockdep.c:5235
> lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5866
> touch_work_lockdep_map kernel/workqueue.c:3922 [inline]
> start_flush_work kernel/workqueue.c:4176 [inline]
> __flush_work+0x6b8/0xbc0 kernel/workqueue.c:4208
> __cancel_work_sync+0xbe/0x110 kernel/workqueue.c:4364
> e1000_down+0x402/0x6b0 drivers/net/ethernet/intel/e1000/e1000_main.c:526
> e1000_close+0x17b/0xa10 drivers/net/ethernet/intel/e1000/e1000_main.c:1448
> __dev_close_many+0x361/0x6f0 net/core/dev.c:1702
> __dev_close net/core/dev.c:1714 [inline]
> __dev_change_flags+0x2c7/0x6d0 net/core/dev.c:9352
> netif_change_flags+0x88/0x1a0 net/core/dev.c:9417
> do_setlink+0xcb9/0x40d0 net/core/rtnetlink.c:3152
> rtnl_group_changelink net/core/rtnetlink.c:3783 [inline]
> __rtnl_newlink net/core/rtnetlink.c:3937 [inline]
> rtnl_newlink+0x149f/0x1c70 net/core/rtnetlink.c:4065
> rtnetlink_rcv_msg+0x7cc/0xb70 net/core/rtnetlink.c:6955
> netlink_rcv_skb+0x219/0x490 net/netlink/af_netlink.c:2534
> netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
> netlink_unicast+0x75b/0x8d0 net/netlink/af_netlink.c:1339
> netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1883
> sock_sendmsg_nosec net/socket.c:712 [inline]
> __sock_sendmsg+0x21c/0x270 net/socket.c:727
> ____sys_sendmsg+0x505/0x830 net/socket.c:2566
> ___sys_sendmsg+0x21f/0x2a0 net/socket.c:2620
> __sys_sendmsg net/socket.c:2652 [inline]
> __do_sys_sendmsg net/socket.c:2657 [inline]
> __se_sys_sendmsg net/socket.c:2655 [inline]
> __x64_sys_sendmsg+0x19b/0x260 net/socket.c:2655
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(rtnl_mutex);
>                                lock((work_completion)(&adapter->reset_task));
>                                lock(rtnl_mutex);
>   lock((work_completion)(&adapter->reset_task));

So this is the workqueue's internal per-work lockdep map being reordered
against the rtnl lock: e1000_reset_task takes rtnl_lock, while e1000_down,
running under rtnl_lock on the close path, does cancel_work_sync on that
same work. But looking at process_one_work, I don't see actual locks,
mostly lock_map_acquire/lock_map_release calls to enforce some internal WQ
invariants. Not sure what to do with it yet, will try to read more.
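
For reference, a minimal sketch of the two orderings lockdep is pairing up,
assuming a driver shaped like the report above (demo_reset_task/demo_close
are made-up names, not the actual e1000 code):

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <linux/workqueue.h>

static struct work_struct reset_task;	/* stand-in for adapter->reset_task */

/* -> #1 in the report: the worker "holds" the work entry, then takes rtnl_mutex */
static void demo_reset_task(struct work_struct *work)
{
	rtnl_lock();
	/* ... re-initialize the device ... */
	rtnl_unlock();
}

/* -> #0 in the report: rtnl_mutex is held by the rtnl_newlink path, and the
 * close path then waits for the same work entry via cancel_work_sync().
 */
static int demo_close(struct net_device *netdev)
{
	ASSERT_RTNL();			/* __dev_close runs under rtnl */
	cancel_work_sync(&reset_task);	/* __flush_work acquires the work's lockdep map */
	return 0;
}

The pseudo-lock in the report is just that work lockdep map, which
__flush_work acquires (touch_work_lockdep_map in the trace above) to model
that flushing a work item may have to wait for it to run, so lockdep can
report a cycle even though no second mutex is involved.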