Message-ID: <aJI8iPFU9__PW-tU@mini-arch>
Date: Tue, 5 Aug 2025 10:16:56 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: syzbot <syzbot+e6300f66a999a6612477@...kaller.appspotmail.com>
Cc: andrii@...nel.org, ast@...nel.org, bjorn@...nel.org,
bpf@...r.kernel.org, daniel@...earbox.net, davem@...emloft.net,
edumazet@...gle.com, horms@...nel.org, jonathan.lemon@...il.com,
kuba@...nel.org, linux-kernel@...r.kernel.org,
maciej.fijalkowski@...el.com, magnus.karlsson@...el.com,
netdev@...r.kernel.org, pabeni@...hat.com, sdf@...ichev.me,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [bpf?] [net?] possible deadlock in xsk_diag_dump (2)
On 08/04, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: d2eedaa3909b Merge tag 'rtc-6.17' of git://git.kernel.org/..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=159482f0580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=75e522434dc68cb9
> dashboard link: https://syzkaller.appspot.com/bug?extid=e6300f66a999a6612477
> compiler: Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/3435b26b899d/disk-d2eedaa3.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/531223373575/vmlinux-d2eedaa3.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/e82f9030b8d5/bzImage-d2eedaa3.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+e6300f66a999a6612477@...kaller.appspotmail.com
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.16.0-syzkaller-11489-gd2eedaa3909b #0 Not tainted
> ------------------------------------------------------
> syz.8.4735/22857 is trying to acquire lock:
> ffff8880223e06b8 (&xs->mutex){+.+.}-{4:4}, at: xsk_diag_fill net/xdp/xsk_diag.c:113 [inline]
> ffff8880223e06b8 (&xs->mutex){+.+.}-{4:4}, at: xsk_diag_dump+0x550/0x14d0 net/xdp/xsk_diag.c:166
>
> but task is already holding lock:
> ffff888031291c98 (&net->xdp.lock){+.+.}-{4:4}, at: xsk_diag_dump+0x178/0x14d0 net/xdp/xsk_diag.c:158
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&net->xdp.lock){+.+.}-{4:4}:
> lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
> __mutex_lock_common kernel/locking/mutex.c:598 [inline]
> __mutex_lock+0x187/0x1360 kernel/locking/mutex.c:760
> xsk_notifier+0x89/0x230 net/xdp/xsk.c:1664
> notifier_call_chain+0x1b6/0x3e0 kernel/notifier.c:85
> call_netdevice_notifiers_extack net/core/dev.c:2267 [inline]
> call_netdevice_notifiers net/core/dev.c:2281 [inline]
> unregister_netdevice_many_notify+0x14d7/0x1ff0 net/core/dev.c:12156
> unregister_netdevice_many net/core/dev.c:12219 [inline]
> unregister_netdevice_queue+0x33c/0x380 net/core/dev.c:12063
> register_netdevice+0x1689/0x1ae0 net/core/dev.c:11241
> bpq_new_device drivers/net/hamradio/bpqether.c:481 [inline]
> bpq_device_event+0x491/0x600 drivers/net/hamradio/bpqether.c:523
> notifier_call_chain+0x1b6/0x3e0 kernel/notifier.c:85
> call_netdevice_notifiers_extack net/core/dev.c:2267 [inline]
> call_netdevice_notifiers net/core/dev.c:2281 [inline]
> __dev_notify_flags+0x18d/0x2e0 net/core/dev.c:-1
> netif_change_flags+0xe8/0x1a0 net/core/dev.c:9608
> dev_change_flags+0x130/0x260 net/core/dev_api.c:68
> devinet_ioctl+0xbb4/0x1b50 net/ipv4/devinet.c:1200
> inet_ioctl+0x3c0/0x4c0 net/ipv4/af_inet.c:1001
> sock_do_ioctl+0xdc/0x300 net/socket.c:1238
> sock_ioctl+0x576/0x790 net/socket.c:1359
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:598 [inline]
> __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:584
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> -> #1 (&dev_instance_lock_key#20){+.+.}-{4:4}:
> lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
> __mutex_lock_common kernel/locking/mutex.c:598 [inline]
> __mutex_lock+0x187/0x1360 kernel/locking/mutex.c:760
> netdev_lock include/linux/netdevice.h:2758 [inline]
> netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> xsk_bind+0x2f7/0xf90 net/xdp/xsk.c:1193
> __sys_bind_socket net/socket.c:1858 [inline]
> __sys_bind+0x2c6/0x3e0 net/socket.c:1889
> __do_sys_bind net/socket.c:1894 [inline]
> __se_sys_bind net/socket.c:1892 [inline]
> __x64_sys_bind+0x7a/0x90 net/socket.c:1892
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> -> #0 (&xs->mutex){+.+.}-{4:4}:
> check_prev_add kernel/locking/lockdep.c:3165 [inline]
> check_prevs_add kernel/locking/lockdep.c:3284 [inline]
> validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
> __lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
> lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
> __mutex_lock_common kernel/locking/mutex.c:598 [inline]
> __mutex_lock+0x187/0x1360 kernel/locking/mutex.c:760
> xsk_diag_fill net/xdp/xsk_diag.c:113 [inline]
> xsk_diag_dump+0x550/0x14d0 net/xdp/xsk_diag.c:166
> netlink_dump+0x6e4/0xe90 net/netlink/af_netlink.c:2327
> __netlink_dump_start+0x5cb/0x7e0 net/netlink/af_netlink.c:2442
> netlink_dump_start include/linux/netlink.h:341 [inline]
> xsk_diag_handler_dump+0x183/0x220 net/xdp/xsk_diag.c:193
> sock_diag_rcv_msg+0x4cc/0x600 net/core/sock_diag.c:-1
> netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2552
> netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
> netlink_unicast+0x82f/0x9e0 net/netlink/af_netlink.c:1346
> netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1896
> sock_sendmsg_nosec net/socket.c:714 [inline]
> __sock_sendmsg+0x21c/0x270 net/socket.c:729
> sock_write_iter+0x258/0x330 net/socket.c:1179
> do_iter_readv_writev+0x56e/0x7f0 fs/read_write.c:-1
> vfs_writev+0x31a/0x960 fs/read_write.c:1057
> do_writev+0x14d/0x2d0 fs/read_write.c:1103
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> other info that might help us debug this:
>
> Chain exists of:
> &xs->mutex --> &dev_instance_lock_key#20 --> &net->xdp.lock
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&net->xdp.lock);
> lock(&dev_instance_lock_key#20);
> lock(&net->xdp.lock);
> lock(&xs->mutex);
>
> *** DEADLOCK ***
>
> 2 locks held by syz.8.4735/22857:
> #0: ffff8880223e16d0 (nlk_cb_mutex-SOCK_DIAG){+.+.}-{4:4}, at: __netlink_dump_start+0xfe/0x7e0 net/netlink/af_netlink.c:2406
> #1: ffff888031291c98 (&net->xdp.lock){+.+.}-{4:4}, at: xsk_diag_dump+0x178/0x14d0 net/xdp/xsk_diag.c:158
>
> stack backtrace:
> CPU: 0 UID: 0 PID: 22857 Comm: syz.8.4735 Not tainted 6.16.0-syzkaller-11489-gd2eedaa3909b #0 PREEMPT(full)
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
> Call Trace:
> <TASK>
> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
> print_circular_bug+0x2ee/0x310 kernel/locking/lockdep.c:2043
> check_noncircular+0x134/0x160 kernel/locking/lockdep.c:2175
> check_prev_add kernel/locking/lockdep.c:3165 [inline]
> check_prevs_add kernel/locking/lockdep.c:3284 [inline]
> validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
> __lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
> lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
> __mutex_lock_common kernel/locking/mutex.c:598 [inline]
> __mutex_lock+0x187/0x1360 kernel/locking/mutex.c:760
> xsk_diag_fill net/xdp/xsk_diag.c:113 [inline]
> xsk_diag_dump+0x550/0x14d0 net/xdp/xsk_diag.c:166
> netlink_dump+0x6e4/0xe90 net/netlink/af_netlink.c:2327
> __netlink_dump_start+0x5cb/0x7e0 net/netlink/af_netlink.c:2442
> netlink_dump_start include/linux/netlink.h:341 [inline]
> xsk_diag_handler_dump+0x183/0x220 net/xdp/xsk_diag.c:193
> sock_diag_rcv_msg+0x4cc/0x600 net/core/sock_diag.c:-1
> netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2552
> netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
> netlink_unicast+0x82f/0x9e0 net/netlink/af_netlink.c:1346
> netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1896
> sock_sendmsg_nosec net/socket.c:714 [inline]
> __sock_sendmsg+0x21c/0x270 net/socket.c:729
> sock_write_iter+0x258/0x330 net/socket.c:1179
> do_iter_readv_writev+0x56e/0x7f0 fs/read_write.c:-1
> vfs_writev+0x31a/0x960 fs/read_write.c:1057
> do_writev+0x14d/0x2d0 fs/read_write.c:1103
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f6e7b38eb69
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f6e791d5038 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
> RAX: ffffffffffffffda RBX: 00007f6e7b5b6160 RCX: 00007f6e7b38eb69
> RDX: 0000000000000001 RSI: 0000200000000140 RDI: 0000000000000007
> RBP: 00007f6e7b411df1 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007f6e7b5b6160 R15: 00007ffdbe0ab9a8
> </TASK>
Looks similar to [0], but this time it comes via bpq_device_event.
I can try to package up the fix from [0] and do something similar here as well.

[0]: https://lore.kernel.org/netdev/685af3b1.a00a0220.2e5631.0091.GAE@google.com/
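
For readers following the report, the circular dependency can be reproduced as a toy lock-order graph. This is only a sketch of the idea behind the lockdep check, not the kernel's implementation: the lock names and the three orderings are read from the report above, while find_cycle(), EXISTING and NEW are hypothetical names made up for illustration.

```python
# Toy model of the lock-order graph in the lockdep report.
# An edge (A, B) means "B was acquired while A was held".
# Lock names come from the report; everything else is illustrative.

def find_cycle(edges):
    """Return a list of locks forming a cycle, or None if the order is acyclic."""
    graph = {}
    for held, acquired in edges:
        graph.setdefault(held, []).append(acquired)
        graph.setdefault(acquired, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {node: WHITE for node in graph}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == GREY:
                # Back edge: nxt is already on the current path, so we have a cycle.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in graph:
        if state[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Orderings already recorded, per entries #1 and #2 of the report.
EXISTING = [
    ("&xs->mutex", "&dev_instance_lock_key#20"),      # #1: xsk_bind
    ("&dev_instance_lock_key#20", "&net->xdp.lock"),  # #2: xsk_notifier via bpq_device_event
]
# Ordering that triggered the warning, per entry #0.
NEW = ("&net->xdp.lock", "&xs->mutex")                # #0: xsk_diag_dump -> xsk_diag_fill

if __name__ == "__main__":
    print(find_cycle(EXISTING))
    print(" --> ".join(find_cycle(EXISTING + [NEW])))
```

With only the two existing orderings the graph is acyclic; adding the xsk_diag_dump ordering closes the loop, which is the condition the warning reports.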