Message-ID: <aFxgg4rCQ8tfM9dw@mini-arch>
Date: Wed, 25 Jun 2025 13:48:03 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: syzbot <syzbot+e67ea9c235b13b4f0020@...kaller.appspotmail.com>,
andrii@...nel.org, ast@...nel.org, bjorn@...nel.org,
bpf@...r.kernel.org, daniel@...earbox.net, davem@...emloft.net,
edumazet@...gle.com, horms@...nel.org, jonathan.lemon@...il.com,
kuba@...nel.org, linux-kernel@...r.kernel.org,
maciej.fijalkowski@...el.com, magnus.karlsson@...el.com,
netdev@...r.kernel.org, pabeni@...hat.com, sdf@...ichev.me,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [bpf?] [net?] possible deadlock in xsk_notifier (3)
On 06/25, Stanislav Fomichev wrote:
> On 06/25, Jason Xing wrote:
> > On Wed, Jun 25, 2025 at 11:06 PM Stanislav Fomichev
> > <stfomichev@...il.com> wrote:
> > >
> > > On 06/25, Jason Xing wrote:
> > > > On Wed, Jun 25, 2025 at 2:51 AM syzbot
> > > > <syzbot+e67ea9c235b13b4f0020@...kaller.appspotmail.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following issue on:
> > > > >
> > > > > HEAD commit: 78f4e737a53e Merge tag 'for-6.16/dm-fixes' of git://git.ke..
> > > > > git tree: upstream
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=11b48f0c580000
> > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=12ec1a20ad573841
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=e67ea9c235b13b4f0020
> > > > > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > >
> > > > > Downloadable assets:
> > > > > disk image: https://storage.googleapis.com/syzbot-assets/3ff97b2d201b/disk-78f4e737.raw.xz
> > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/1968f46c8915/vmlinux-78f4e737.xz
> > > > > kernel image: https://storage.googleapis.com/syzbot-assets/3455e371b965/bzImage-78f4e737.xz
> > > > >
> > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > Reported-by: syzbot+e67ea9c235b13b4f0020@...kaller.appspotmail.com
> > > > >
> > > > > netlink: 4 bytes leftover after parsing attributes in process `syz.1.1331'.
> > > > > ======================================================
> > > > > WARNING: possible circular locking dependency detected
> > > > > 6.16.0-rc3-syzkaller-00042-g78f4e737a53e #0 Not tainted
> > > > > ------------------------------------------------------
> > > > > syz.1.1331/11144 is trying to acquire lock:
> > > > > ffff888054b136b0 (&xs->mutex){+.+.}-{4:4}, at: xsk_notifier+0x101/0x280 net/xdp/xsk.c:1649
> > > > >
> > > > > but task is already holding lock:
> > > > > ffff888052f43d58 (&net->xdp.lock){+.+.}-{4:4}, at: xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645
> > > > >
> > > > > which lock already depends on the new lock.
> > > > >
> > > > >
> > > > > the existing dependency chain (in reverse order) is:
> > > > >
> > > > > -> #2 (&net->xdp.lock){+.+.}-{4:4}:
> > > > > __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > > > > __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > > > > xsk_notifier+0xa4/0x280 net/xdp/xsk.c:1645
> > > > > notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > > > > call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > > > > call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > > > > call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > > > > unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
> > > > > unregister_netdevice_many net/core/dev.c:12140 [inline]
> > > > > unregister_netdevice_queue+0x305/0x3f0 net/core/dev.c:11984
> > > > > register_netdevice+0x18f1/0x2270 net/core/dev.c:11149
> > > > > lapbeth_new_device drivers/net/wan/lapbether.c:420 [inline]
> > > > > lapbeth_device_event+0x5b1/0xbe0 drivers/net/wan/lapbether.c:462
> > > > > notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > > > > call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > > > > call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > > > > call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > > > > __dev_notify_flags+0x12c/0x2e0 net/core/dev.c:9497
> > > > > netif_change_flags+0x108/0x160 net/core/dev.c:9526
> > > > > dev_change_flags+0xba/0x250 net/core/dev_api.c:68
> > > > > devinet_ioctl+0x11d5/0x1f50 net/ipv4/devinet.c:1200
> > > > > inet_ioctl+0x3a7/0x3f0 net/ipv4/af_inet.c:1001
> > > > > sock_do_ioctl+0x118/0x280 net/socket.c:1190
> > > > > sock_ioctl+0x227/0x6b0 net/socket.c:1311
> > > > > vfs_ioctl fs/ioctl.c:51 [inline]
> > > > > __do_sys_ioctl fs/ioctl.c:907 [inline]
> > > > > __se_sys_ioctl fs/ioctl.c:893 [inline]
> > > > > __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893
> > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > > > > do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > >
> > > > > -> #1 (&dev_instance_lock_key#20){+.+.}-{4:4}:
> > > > > __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > > > > __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > > > > netdev_lock include/linux/netdevice.h:2756 [inline]
> > > > > netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> > > > > xsk_bind+0x37c/0x1570 net/xdp/xsk.c:1189
> > > > > __sys_bind_socket net/socket.c:1810 [inline]
> > > > > __sys_bind_socket net/socket.c:1802 [inline]
> > > > > __sys_bind+0x1a7/0x260 net/socket.c:1841
> > > > > __do_sys_bind net/socket.c:1846 [inline]
> > > > > __se_sys_bind net/socket.c:1844 [inline]
> > > > > __x64_sys_bind+0x72/0xb0 net/socket.c:1844
> > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > > > > do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > >
> > > > > -> #0 (&xs->mutex){+.+.}-{4:4}:
> > > > > check_prev_add kernel/locking/lockdep.c:3168 [inline]
> > > > > check_prevs_add kernel/locking/lockdep.c:3287 [inline]
> > > > > validate_chain kernel/locking/lockdep.c:3911 [inline]
> > > > > __lock_acquire+0x126f/0x1c90 kernel/locking/lockdep.c:5240
> > > > > lock_acquire kernel/locking/lockdep.c:5871 [inline]
> > > > > lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5828
> > > > > __mutex_lock_common kernel/locking/mutex.c:602 [inline]
> > > > > __mutex_lock+0x199/0xb90 kernel/locking/mutex.c:747
> > > > > xsk_notifier+0x101/0x280 net/xdp/xsk.c:1649
> > > > > notifier_call_chain+0xbc/0x410 kernel/notifier.c:85
> > > > > call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:2230
> > > > > call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
> > > > > call_netdevice_notifiers net/core/dev.c:2282 [inline]
> > > > > unregister_netdevice_many_notify+0xf9d/0x2700 net/core/dev.c:12077
> > > > > rtnl_delete_link net/core/rtnetlink.c:3511 [inline]
> > > > > rtnl_dellink+0x3cb/0xa80 net/core/rtnetlink.c:3553
> > > > > rtnetlink_rcv_msg+0x95e/0xe90 net/core/rtnetlink.c:6944
> > > > > netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2534
> > > > > netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
> > > > > netlink_unicast+0x53d/0x7f0 net/netlink/af_netlink.c:1339
> > > > > netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883
> > > > > sock_sendmsg_nosec net/socket.c:712 [inline]
> > > > > __sock_sendmsg net/socket.c:727 [inline]
> > > > > ____sys_sendmsg+0xa98/0xc70 net/socket.c:2566
> > > > > ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620
> > > > > __sys_sendmsg+0x16d/0x220 net/socket.c:2652
> > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > > > > do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > >
> > > > > other info that might help us debug this:
> > > > >
> > > > > Chain exists of:
> > > > > &xs->mutex --> &dev_instance_lock_key#20 --> &net->xdp.lock
> > > > >
> > > > > Possible unsafe locking scenario:
> > > > >
> > > > >        CPU0                    CPU1
> > > > >        ----                    ----
> > > > >   lock(&net->xdp.lock);
> > > > >                                lock(&dev_instance_lock_key#20);
> > > > >                                lock(&net->xdp.lock);
> > > > >   lock(&xs->mutex);
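As a rough illustration of what lockdep is reporting here: conceptually, it records an edge "B was acquired while A was held" for each pair of lock classes and warns when a new edge closes a cycle. The following is a minimal userspace model of that idea only, not lockdep's implementation. The enum names are hypothetical stand-ins for &xs->mutex, &dev_instance_lock_key#20 and &net->xdp.lock, and the edges are taken from the three stacks in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the ones in the report. */
enum lock_class { XS_MUTEX, DEV_INSTANCE_LOCK, NET_XDP_LOCK, NR_CLASSES };

/* edge[a][b] == true means "b was acquired while a was held". */
static bool edge[NR_CLASSES][NR_CLASSES];

static void clear_dependencies(void)
{
	memset(edge, 0, sizeof(edge));
}

static void add_dependency(enum lock_class held, enum lock_class acquired)
{
	assert(held < NR_CLASSES && acquired < NR_CLASSES);
	edge[held][acquired] = true;
}

/* Depth-first search; state: 0 = unvisited, 1 = on stack, 2 = finished. */
static bool dfs_finds_cycle(int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NR_CLASSES; next++) {
		if (!edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: circular dependency */
		if (state[next] == 0 && dfs_finds_cycle(next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

static bool has_circular_dependency(void)
{
	int state[NR_CLASSES] = { 0 };

	for (int n = 0; n < NR_CLASSES; n++)
		if (state[n] == 0 && dfs_finds_cycle(n, state))
			return true;
	return false;
}
```

Feeding it the edges from #1 (xs->mutex held, instance lock taken in xsk_bind()) and #2 (instance lock held, net->xdp.lock taken in xsk_notifier()) gives an acyclic graph; adding the #0 edge (net->xdp.lock held, xs->mutex taken) closes the loop. If the xsk_bind() edge is flipped to instance lock -> xs->mutex, as proposed later in the thread, the same three classes no longer form a cycle.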
> > > >
> > > > I don't think the race diagram above is quite right.
> > > >
> > > > My understanding is as shown below.
> > > > CPU 0                                  CPU 1
> > > > ---                                    ---
> > > > unregister_netdevice_many_notify()
> > > >                                        xsk_bind()
> > > > netdev_lock_ops(dev);
> > > >
> > > >                                        mutex_lock(&xs->mutex);
> > > >                                        netdev_lock_ops(dev);
> > > > xsk_notifier()
> > > > mutex_lock(&net->xdp.lock);
> > > > mutex_lock(&xs->mutex);
> > > >
> > > > So this is an ABBA deadlock, IIUC.
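Jason's interleaving can be replayed deterministically in a single thread to see why neither side can make progress. This is a minimal userspace sketch, assuming pthread mutexes as stand-ins for the kernel mutexes (the variable and function names are hypothetical). The steps run in the order of the diagram, and the two contended acquisitions are probed with pthread_mutex_trylock() instead of blocking, so the program reports the deadlock rather than hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three kernel locks. */
static pthread_mutex_t instance_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t xdp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t xs_mutex = PTHREAD_MUTEX_INITIALIZER;

/* True if taking 'm' now would block because it is already held. */
static bool would_block(pthread_mutex_t *m)
{
	int ret = pthread_mutex_trylock(m);

	assert(ret == 0 || ret == EBUSY);
	if (ret == 0)
		pthread_mutex_unlock(m);	/* it was free: give it back */
	return ret == EBUSY;
}

/* Returns true if each "CPU" ends up waiting on a lock the other holds. */
static bool replay_abba(void)
{
	/* CPU 0: unregister path holds the instance lock, then xdp.lock. */
	pthread_mutex_lock(&instance_lock);
	pthread_mutex_lock(&xdp_lock);

	/* CPU 1: xsk_bind() takes xs->mutex first. */
	pthread_mutex_lock(&xs_mutex);

	/* CPU 1 next wants the instance lock, which CPU 0 holds. */
	bool cpu1_stuck = would_block(&instance_lock);

	/* CPU 0 (xsk_notifier) next wants xs->mutex, which CPU 1 holds. */
	bool cpu0_stuck = would_block(&xs_mutex);

	/* Undo the replay so the mutexes can be reused. */
	pthread_mutex_unlock(&xs_mutex);
	pthread_mutex_unlock(&xdp_lock);
	pthread_mutex_unlock(&instance_lock);

	return cpu0_stuck && cpu1_stuck;
}
```

With real threads, both of those probes would be blocking mutex_lock() calls, and neither would ever return.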
> > >
> > > Since we can't (easily) control the ordering in notifiers, it looks like
> > > we need to align the xsk_bind ordering (instance lock -> xs->mutex).
> > > LMK if you want to take a stab at this; otherwise I'll try to send a
> > > fix.
> >
> > I'm still learning AF_XDP. Sure, I'm interested, though a bit
> > worried whether I'm capable of completing it. I'll give it a try.
>
> SG, thanks! If you need more details lmk, but basically we need to reorder
> netdev_lock_ops() and mutex_lock(&xs->mutex) + the XSK_READY check,
> and do the same for the cleanup (out_unlock/out_release) path.
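A minimal userspace sketch of the reordering described above, assuming pthread mutexes stand in for the netdev instance lock and xs->mutex, and that the device has already been looked up before locking. struct fake_dev, struct fake_xsk, fake_bind() and the FAKE_XSK_* states are hypothetical, loosely modeled on the XSK_READY check in xsk_bind(). Per Jakub's feedback in this thread, reordering alone is likely not a complete fix; this only shows the intended acquire/release order.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical socket states, loosely mirroring XSK_READY / XSK_BOUND. */
enum fake_xsk_state { FAKE_XSK_READY, FAKE_XSK_BOUND };

/* Hypothetical stand-in for a net_device: only the instance lock. */
struct fake_dev {
	pthread_mutex_t instance_lock;	/* plays netdev_lock_ops(dev) */
};

/* Hypothetical stand-in for an AF_XDP socket. */
struct fake_xsk {
	pthread_mutex_t mutex;		/* plays xs->mutex */
	enum fake_xsk_state state;
};

/*
 * Bind with the order aligned to the notifier path: take the device
 * instance lock first, then xs->mutex, and release in reverse order.
 */
static int fake_bind(struct fake_xsk *xs, struct fake_dev *dev)
{
	int err = 0;

	assert(xs && dev);
	pthread_mutex_lock(&dev->instance_lock);
	pthread_mutex_lock(&xs->mutex);

	/* Only a socket still in the READY state may be bound. */
	if (xs->state != FAKE_XSK_READY) {
		err = -EBUSY;
		goto out_unlock;
	}

	/* ... the real bind work would happen here ... */
	xs->state = FAKE_XSK_BOUND;

out_unlock:
	/* Cleanup releases the locks in reverse acquisition order. */
	pthread_mutex_unlock(&xs->mutex);
	pthread_mutex_unlock(&dev->instance_lock);
	return err;
}
```

A single cleanup label keeps the sketch short; the real function distinguishes out_unlock from out_release because it also drops a device reference.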

Jakub just told me that I'm wrong: this looks similar to commit
f0433eea4688 ("net: don't mix device locking in dev_close_many()
calls"). So this is not as easy as flipping the lock ordering :-(