Message-ID: <Z-1IZc7G1hrsnzjP@mini-arch>
Date: Wed, 2 Apr 2025 07:23:33 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: syzbot <syzbot+9f46f55b69eb4f3e054b@...kaller.appspotmail.com>
Cc: davem@...emloft.net, edumazet@...gle.com, horms@...nel.org,
kuba@...nel.org, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, pabeni@...hat.com,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] possible deadlock in dev_close
On 04/01, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 0c86b42439b6 Merge tag 'drm-next-2025-03-28' of https://gi..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1353c678580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=500ed53123ea6589
> dashboard link: https://syzkaller.appspot.com/bug?extid=9f46f55b69eb4f3e054b
> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-0c86b424.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/3e78f55971a9/vmlinux-0c86b424.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/3f8acc0407dd/bzImage-0c86b424.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+9f46f55b69eb4f3e054b@...kaller.appspotmail.com
>
> loop0: detected capacity change from 0 to 1024
> netlink: 36 bytes leftover after parsing attributes in process `syz.0.0'.
> netlink: 'syz.0.0': attribute type 10 has an invalid length.
> bond0: (slave netdevsim0): Enslaving as an active interface with an up link
> bond0: (slave netdevsim0): Releasing backup interface
> ============================================
> WARNING: possible recursive locking detected
> 6.14.0-syzkaller-09352-g0c86b42439b6 #0 Not tainted
> --------------------------------------------
> syz.0.0/5321 is trying to acquire lock:
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2751 [inline]
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: dev_close+0x121/0x280 net/core/dev_api.c:224
>
> but task is already holding lock:
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2751 [inline]
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: do_setlink+0x209/0x4370 net/core/rtnetlink.c:3025
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&dev->lock);
> lock(&dev->lock);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 2 locks held by syz.0.0/5321:
> #0: ffffffff900e5f48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
> #0: ffffffff900e5f48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
> #0: ffffffff900e5f48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0xd68/0x1fe0 net/core/rtnetlink.c:4061
> #1: ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2751 [inline]
> #1: ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> #1: ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: do_setlink+0x209/0x4370 net/core/rtnetlink.c:3025
>
> stack backtrace:
> CPU: 0 UID: 0 PID: 5321 Comm: syz.0.0 Not tainted 6.14.0-syzkaller-09352-g0c86b42439b6 #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:94 [inline]
> dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
> print_deadlock_bug+0x2be/0x2d0 kernel/locking/lockdep.c:3042
> check_deadlock kernel/locking/lockdep.c:3094 [inline]
> validate_chain+0x928/0x24e0 kernel/locking/lockdep.c:3896
> __lock_acquire+0xad5/0xd80 kernel/locking/lockdep.c:5235
> lock_acquire+0x116/0x2f0 kernel/locking/lockdep.c:5866
> __mutex_lock_common kernel/locking/mutex.c:587 [inline]
> __mutex_lock+0x1a5/0x10c0 kernel/locking/mutex.c:732
> netdev_lock include/linux/netdevice.h:2751 [inline]
> netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> dev_close+0x121/0x280 net/core/dev_api.c:224
> __bond_release_one+0xcaf/0x1220 drivers/net/bonding/bond_main.c:2629
> bond_slave_netdev_event drivers/net/bonding/bond_main.c:4028 [inline]
> bond_netdev_event+0x557/0xfb0 drivers/net/bonding/bond_main.c:4146
> notifier_call_chain+0x1a5/0x3f0 kernel/notifier.c:85
> call_netdevice_notifiers_extack net/core/dev.c:2218 [inline]
> call_netdevice_notifiers net/core/dev.c:2232 [inline]
> netif_change_net_namespace+0xa30/0x1c20 net/core/dev.c:12163
> do_setlink+0x3aa/0x4370 net/core/rtnetlink.c:3042
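Looks like this is the UNREGISTER notifier for bond: do_setlink() already
holds the instance lock when netif_change_net_namespace() fires the
notifier, and __bond_release_one() then calls dev_close(), which tries to
take the same lock again. I think this is going to be (accidentally) fixed by
https://lore.kernel.org/netdev/20250401163452.622454-3-sdf@fomichev.me/T/#u
which stops grabbing the instance lock during UNREGISTER.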