linux-kernel - Re: [syzbot] [net?] possible deadlock in dev

Message-ID: <Z-1IZc7G1hrsnzjP@mini-arch>
Date: Wed, 2 Apr 2025 07:23:33 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: syzbot <syzbot+9f46f55b69eb4f3e054b@...kaller.appspotmail.com>
Cc: davem@...emloft.net, edumazet@...gle.com, horms@...nel.org,
	kuba@...nel.org, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org, pabeni@...hat.com,
	syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] possible deadlock in dev_close

On 04/01, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    0c86b42439b6 Merge tag 'drm-next-2025-03-28' of https://gi..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1353c678580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=500ed53123ea6589
> dashboard link: https://syzkaller.appspot.com/bug?extid=9f46f55b69eb4f3e054b
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-0c86b424.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/3e78f55971a9/vmlinux-0c86b424.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/3f8acc0407dd/bzImage-0c86b424.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+9f46f55b69eb4f3e054b@...kaller.appspotmail.com
> 
> loop0: detected capacity change from 0 to 1024
> netlink: 36 bytes leftover after parsing attributes in process `syz.0.0'.
> netlink: 'syz.0.0': attribute type 10 has an invalid length.
> bond0: (slave netdevsim0): Enslaving as an active interface with an up link
> bond0: (slave netdevsim0): Releasing backup interface
> ============================================
> WARNING: possible recursive locking detected
> 6.14.0-syzkaller-09352-g0c86b42439b6 #0 Not tainted
> --------------------------------------------
> syz.0.0/5321 is trying to acquire lock:
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2751 [inline]
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: dev_close+0x121/0x280 net/core/dev_api.c:224
> 
> but task is already holding lock:
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2751 [inline]
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: do_setlink+0x209/0x4370 net/core/rtnetlink.c:3025
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&dev->lock);
>   lock(&dev->lock);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 2 locks held by syz.0.0/5321:
>  #0: ffffffff900e5f48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
>  #0: ffffffff900e5f48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
>  #0: ffffffff900e5f48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0xd68/0x1fe0 net/core/rtnetlink.c:4061
>  #1: ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2751 [inline]
>  #1: ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
>  #1: ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: do_setlink+0x209/0x4370 net/core/rtnetlink.c:3025
> 
> stack backtrace:
> CPU: 0 UID: 0 PID: 5321 Comm: syz.0.0 Not tainted 6.14.0-syzkaller-09352-g0c86b42439b6 #0 PREEMPT(full) 
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:94 [inline]
>  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
>  print_deadlock_bug+0x2be/0x2d0 kernel/locking/lockdep.c:3042
>  check_deadlock kernel/locking/lockdep.c:3094 [inline]
>  validate_chain+0x928/0x24e0 kernel/locking/lockdep.c:3896
>  __lock_acquire+0xad5/0xd80 kernel/locking/lockdep.c:5235
>  lock_acquire+0x116/0x2f0 kernel/locking/lockdep.c:5866
>  __mutex_lock_common kernel/locking/mutex.c:587 [inline]
>  __mutex_lock+0x1a5/0x10c0 kernel/locking/mutex.c:732
>  netdev_lock include/linux/netdevice.h:2751 [inline]
>  netdev_lock_ops include/net/netdev_lock.h:42 [inline]
>  dev_close+0x121/0x280 net/core/dev_api.c:224
>  __bond_release_one+0xcaf/0x1220 drivers/net/bonding/bond_main.c:2629
>  bond_slave_netdev_event drivers/net/bonding/bond_main.c:4028 [inline]
>  bond_netdev_event+0x557/0xfb0 drivers/net/bonding/bond_main.c:4146
>  notifier_call_chain+0x1a5/0x3f0 kernel/notifier.c:85
>  call_netdevice_notifiers_extack net/core/dev.c:2218 [inline]
>  call_netdevice_notifiers net/core/dev.c:2232 [inline]
>  netif_change_net_namespace+0xa30/0x1c20 net/core/dev.c:12163
>  do_setlink+0x3aa/0x4370 net/core/rtnetlink.c:3042

Looks like it is the UNREGISTER notifier for bond. I think this is going to be
(accidentally) fixed by https://lore.kernel.org/netdev/20250401163452.622454-3-sdf@fomichev.me/T/#u
which stops grabbing the instance lock during UNREGISTER.