Message-ID: <CAMArcTUfKfsB1aZAyD+vVffRsG5ZJYcKgT=jKtJ3ptKqYE7WFw@mail.gmail.com>
Date: Thu, 3 Apr 2025 00:10:53 +0900
From: Taehee Yoo <ap420073@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: syzbot <syzbot+9f46f55b69eb4f3e054b@...kaller.appspotmail.com>,
davem@...emloft.net, edumazet@...gle.com, horms@...nel.org, kuba@...nel.org,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org, pabeni@...hat.com,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] possible deadlock in dev_close
On Wed, Apr 2, 2025 at 11:27 PM Stanislav Fomichev <stfomichev@...il.com> wrote:
>
Hi Stanislav,
> On 04/01, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 0c86b42439b6 Merge tag 'drm-next-2025-03-28' of https://gi..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1353c678580000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=500ed53123ea6589
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9f46f55b69eb4f3e054b
> > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-0c86b424.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/3e78f55971a9/vmlinux-0c86b424.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/3f8acc0407dd/bzImage-0c86b424.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+9f46f55b69eb4f3e054b@...kaller.appspotmail.com
> >
> > loop0: detected capacity change from 0 to 1024
> > netlink: 36 bytes leftover after parsing attributes in process `syz.0.0'.
> > netlink: 'syz.0.0': attribute type 10 has an invalid length.
> > bond0: (slave netdevsim0): Enslaving as an active interface with an up link
> > bond0: (slave netdevsim0): Releasing backup interface
> > ============================================
> > WARNING: possible recursive locking detected
> > 6.14.0-syzkaller-09352-g0c86b42439b6 #0 Not tainted
> > --------------------------------------------
> > syz.0.0/5321 is trying to acquire lock:
> > ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2751 [inline]
> > ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> > ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: dev_close+0x121/0x280 net/core/dev_api.c:224
> >
> > but task is already holding lock:
> > ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2751 [inline]
> > ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> > ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: do_setlink+0x209/0x4370 net/core/rtnetlink.c:3025
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > ----
> > lock(&dev->lock);
> > lock(&dev->lock);
> >
> > *** DEADLOCK ***
> >
> > May be due to missing lock nesting notation
> >
> > 2 locks held by syz.0.0/5321:
> > #0: ffffffff900e5f48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
> > #0: ffffffff900e5f48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
> > #0: ffffffff900e5f48 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0xd68/0x1fe0 net/core/rtnetlink.c:4061
> > #1: ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2751 [inline]
> > #1: ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> > #1: ffff888042eccd28 (&dev->lock){+.+.}-{4:4}, at: do_setlink+0x209/0x4370 net/core/rtnetlink.c:3025
> >
> > stack backtrace:
> > CPU: 0 UID: 0 PID: 5321 Comm: syz.0.0 Not tainted 6.14.0-syzkaller-09352-g0c86b42439b6 #0 PREEMPT(full)
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> > Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:94 [inline]
> > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
> > print_deadlock_bug+0x2be/0x2d0 kernel/locking/lockdep.c:3042
> > check_deadlock kernel/locking/lockdep.c:3094 [inline]
> > validate_chain+0x928/0x24e0 kernel/locking/lockdep.c:3896
> > __lock_acquire+0xad5/0xd80 kernel/locking/lockdep.c:5235
> > lock_acquire+0x116/0x2f0 kernel/locking/lockdep.c:5866
> > __mutex_lock_common kernel/locking/mutex.c:587 [inline]
> > __mutex_lock+0x1a5/0x10c0 kernel/locking/mutex.c:732
> > netdev_lock include/linux/netdevice.h:2751 [inline]
> > netdev_lock_ops include/net/netdev_lock.h:42 [inline]
> > dev_close+0x121/0x280 net/core/dev_api.c:224
> > __bond_release_one+0xcaf/0x1220 drivers/net/bonding/bond_main.c:2629
> > bond_slave_netdev_event drivers/net/bonding/bond_main.c:4028 [inline]
> > bond_netdev_event+0x557/0xfb0 drivers/net/bonding/bond_main.c:4146
> > notifier_call_chain+0x1a5/0x3f0 kernel/notifier.c:85
> > call_netdevice_notifiers_extack net/core/dev.c:2218 [inline]
> > call_netdevice_notifiers net/core/dev.c:2232 [inline]
> > netif_change_net_namespace+0xa30/0x1c20 net/core/dev.c:12163
> > do_setlink+0x3aa/0x4370 net/core/rtnetlink.c:3042
>
> Looks like it is UNREGISTER notifier for bond. I think this is gonna be
> (accidentally) fixed by https://lore.kernel.org/netdev/20250401163452.622454-3-sdf@fomichev.me/T/#u
> which stops grabbing instance lock during UNREGISTER.
>
I found a reproducer.
interface=<physical interface>
ip netns add ns_test
ip link add bond0 type bond
ip link set $interface master bond0 netns ns_test
With these commands the deadlock occurs, and the splat is the same as the one in this report.
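For convenience, the steps above could be wrapped in a small script. This is only a rough sketch: the `run` and `repro` helper names and the `DRY_RUN` switch are assumptions added for illustration and are not part of the original reproducer. Without the fix, the last command is expected to hang, so it should only be run as root on a disposable test machine.

```shell
#!/bin/sh
# Sketch of the reproducer as reusable functions.
# DRY_RUN, run and repro are hypothetical names used for illustration only.

run() {
    # In dry-run mode print the command; otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro() {
    interface=$1    # physical interface to enslave (caller-supplied)
    run ip netns add ns_test
    run ip link add bond0 type bond
    # One request that both enslaves the interface and moves it to ns_test;
    # this is the step that hung in my test.
    run ip link set "$interface" master bond0 netns ns_test
}
```

Setting `DRY_RUN=1` before calling `repro eth0` only prints the three `ip` commands, which is a safe way to check them before running for real (`eth0` is a placeholder interface name).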
As you suggested, I applied [1] and tested again.
I no longer see any deadlock or warning.
[1] https://lore.kernel.org/netdev/20250401163452.622454-3-sdf@fomichev.me/T/#u
Thanks a lot!
Taehee Yoo