Message-ID: <CANn89iLGfEe6HM=2E1wjU89eeS1YwnPcCHpgZqQ=dWuaGV2k+A@mail.gmail.com>
Date: Fri, 27 Sep 2024 13:27:40 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Hillf Danton <hdanton@...a.com>
Cc: syzbot <syzbot+05f9cecd28e356241aba@...kaller.appspotmail.com>,
linux-kernel@...r.kernel.org,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>, Boqun Feng <boqun.feng@...il.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, netdev@...r.kernel.org,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [net?] INFO: task hung in new_device_store (5)
On Fri, Sep 27, 2024 at 1:24 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Fri, Sep 27, 2024 at 1:05 PM Hillf Danton <hdanton@...a.com> wrote:
> >
> > On Thu, 26 Sep 2024 22:14:14 +0200 Eric Dumazet <edumazet@...gle.com>
> > > On Thu, Sep 26, 2024 at 7:58 PM syzbot wrote:
> > > >
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit: 97d8894b6f4c Merge tag 'riscv-for-linus-6.12-mw1' of git:/..
> > > > git tree: upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12416a27980000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=bc30a30374b0753
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=05f9cecd28e356241aba
> > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > >
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > >
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/bd119f4fdc08/disk-97d8894b.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/4d0bfed66f93/vmlinux-97d8894b.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/0f9223ac9bfb/bzImage-97d8894b.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+05f9cecd28e356241aba@...kaller.appspotmail.com
> > > >
> > > > INFO: task syz-executor:9916 blocked for more than 143 seconds.
> > > > Not tainted 6.11.0-syzkaller-10045-g97d8894b6f4c #0
> > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > task:syz-executor state:D stack:21104 pid:9916 tgid:9916 ppid:1 flags:0x00000004
> > > > Call Trace:
> > > > <TASK>
> > > > context_switch kernel/sched/core.c:5315 [inline]
> > > > __schedule+0x1895/0x4b30 kernel/sched/core.c:6674
> > > > __schedule_loop kernel/sched/core.c:6751 [inline]
> > > > schedule+0x14b/0x320 kernel/sched/core.c:6766
> > > > schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6823
> > > > __mutex_lock_common kernel/locking/mutex.c:684 [inline]
> > > > __mutex_lock+0x6a7/0xd70 kernel/locking/mutex.c:752
> > > > new_device_store+0x1b4/0x890 :166
> > > > kernfs_fop_write_iter+0x3a2/0x500 fs/kernfs/file.c:334
> > > > new_sync_write fs/read_write.c:590 [inline]
> > > > vfs_write+0xa6f/0xc90 fs/read_write.c:683
> > > > ksys_write+0x183/0x2b0 fs/read_write.c:736
> > > > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > > do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > RIP: 0033:0x7f8310d7c9df
> > > > RSP: 002b:00007ffe830a52e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> > > > RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f8310d7c9df
> > > > RDX: 0000000000000003 RSI: 00007ffe830a5330 RDI: 0000000000000005
> > > > RBP: 00007f8310df1c39 R08: 0000000000000000 R09: 00007ffe830a5137
> > > > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
> > > > R13: 00007ffe830a5330 R14: 00007f8311a64620 R15: 0000000000000003
> > > > </TASK>
> > >
> > > Typical sysfs deadlock?
> > >
> > > diff --git a/drivers/net/netdevsim/bus.c b/drivers/net/netdevsim/bus.c
> > > index 64c0cdd31bf85468ce4fa2b2af5c8aff4cfba897..3bf0ce52d71653fd9b8c752d52d0b5b7e19042d8 100644
> > > --- a/drivers/net/netdevsim/bus.c
> > > +++ b/drivers/net/netdevsim/bus.c
> > > @@ -163,7 +163,9 @@ new_device_store(const struct bus_type *bus, const char *buf, size_t count)
> > > return -EINVAL;
> > > }
> > >
> > > - mutex_lock(&nsim_bus_dev_list_lock);
> > > + if (!mutex_trylock(&nsim_bus_dev_list_lock))
> > > + return restart_syscall();
> > > +
> > > /* Prevent to use resource before initialization. */
> > > if (!smp_load_acquire(&nsim_bus_enable)) {
> > > err = -EBUSY;
> > >
> > >
> > > >
> > > > Showing all locks held in the system:
> > ...
> > > > 4 locks held by syz-executor/9916:
> > > > #0: ffff88807ca86420 (sb_writers#8){.+.+}-{0:0}, at: file_start_write include/linux/fs.h:2930 [inline]
> > > > #0: ffff88807ca86420 (sb_writers#8){.+.+}-{0:0}, at: vfs_write+0x224/0xc90 fs/read_write.c:679
> > > > #1: ffff88802e71e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x1ea/0x500 fs/kernfs/file.c:325
> > > > #2: ffff888144ff5968 (kn->active#50){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x20e/0x500 fs/kernfs/file.c:326
> > > > #3: ffffffff8f56d3e8 (nsim_bus_dev_list_lock){+.+.}-{3:3}, at: new_device_store+0x1b4/0x890 drivers/net/netdevsim/bus.c:166
> >
> > syz-executor/9916 is the lock waiter, and
> >
> > > > 7 locks held by syz-executor/9976:
> > > > #0: ffff88807ca86420 (sb_writers#8){.+.+}-{0:0}, at: file_start_write include/linux/fs.h:2930 [inline]
> > > > #0: ffff88807ca86420 (sb_writers#8){.+.+}-{0:0}, at: vfs_write+0x224/0xc90 fs/read_write.c:679
> > > > #1: ffff88807abc2888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x1ea/0x500 fs/kernfs/file.c:325
> > > > #2: ffff888144ff5a58 (kn->active#49){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x20e/0x500 fs/kernfs/file.c:326
> > > > #3: ffffffff8f56d3e8 (nsim_bus_dev_list_lock){+.+.}-{3:3}, at: del_device_store+0xfc/0x480 drivers/net/netdevsim/bus.c:216
> > > > #4: ffff888060f5a0e8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:1014 [inline]
> > > > #4: ffff888060f5a0e8 (&dev->mutex){....}-{3:3}, at: __device_driver_lock drivers/base/dd.c:1095 [inline]
> > > > #4: ffff888060f5a0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0xce/0x7c0 drivers/base/dd.c:1293
> > > > #5: ffff888060f5b250 (&devlink->lock_key#40){+.+.}-{3:3}, at: nsim_drv_remove+0x50/0x160 drivers/net/netdevsim/dev.c:1672
> > > > #6: ffffffff8fccdc48 (rtnl_mutex){+.+.}-{3:3}, at: nsim_destroy+0x71/0x5c0 drivers/net/netdevsim/netdev.c:773
> >
> > syz-executor/9976 is the lock owner. Given that both the waiter and the
> > owner are printed, the proposed trylock looks like the typical paper-over,
> > at least to a hoofed skull, because no real deadlock was detected.
>
> I suggest you look at why we have to use rtnl_trylock().
>
> If you know better, please send patches to remove all instances.
The real bug is that drivers/net/netdevsim uses sysfs to create and
delete network devices; this was a poor choice.
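The trylock-and-restart pattern in the proposed patch (the same shape as the rtnl_trylock() users mentioned above) can be modelled in user space. This is a minimal sketch, not kernel code, under stated assumptions: a pthread mutex stands in for nsim_bus_dev_list_lock, a counter stands in for the kernfs active reference that kernfs_fop_write_iter() holds while ->store() runs (kn->active in the lock dump), and the hypothetical SKETCH_RESTART value stands in for restart_syscall(). All function names are hypothetical, and having new references wait (rather than fail) while a remover drains is a simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_RESTART (-1000) /* hypothetical stand-in for restart_syscall() */

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER; /* models nsim_bus_dev_list_lock */

/* Models the kernfs active reference. Assumption: new refs wait while draining. */
static pthread_mutex_t active_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t active_cv = PTHREAD_COND_INITIALIZER;
static int active_refs;
static bool draining;

static atomic_int restarts;        /* times the writer backed off */
static atomic_bool remover_locked; /* remover currently holds list_lock */
static int devices;                /* protected by list_lock */

static void active_get(void)
{
	pthread_mutex_lock(&active_mu);
	while (draining)
		pthread_cond_wait(&active_cv, &active_mu);
	active_refs++;
	pthread_mutex_unlock(&active_mu);
}

static void active_put(void)
{
	pthread_mutex_lock(&active_mu);
	assert(active_refs > 0);
	if (--active_refs == 0)
		pthread_cond_broadcast(&active_cv);
	pthread_mutex_unlock(&active_mu);
}

/* Wait until every in-flight store has dropped its reference. */
static void active_drain(void)
{
	pthread_mutex_lock(&active_mu);
	draining = true;
	while (active_refs > 0)
		pthread_cond_wait(&active_cv, &active_mu);
	draining = false;
	pthread_cond_broadcast(&active_cv);
	pthread_mutex_unlock(&active_mu);
}

/* Hypothetical store handler following the proposed patch. */
static int store_trylock(void)
{
	if (pthread_mutex_trylock(&list_lock) != 0)
		return SKETCH_RESTART;
	devices++;
	pthread_mutex_unlock(&list_lock);
	return 0;
}

/* Models the write path: the active ref is dropped before each retry. */
static void *writer(void *arg)
{
	(void)arg;
	for (;;) {
		active_get();
		int ret = store_trylock();
		active_put();
		if (ret != SKETCH_RESTART)
			return NULL;
		atomic_fetch_add(&restarts, 1);
		sched_yield();
	}
}

/* Models a delete path that holds list_lock while draining active refs. */
static void *remover(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&list_lock);
	atomic_store(&remover_locked, true);
	while (atomic_load(&restarts) == 0)
		sched_yield(); /* let the writer hit the busy lock first */
	active_drain();
	pthread_mutex_unlock(&list_lock);
	return NULL;
}

/* Runs one writer against one remover; returns the number of devices created. */
static int run_scenario(void)
{
	pthread_t r, w;

	pthread_create(&r, NULL, remover, NULL);
	while (!atomic_load(&remover_locked))
		sched_yield();
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(r, NULL);
	pthread_join(w, NULL);
	return devices;
}
```

If store_trylock() called pthread_mutex_lock() instead, the writer would block while still holding its active reference, active_drain() would never finish, and the remover would never release list_lock: the sysfs self-deadlock that trylock plus restart avoids. Whether such a cycle actually exists in this report is what the thread disputes.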