Message-ID: <20240827123919.GN3468552@ziepe.ca>
Date: Tue, 27 Aug 2024 09:39:19 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: syzbot <syzbot+4d0c396361b5dc5d610f@...kaller.appspotmail.com>
Cc: leon@...nel.org, linux-kernel@...r.kernel.org,
linux-rdma@...r.kernel.org, netdev@...r.kernel.org,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [rdma?] INFO: task hung in disable_device
On Mon, Aug 26, 2024 at 08:28:29PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 872cf28b8df9 Merge tag 'platform-drivers-x86-v6.11-4' of g..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=138e4ff5980000
> kernel config: https://syzkaller.appspot.com/x/.config?x=df2f0ed7e30a639d
> dashboard link: https://syzkaller.appspot.com/bug?extid=4d0c396361b5dc5d610f
> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.
And from the console output it is really hard to understand how we got
here. There are no syz commands that seem to have anything to do with
rdma or ib at all, yet somehow an rdma device (rxe/siw?) was created
and destroyed.
The console output format has changed; has something gone wrong with
it? Usually I would expect the last "executing program" to be a
netlink operation triggering device unregister...
> Workqueue: ib-unreg-wq ib_unregister_work
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5188 [inline]
> __schedule+0x1800/0x4a60 kernel/sched/core.c:6529
> __schedule_loop kernel/sched/core.c:6606 [inline]
> schedule+0x14b/0x320 kernel/sched/core.c:6621
> schedule_timeout+0xb0/0x310 kernel/time/timer.c:2557
> do_wait_for_common kernel/sched/completion.c:95 [inline]
> __wait_for_common kernel/sched/completion.c:116 [inline]
> wait_for_common kernel/sched/completion.c:127 [inline]
> wait_for_completion+0x355/0x620 kernel/sched/completion.c:148
> disable_device+0x1c7/0x360 drivers/infiniband/core/device.c:1295
> __ib_unregister_device+0x2ac/0x3c0 drivers/infiniband/core/device.c:1493
> ib_unregister_work+0x19/0x30 drivers/infiniband/core/device.c:1604
> process_one_work kernel/workqueue.c:3231 [inline]
> process_scheduled_works+0xa2e/0x1830 kernel/workqueue.c:3312
> worker_thread+0x86d/0xd40 kernel/workqueue.c:3390
> kthread+0x2f2/0x390 kernel/kthread.c:389
> ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> </TASK>
This trace is almost certainly a refcount leak, presumably on some
error path.
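For illustration only, here is a minimal userspace sketch of why a leaked
reference on an error path would show up as this kind of hang. It assumes the
usual pattern where unregister drops the registration reference and then waits
for a completion that is signalled only when the count reaches zero. Every name
below (model_device, model_op, and so on) is hypothetical and is not the
actual code in drivers/infiniband/core/device.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a refcounted device; not struct ib_device. */
struct model_device {
	int refcount;     /* starts at 1: the registration reference */
	bool unreg_done;  /* stands in for the unregister completion */
};

static void model_init(struct model_device *d)
{
	d->refcount = 1;
	d->unreg_done = false;
}

static void model_get(struct model_device *d)
{
	d->refcount++;
}

static void model_put(struct model_device *d)
{
	assert(d->refcount > 0);   /* catch underflow in the model */
	if (--d->refcount == 0)
		d->unreg_done = true;  /* models signalling the completion */
}

/*
 * Hypothetical operation that takes a reference while it runs.
 * With buggy == true the error path forgets the matching put.
 */
static int model_op(struct model_device *d, bool fail, bool buggy)
{
	model_get(d);
	if (fail) {
		if (!buggy)
			model_put(d);
		return -1;
	}
	model_put(d);
	return 0;
}

/*
 * Models the unregister side: drop the registration reference, then
 * report whether the wait would ever finish. A real waiter would
 * block here instead of returning false.
 */
static bool model_unregister_would_finish(struct model_device *d)
{
	model_put(d);
	return d->unreg_done;
}
```

In this model a single failed model_op() with the missing put leaves the count
at 1 after unregister drops its own reference, so the wait never finishes. That
matches the shape of the trace, though it says nothing about which real code
path leaked.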
Without a reproducer, or some kind of clue about what syzkaller was
doing, it doesn't seem like any progress is possible.
Jason