Message-ID: <20240827123919.GN3468552@ziepe.ca>
Date: Tue, 27 Aug 2024 09:39:19 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: syzbot <syzbot+4d0c396361b5dc5d610f@...kaller.appspotmail.com>
Cc: leon@...nel.org, linux-kernel@...r.kernel.org,
	linux-rdma@...r.kernel.org, netdev@...r.kernel.org,
	syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [rdma?] INFO: task hung in disable_device

On Mon, Aug 26, 2024 at 08:28:29PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    872cf28b8df9 Merge tag 'platform-drivers-x86-v6.11-4' of g..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=138e4ff5980000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=df2f0ed7e30a639d
> dashboard link: https://syzkaller.appspot.com/bug?extid=4d0c396361b5dc5d610f
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> 
> Unfortunately, I don't have any reproducer for this issue yet.

And from the console output it is really hard to understand how we got here.

There are no syz commands that seem to have anything to do with RDMA
or IB at all, yet somehow an RDMA device (rxe/siw?) was created and
destroyed.

The console output format has changed; has something gone wrong with
it? Usually I would expect the last "executing program" to be a
netlink operation that triggers device unregistration...

> Workqueue: ib-unreg-wq ib_unregister_work
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5188 [inline]
>  __schedule+0x1800/0x4a60 kernel/sched/core.c:6529
>  __schedule_loop kernel/sched/core.c:6606 [inline]
>  schedule+0x14b/0x320 kernel/sched/core.c:6621
>  schedule_timeout+0xb0/0x310 kernel/time/timer.c:2557
>  do_wait_for_common kernel/sched/completion.c:95 [inline]
>  __wait_for_common kernel/sched/completion.c:116 [inline]
>  wait_for_common kernel/sched/completion.c:127 [inline]
>  wait_for_completion+0x355/0x620 kernel/sched/completion.c:148
>  disable_device+0x1c7/0x360 drivers/infiniband/core/device.c:1295
>  __ib_unregister_device+0x2ac/0x3c0 drivers/infiniband/core/device.c:1493
>  ib_unregister_work+0x19/0x30 drivers/infiniband/core/device.c:1604
>  process_one_work kernel/workqueue.c:3231 [inline]
>  process_scheduled_works+0xa2e/0x1830 kernel/workqueue.c:3312
>  worker_thread+0x86d/0xd40 kernel/workqueue.c:3390
>  kthread+0x2f2/0x390 kernel/kthread.c:389
>  ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>  </TASK>

This trace is almost certainly a refcount leak, presumably on some
error path.

Without a reproducer, or some kind of clue about what syzkaller was
doing, it doesn't seem like any progress is possible.

Jason
