linux-kernel - Re: [syzbot] [nfs?] INFO: task hung in nfsd_umount

Message-id: <172039726840.11489.12386749198888516742@noble.neil.brown.name>
Date: Mon, 08 Jul 2024 10:07:48 +1000
From: "NeilBrown" <neilb@...e.de>
To: "syzbot" <syzbot+b568ba42c85a332a88ee@...kaller.appspotmail.com>
Cc: "Jeff Layton" <jlayton@...nel.org>, Dai.Ngo@...cle.com,
 chuck.lever@...cle.com, kolga@...app.com, linux-kernel@...r.kernel.org,
 linux-nfs@...r.kernel.org, syzkaller-bugs@...glegroups.com, tom@...pey.com
Subject: Re: [syzbot] [nfs?] INFO: task hung in nfsd_umount

On Sun, 07 Jul 2024, Jeff Layton wrote:
> On Sat, 2024-07-06 at 21:37 -0700, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following issue on:
> > 
> > HEAD commit:    1dd28064d416 Merge tag 'integrity-v6.10-fix' of ssh://ra.k..
...
> > 2 locks held by syz.4.2691/12623:
> >  #0: ffffffff8f63ad70 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
> >  #1: ffffffff8e5fff48 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1940
> > 2 locks held by syz.0.2871/13167:
> >  #0: ffff88804e1920e0 (&type->s_umount_key#77){++++}-{3:3}, at: __super_lock fs/super.c:56 [inline]
> >  #0: ffff88804e1920e0 (&type->s_umount_key#77){++++}-{3:3}, at: __super_lock_excl fs/super.c:71 [inline]
> >  #0: ffff88804e1920e0 (&type->s_umount_key#77){++++}-{3:3}, at: deactivate_super+0xb5/0xf0 fs/super.c:505
> >  #1: ffffffff8e5fff48 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_shutdown_threads+0x4e/0xd0 fs/nfsd/nfssvc.c:632
> 
> Above there are two tasks that are holding/blocked on the nfsd_mutex.
> The first is the stuck thread. The second is a task that took it in
> nfsd_nl_listener_set_doit. I don't see a way to exit that function
> without releasing that mutex, so it seems likely that thread is stuck
> too for some reason. Unfortunately, we don't have a stack trace from
> that task, so I can't tell what it's doing.
> 

We can guess though.  It isn't waiting for a lock - that would show in
the above list - so it might be waiting for a wakeup, or might be
spinning.
The only wake-up I can imagine is in one of the memory-allocation calls,
but if the system were running out of memory we would probably see
messages about that.

I wonder if it could be looping in svc_xprt_destroy_all(), and sitting
in the msleep() when the hang is detected so there are no locks to
report.  I can't see why it would block there.

It would really help to get a full task list.
There is a sysctl for that: /proc/sys/kernel/hung_task_all_cpu_backtrace

Could that be enabled?
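
For reference, a minimal sketch of how that knob could be switched on from
a script on the test machine. The helper names (sysctl_path, write_sysctl)
and the "root" parameter are hypothetical and exist only for illustration;
the only thing taken from this thread is the sysctl path itself. It assumes
the kernel is built with hung-task detection so that the file exists, and
that the script runs as root. As far as I know, the knob makes the kernel
dump backtraces from all CPUs when a hung task is detected.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under root.

    "kernel.hung_task_all_cpu_backtrace" becomes
    <root>/kernel/hung_task_all_cpu_backtrace.
    """
    return Path(root, *name.split("."))


def write_sysctl(name, value, root="/proc/sys"):
    """Write value to the sysctl file and return what reads back.

    Raises FileNotFoundError if the directory is missing (for example,
    an alternate root without a kernel/ subdirectory) and PermissionError
    when not running as root against the real /proc/sys.
    """
    path = sysctl_path(name, root)
    path.write_text(f"{value}\n")
    return path.read_text().strip()


# Example (as root, on the machine reproducing the hang):
#   write_sysctl("kernel.hung_task_all_cpu_backtrace", 1)
```

The same effect should be available from a shell with
"sysctl -w kernel.hung_task_all_cpu_backtrace=1". The "root" parameter is
there only so the helper can be exercised against a scratch directory
without touching the live system.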

NeilBrown