Message-ID: <20181102224538.GB9565@nautica>
Date: Fri, 2 Nov 2018 23:45:38 +0100
From: Dominique Martinet <asmadeus@...ewreck.org>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Eric Van Hensbergen <ericvh@...il.com>,
Ron Minnich <rminnich@...dia.gov>,
Latchesar Ionkov <lucho@...kov.net>,
v9fs-developer@...ts.sourceforge.net,
syzbot <syzbot+f425456ea8aa16b40d20@...kaller.appspotmail.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
Al Viro <viro@...iv.linux.org.uk>
Subject: Re: INFO: task hung in grab_super
Dmitry Vyukov wrote on Fri, Nov 02, 2018:
> >> I guess that's the problem, right? SIGKILL-ed task must not ignore
> >> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
> >> in 9p.
> >
> > Did you check /proc/18253/task/*/stack after manually sending SIGKILL?
>
> Yes:
>
> root@...kaller:~# ps afxu | grep syz
> root 18253 0.0 0.0 0 0 ttyS0 Zl 10:16 0:00 \_
> [syz-executor] <defunct>
> root@...kaller:~# cat /proc/18253/task/*/stack
> [<0>] p9_client_rpc+0x3a2/0x1400
> [<0>] p9_client_flush+0x134/0x2a0
> [<0>] p9_client_rpc+0x122c/0x1400
> [<0>] p9_client_create+0xc56/0x16af
> [<0>] v9fs_session_init+0x21a/0x1a80
> [<0>] v9fs_mount+0x7c/0x900
> [<0>] mount_fs+0xae/0x328
> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
> [<0>] do_mount+0x581/0x30e0
> [<0>] ksys_mount+0x12d/0x140
> [<0>] __x64_sys_mount+0xbe/0x150
> [<0>] do_syscall_64+0x1b9/0x820
> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [<0>] 0xffffffffffffffff

Yes, that's a known problem with the current code: since everything must
be cleaned up on the spot, the first kill sends a flush and waits again
for the flush reply to come; the second kill is completely ignored.
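
To make that concrete, here is a rough sketch of the shape of that code
path (not the actual net/9p/client.c - the send_request, send_flush and
wait_for_flush_reply helpers are made up), matching the trace above:

        /*
         * Sketch only, not the real code: helper names are made up,
         * but this is the pattern that produces the stack above.
         */
        static int p9_rpc_sketch(struct p9_client *clnt, struct p9_req_t *req)
        {
                int err;

                send_request(clnt, req);

                /* the first SIGKILL interrupts this wait with -ERESTARTSYS... */
                err = wait_event_killable(req->wq,
                                          req->status >= REQ_STATUS_RCVD);
                if (err == -ERESTARTSYS) {
                        /*
                         * ...but the flush is itself a synchronous RPC: we
                         * sit here until the TFLUSH reply arrives, and a
                         * second SIGKILL cannot break this second wait -
                         * hence the p9_client_rpc -> p9_client_flush ->
                         * p9_client_rpc stack of the defunct task.
                         */
                        send_flush(clnt, req);
                        wait_for_flush_reply(clnt, req);
                }
                return err;
        }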

With the refcounting work we've done that went in this merge window
we're halfway there - memory can now have a lifetime independent of the
current request and won't be freed when the process exits p9_client_rpc,
so we can send the flush and return immediately, then have the rest of
the cleanup happen asynchronously when the flush reply comes or the
client is torn down, whichever happens first.
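
Roughly, the plan looks like this (again only a sketch, nothing merged
yet; req_put() stands in for the refcount helpers added by the
refcounting work, and send_flush_async() is made up):

        static int p9_rpc_async_flush_sketch(struct p9_client *clnt,
                                             struct p9_req_t *req)
        {
                int err;

                send_request(clnt, req);

                err = wait_event_killable(req->wq,
                                          req->status >= REQ_STATUS_RCVD);
                if (err == -ERESTARTSYS) {
                        /*
                         * Hand the request over to the flush machinery: the
                         * flush reply callback, or the client teardown if the
                         * reply never comes, drops the last reference and
                         * frees the request, so this path no longer sleeps
                         * and SIGKILL takes effect immediately.
                         */
                        send_flush_async(clnt, req);    /* takes its own ref */
                        req_put(req);                   /* drop the caller's ref */
                        return err;
                }
                /* normal completion: reply received, caller drops its ref */
                req_put(req);
                return 0;
        }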

I've got this planned for 4.21 if I can find the time to do it early in
the cycle and it works on the first try, or 4.22 if I run into
complications, so that it's well tested in -next first.

My free time is pretty limited this year, so unless you want to help
it'll get done when it's ready :)
--
Dominique