Message-ID: <20181102224538.GB9565@nautica>
Date: Fri, 2 Nov 2018 23:45:38 +0100
From: Dominique Martinet <asmadeus@...ewreck.org>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Eric Van Hensbergen <ericvh@...il.com>,
Ron Minnich <rminnich@...dia.gov>,
Latchesar Ionkov <lucho@...kov.net>,
v9fs-developer@...ts.sourceforge.net,
syzbot <syzbot+f425456ea8aa16b40d20@...kaller.appspotmail.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
Al Viro <viro@...iv.linux.org.uk>
Subject: Re: INFO: task hung in grab_super
Dmitry Vyukov wrote on Fri, Nov 02, 2018:
> >> I guess that's the problem, right? SIGKILL-ed task must not ignore
> >> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
> >> in 9p.
> >
> > Did you check /proc/18253/task/*/stack after manually sending SIGKILL?
>
> Yes:
>
> root@...kaller:~# ps afxu | grep syz
> root 18253 0.0 0.0 0 0 ttyS0 Zl 10:16 0:00 \_
> [syz-executor] <defunct>
> root@...kaller:~# cat /proc/18253/task/*/stack
> [<0>] p9_client_rpc+0x3a2/0x1400
> [<0>] p9_client_flush+0x134/0x2a0
> [<0>] p9_client_rpc+0x122c/0x1400
> [<0>] p9_client_create+0xc56/0x16af
> [<0>] v9fs_session_init+0x21a/0x1a80
> [<0>] v9fs_mount+0x7c/0x900
> [<0>] mount_fs+0xae/0x328
> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
> [<0>] do_mount+0x581/0x30e0
> [<0>] ksys_mount+0x12d/0x140
> [<0>] __x64_sys_mount+0xbe/0x150
> [<0>] do_syscall_64+0x1b9/0x820
> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [<0>] 0xffffffffffffffff

Yes, that's a known problem with the current code: since everything must
be cleaned up on the spot, the first kill sends a flush and waits again
for the flush reply to come; the second kill is completely ignored.
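
To make that concrete, here is a rough sketch of the shape of that code
path (not the actual net/9p/client.c - the send_request, send_flush and
wait_for_flush_reply helpers are made up), matching the trace above:

        /*
         * Sketch only, not the real code: helper names are made up,
         * but this is the pattern that produces the stack above.
         */
        static int p9_rpc_sketch(struct p9_client *clnt, struct p9_req_t *req)
        {
                int err;

                send_request(clnt, req);

                /* the first SIGKILL interrupts this wait with -ERESTARTSYS... */
                err = wait_event_killable(req->wq,
                                          req->status >= REQ_STATUS_RCVD);
                if (err == -ERESTARTSYS) {
                        /*
                         * ...but the flush is itself a synchronous RPC: we
                         * sit here until the TFLUSH reply arrives, and a
                         * second SIGKILL cannot break this second wait -
                         * hence the p9_client_rpc -> p9_client_flush ->
                         * p9_client_rpc stack of the defunct task.
                         */
                        send_flush(clnt, req);
                        wait_for_flush_reply(clnt, req);
                }
                return err;
        }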

With the refcounting work we've done that went in this merge window
we're halfway there - memory can now have a lifetime independent of the
current request and won't be freed when the process exits p9_client_rpc,
so we can send the flush and return immediately, then have the rest of
the cleanup happen asynchronously when the flush reply comes or the
client is torn down, whichever happens first.
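
Roughly, the plan looks like this (again only a sketch, nothing merged
yet; req_put() stands in for the refcount helpers added by the
refcounting work, and send_flush_async() is made up):

        static int p9_rpc_async_flush_sketch(struct p9_client *clnt,
                                             struct p9_req_t *req)
        {
                int err;

                send_request(clnt, req);

                err = wait_event_killable(req->wq,
                                          req->status >= REQ_STATUS_RCVD);
                if (err == -ERESTARTSYS) {
                        /*
                         * Hand the request over to the flush machinery: the
                         * flush reply callback, or the client teardown if the
                         * reply never comes, drops the last reference and
                         * frees the request, so this path no longer sleeps
                         * and SIGKILL takes effect immediately.
                         */
                        send_flush_async(clnt, req);    /* takes its own ref */
                        req_put(req);                   /* drop the caller's ref */
                        return err;
                }
                /* normal completion: reply received, caller drops its ref */
                req_put(req);
                return 0;
        }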

I've got this planned for 4.21 if I can find the time to do it early in
the cycle and it works on the first try, or 4.22 if I run into
complications, so that it's well tested in -next first.

My free time is pretty limited this year, so unless you want to help
it'll get done when it's ready :)
--
Dominique