Date:   Fri, 2 Nov 2018 23:45:38 +0100
From:   Dominique Martinet <asmadeus@...ewreck.org>
To:     Dmitry Vyukov <dvyukov@...gle.com>
Cc:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        Eric Van Hensbergen <ericvh@...il.com>,
        Ron Minnich <rminnich@...dia.gov>,
        Latchesar Ionkov <lucho@...kov.net>,
        v9fs-developer@...ts.sourceforge.net,
        syzbot <syzbot+f425456ea8aa16b40d20@...kaller.appspotmail.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Al Viro <viro@...iv.linux.org.uk>
Subject: Re: INFO: task hung in grab_super

Dmitry Vyukov wrote on Fri, Nov 02, 2018:
> >> I guess that's the problem, right? SIGKILL-ed task must not ignore
> >> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
> >> in 9p.
> >
> > Did you check /proc/18253/task/*/stack after manually sending SIGKILL?
> 
> Yes:
> 
> root@...kaller:~# ps afxu | grep syz
> root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_
> [syz-executor] <defunct>
> root@...kaller:~# cat /proc/18253/task/*/stack
> [<0>] p9_client_rpc+0x3a2/0x1400
> [<0>] p9_client_flush+0x134/0x2a0
> [<0>] p9_client_rpc+0x122c/0x1400
> [<0>] p9_client_create+0xc56/0x16af
> [<0>] v9fs_session_init+0x21a/0x1a80
> [<0>] v9fs_mount+0x7c/0x900
> [<0>] mount_fs+0xae/0x328
> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
> [<0>] do_mount+0x581/0x30e0
> [<0>] ksys_mount+0x12d/0x140
> [<0>] __x64_sys_mount+0xbe/0x150
> [<0>] do_syscall_64+0x1b9/0x820
> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [<0>] 0xffffffffffffffff

Yes, that's a known problem with the current code: since everything
must be cleaned up on the spot, the first kill sends a flush and then
waits again for the flush reply to come; the second kill is completely
ignored.
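
Roughly, the blocking pattern looks like the sketch below. This is an
illustration only, not the real net/9p/client.c code: all the sketch_*
names are made up, and the plain uninterruptible wait in the flush
helper just stands in for whatever makes the second SIGKILL
ineffective in practice.

#include <linux/wait.h>
#include <linux/errno.h>

enum sketch_status { SKETCH_SENT, SKETCH_RCVD };

struct sketch_req {
	wait_queue_head_t wq;	/* woken when a matching reply arrives */
	int status;
};

static int sketch_flush_and_wait(struct sketch_req *oldreq)
{
	/*
	 * Send a TFLUSH for oldreq (omitted) and then block until the
	 * flush reply arrives.  As far as a second SIGKILL is concerned
	 * this wait never gives up, which is the hang in the trace above.
	 */
	wait_event(oldreq->wq, oldreq->status >= SKETCH_RCVD);
	return 0;
}

static int sketch_rpc(struct sketch_req *req)
{
	int err;

	/* the first SIGKILL interrupts this killable wait... */
	err = wait_event_killable(req->wq, req->status >= SKETCH_RCVD);
	if (err == -ERESTARTSYS)
		/* ...but then we wait again, synchronously, for the flush */
		err = sketch_flush_and_wait(req);

	return err;
}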

With the refcounting work we've done that went in this merge window
we're halfway there: memory can now have a lifetime independent of the
current request and won't be freed when the process exits
p9_client_rpc, so we can send the flush and return immediately, then
have the rest of the cleanup happen asynchronously when the flush
reply comes or the client is torn down, whichever happens first.
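
Concretely, something along these lines; again this is only a sketch
of the idea, not the planned patch: the sketch_* names are invented,
and only kref_get()/kref_put()/wait_event_killable() are real kernel
APIs here.

#include <linux/kref.h>
#include <linux/wait.h>
#include <linux/errno.h>

struct sketch_req {
	struct kref refcount;	/* what the refcounting rework gives us */
	wait_queue_head_t wq;
	int status;		/* >= 1 once a reply has been matched */
};

static void sketch_req_release(struct kref *kref)
{
	/* free the request buffers; safe now that nobody holds a ref */
}

/* made-up helper: queue a TFLUSH for oldreq and return without waiting */
static void sketch_send_flush_nowait(struct sketch_req *oldreq);

static int sketch_rpc_async_flush(struct sketch_req *req)
{
	int err;

	err = wait_event_killable(req->wq, req->status >= 1 /* received */);
	if (err == -ERESTARTSYS) {
		kref_get(&req->refcount);	/* req now outlives this call */
		sketch_send_flush_nowait(req);	/* no second wait here */
		return err;			/* the SIGKILL-ed task can exit */
	}
	return err;
}

/*
 * Called when the flush reply (or a late regular reply) comes in, or
 * from client teardown if it never does: drop the extra reference and
 * let the request clean itself up.
 */
static void sketch_flush_done(struct sketch_req *req)
{
	kref_put(&req->refcount, sketch_req_release);
}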

I've got this planned for 4.21 if I can find the time to do it early
in the cycle and it works on the first try, or 4.22 if I run into
complications, so that it gets well tested in -next first.
My free time is pretty limited this year, so unless you want to help,
it'll get done when it's ready :)

-- 
Dominique
