linux-kernel - Re: [V9fs-developer] [PATCH 0/5] Take 3 at async RPCs and no longer looping forever on signals

Message-ID: <ZBb3ynXntaom/0hg@codewreck.org>
Date:   Sun, 19 Mar 2023 20:53:46 +0900
From:   Dominique Martinet <asmadeus@...ewreck.org>
To:     Christian Schoenebeck <linux_oss@...debyte.com>
Cc:     Jens Axboe <axboe@...nel.dk>, Latchesar Ionkov <lucho@...kov.net>,
        Eric Van Hensbergen <ericvh@...il.com>,
        linux-kernel@...r.kernel.org, Pengfei Xu <pengfei.xu@...el.com>,
        v9fs-developer@...ts.sourceforge.net
Subject: Re: [V9fs-developer] [PATCH 0/5] Take 3 at async RPCs and no longer
 looping forever on signals

It's been a while but I didn't forget...

Dominique Martinet wrote on Tue, Feb 14, 2023 at 08:16:38PM +0900:
> > Yes, apparently it tries to write dirty pages of the mapped file and keeps
> > hanging there [fs/9p/vfs_inode_dotl.c:586]:
> 
> Yeah, it'd help to get the trace of the thread actually trying to do the
> IO, if it still exists.
> I had some hangs in the check that there are no flush in flight at some
> point, and I thought I fixed that, but I can't really see anywhere else
> that'd start hanging with this... it'll be clearer if I can reproduce.

I couldn't reproduce this one, but while manually inspecting
p9_client_wait_flush again I noticed that wait_event_interruptible was
waiting on req->flushed_req->wq while checking req->status in its
condition; that was a bug.
Also, we hold a ref on req->flushed_req but not on req, so reaching
req->flushed_req through req wasn't safe.
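To illustrate the class of bug described above (sleeping on one object's wait queue while testing another object's state, and reaching the flushed request through a request we hold no reference on), here is a minimal userspace sketch using POSIX threads. It is not the 9p client code and not necessarily how the actual fix is written: `struct req`, `req_get`, `req_put`, `req_complete`, `wait_flushed` and `complete_later` are hypothetical names, a mutex/condition-variable pair stands in for the kernel's wait queue and `wait_event_interruptible()`, and signal handling is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace model of a 9p request; the names only loosely
 * echo the kernel's request struct and are not the real API. */
enum req_status { REQ_SENT, REQ_RCVD, REQ_ERROR };

struct req {
    pthread_mutex_t lock;
    pthread_cond_t wq;        /* plays the role of req->wq */
    enum req_status status;   /* plays the role of req->status */
    int refcount;             /* protected by lock */
};

static struct req *req_new(void)
{
    struct req *r = calloc(1, sizeof(*r));

    if (!r)
        return NULL;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->wq, NULL);
    r->status = REQ_SENT;
    r->refcount = 1;          /* reference owned by the creator */
    return r;
}

static void req_get(struct req *r)
{
    pthread_mutex_lock(&r->lock);
    r->refcount++;
    pthread_mutex_unlock(&r->lock);
}

static void req_put(struct req *r)
{
    int last;

    pthread_mutex_lock(&r->lock);
    assert(r->refcount > 0);
    last = (--r->refcount == 0);
    pthread_mutex_unlock(&r->lock);
    if (last) {
        pthread_cond_destroy(&r->wq);
        pthread_mutex_destroy(&r->lock);
        free(r);
    }
}

/* Receive side: record the outcome and wake whoever waits on THIS request. */
static void req_complete(struct req *r, enum req_status st)
{
    pthread_mutex_lock(&r->lock);
    r->status = st;
    pthread_cond_broadcast(&r->wq);
    pthread_mutex_unlock(&r->lock);
}

/* Wait for the request a flush targets. The caller passes the flushed
 * request itself and must hold a reference on it, so the flush request
 * is never dereferenced here. The wait queue and the condition being
 * tested belong to the same object. */
static enum req_status wait_flushed(struct req *flushed)
{
    enum req_status st;

    pthread_mutex_lock(&flushed->lock);
    while (flushed->status == REQ_SENT)
        pthread_cond_wait(&flushed->wq, &flushed->lock);
    st = flushed->status;
    pthread_mutex_unlock(&flushed->lock);
    return st;
}

/* Demo thread body: completes the request, then drops its own reference. */
static void *complete_later(void *arg)
{
    struct req *r = arg;

    req_complete(r, REQ_RCVD);
    req_put(r);
    return NULL;
}
```

In this model the waiting thread only touches the object it holds a reference on, so the completing thread can drop its own reference at any point without invalidating the wait.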

I've changed the code to add a variable directly on req->flushed_req and
use it consistently. I'm not sure that's the problem you ran into, but it
might help.
Do you remember whether that hang happened consistently on shutdown, or
was it a one-time thing?

Either way, I'd appreciate it if you could try my 9p-test branch again:
https://github.com/martinetd/linux/commits/9p-test

That said, I expect p9_client_wait_req to cause hangs with broken
servers.
If the connection drops, the requests should hopefully just be marked as
errored, which frees the waiting thread, but I can already see syzbot
complaining about yet another stuck thread... Well, at least the wait is
interruptible and bails out on ERESTARTSYS.
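The failure paths described above (a dropped connection marking every pending request as errored so that waiters are released, and a waiter bailing out early when interrupted) can be sketched the same way. Again this is a hypothetical userspace model, not the actual p9_client_wait_req: `struct client`, `client_send`, `client_reply`, `client_conn_lost`, `client_interrupt`, `client_wait_req` and `conn_drop_thread` are made-up names, a flag plays the part of a pending signal, and -EINTR stands in for -ERESTARTSYS, which is kernel-internal and not visible to userspace.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's 9p client APIs. */
enum req_status { REQ_SENT, REQ_RCVD, REQ_ERROR };

struct req {
    enum req_status status;
    bool interrupted;          /* stands in for a pending signal */
};

#define MAX_REQS 8

struct client {
    pthread_mutex_t lock;      /* protects every request in reqs[] */
    pthread_cond_t wq;         /* one wait queue shared by all requests */
    struct req reqs[MAX_REQS];
    size_t nreqs;
};

static void client_init(struct client *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->wq, NULL);
    c->nreqs = 0;
}

/* Register a new in-flight request; returns NULL if the table is full. */
static struct req *client_send(struct client *c)
{
    struct req *r = NULL;

    pthread_mutex_lock(&c->lock);
    if (c->nreqs < MAX_REQS) {
        r = &c->reqs[c->nreqs++];
        r->status = REQ_SENT;
        r->interrupted = false;
    }
    pthread_mutex_unlock(&c->lock);
    return r;
}

/* Receive side: the server answered r. */
static void client_reply(struct client *c, struct req *r)
{
    pthread_mutex_lock(&c->lock);
    r->status = REQ_RCVD;
    pthread_cond_broadcast(&c->wq);
    pthread_mutex_unlock(&c->lock);
}

/* Connection dropped: fail every request still in flight and wake all
 * waiters so none of them stays blocked. */
static void client_conn_lost(struct client *c)
{
    pthread_mutex_lock(&c->lock);
    for (size_t i = 0; i < c->nreqs; i++)
        if (c->reqs[i].status == REQ_SENT)
            c->reqs[i].status = REQ_ERROR;
    pthread_cond_broadcast(&c->wq);
    pthread_mutex_unlock(&c->lock);
}

/* Simulated signal delivery to whoever waits on r. */
static void client_interrupt(struct client *c, struct req *r)
{
    assert(r >= c->reqs && r < c->reqs + MAX_REQS);
    pthread_mutex_lock(&c->lock);
    r->interrupted = true;
    pthread_cond_broadcast(&c->wq);
    pthread_mutex_unlock(&c->lock);
}

/* Returns 0 once a reply arrived, -EIO if the request was failed because
 * the connection dropped, or -EINTR (userspace stand-in for the
 * kernel-internal -ERESTARTSYS) if the waiter was interrupted first. */
static int client_wait_req(struct client *c, struct req *r)
{
    int ret;

    pthread_mutex_lock(&c->lock);
    while (r->status == REQ_SENT && !r->interrupted)
        pthread_cond_wait(&c->wq, &c->lock);
    if (r->status == REQ_RCVD)
        ret = 0;
    else if (r->status == REQ_ERROR)
        ret = -EIO;
    else
        ret = -EINTR;
    pthread_mutex_unlock(&c->lock);
    return ret;
}

/* Demo thread body: the transport notices the connection is gone. */
static void *conn_drop_thread(void *arg)
{
    client_conn_lost(arg);
    return NULL;
}
```

In this model a server that never answers while the connection stays up leaves client_wait_req blocked until the waiter is interrupted, which is the hang scenario mentioned above.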

> Anyway, I found another bug, just running ./configure on a random project
> (picked coreutils tarball) fails with interrupted system call ?!

That other bug was weird: I could reproduce it reliably until I had to
reboot the host because of an unrelated NFS bug there, and after the
reboot I couldn't reproduce it anymore.
I'll chalk it up to a buggy host or weird happenstance, but it's
something to watch for if random EINTRs happen again :/

-- 
Dominique