Message-ID: <ZBb3ynXntaom/0hg@codewreck.org>
Date: Sun, 19 Mar 2023 20:53:46 +0900
From: Dominique Martinet <asmadeus@...ewreck.org>
To: Christian Schoenebeck <linux_oss@...debyte.com>
Cc: Jens Axboe <axboe@...nel.dk>, Latchesar Ionkov <lucho@...kov.net>,
Eric Van Hensbergen <ericvh@...il.com>,
linux-kernel@...r.kernel.org, Pengfei Xu <pengfei.xu@...el.com>,
v9fs-developer@...ts.sourceforge.net
Subject: Re: [V9fs-developer] [PATCH 0/5] Take 3 at async RPCs and no longer
looping forever on signals
It's been a while but I didn't forget...
Dominique Martinet wrote on Tue, Feb 14, 2023 at 08:16:38PM +0900:
> > Yes, apparently it tries to write dirty pages of the mapped file and keeps
> > hanging there [fs/9p/vfs_inode_dotl.c:586]:
>
> Yeah, it'd help to get the trace of the thread actually trying to do the
> IO, if it still exists.
> I had some hangs in the check that there are no flush in flight at some
> point, and I thought I fixed that, but I can't really see anywhere else
> that'd start hanging with this... it'll be clearer if I can reproduce.
I couldn't reproduce this one, but manually inspecting
p9_client_wait_flush again I noticed the wait_event_interruptible was
waiting on req->flushed_req->wq but looking at req->status in the
condition; that was an error.
Also, we hold a ref on req->flushed_req but not on req, so going through
req to reach req->flushed_req wasn't safe.
I've changed the code to add a variable pointing directly at
req->flushed_req and use it consistently; I'm not sure that's the problem
you ran into, but it might help.
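To illustrate the shape of that fix, here is a rough userspace analogue, not the actual net/9p code: a pthread condition variable stands in for the kernel wait queue, and struct req, req_init, req_complete and wait_flushed are made-up names for this sketch. The point is only that the sleeper takes the flushed request pointer once and then both waits on and checks that same object.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a 9p request; not the real struct p9_req_t. */
enum req_status { REQ_SENT = 0, REQ_COMPLETED = 1 };

struct req {
	pthread_mutex_t lock;
	pthread_cond_t wq;        /* plays the role of the kernel wait queue */
	enum req_status status;
	struct req *flushed_req;  /* request this flush cancels, or NULL */
};

void req_init(struct req *r, struct req *flushed)
{
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->wq, NULL);
	r->status = REQ_SENT;
	r->flushed_req = flushed;
}

/* Mark a request done and wake whoever sleeps on that request's own queue. */
void req_complete(struct req *r)
{
	pthread_mutex_lock(&r->lock);
	r->status = REQ_COMPLETED;
	pthread_cond_broadcast(&r->wq);
	pthread_mutex_unlock(&r->lock);
}

/*
 * Wait for the request being flushed to complete.
 * The pointer is read once into a local, and the queue we sleep on and the
 * status we test both belong to that same object. The buggy pattern slept
 * on the flushed request's queue while testing the flush request's status.
 * Returns 0 on completion, -1 if there is nothing to wait for (an
 * assumption of this sketch, not kernel behaviour).
 */
int wait_flushed(struct req *flush)
{
	struct req *target = flush->flushed_req;

	if (target == NULL)
		return -1;

	pthread_mutex_lock(&target->lock);
	while (target->status != REQ_COMPLETED)
		pthread_cond_wait(&target->wq, &target->lock);
	pthread_mutex_unlock(&target->lock);
	return 0;
}
```

In a real program another thread would call req_complete() on the original request while the flusher sits in wait_flushed(); the sketch leaves out refcounting and signal handling entirely.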
It's been a while, but do you remember whether that hang happened
consistently on shutdown, or was it a one-time thing?
Either way, I'd appreciate it if you could try my 9p-test branch again:
https://github.com/martinetd/linux/commits/9p-test
With that said, I expect that p9_client_wait_req will cause hangs on
broken servers.
If the connection drops, hopefully the reqs will just be marked as error
and free the thread, but I can see syzbot complaining about yet another
stuck thread... Well, it's interruptible at least, and bails out on
ERESTARTSYS.
> Anyway, I found another bug, just running ./configure on a random project
> (picked coreutils tarball) fails with interrupted system call ?!
That other bug was weird: I could reproduce it reliably until I rebooted
the host because of an unrelated NFS bug on the host, and after the reboot
I couldn't reproduce it anymore.
I'll chalk it up to a buggy host/weird happenstance, but it's something to
watch for if random EINTRs happen again :/
--
Dominique