Message-ID: <20130725190506.GA32375@nautica>
Date: Thu, 25 Jul 2013 21:05:06 +0200
From: Dominique Martinet <dominique.martinet@....fr>
To: Eric Van Hensbergen <ericvh@...il.com>
Cc: Dominique Martinet <dominique.martinet@....fr>,
Latchesar Ionkov <lucho@...kov.net>, pebolle@...cali.nl,
netdev@...r.kernel.org,
linux-kernel <linux-kernel@...r.kernel.org>, andi@...zian.org,
rminnich@...dia.gov,
V9FS Developers <v9fs-developer@...ts.sourceforge.net>,
David Miller <davem@...emloft.net>
Subject: Re: [V9fs-developer] [PATCH] net: trans_rdma: remove unused
function
Eric Van Hensbergen wrote on Thu, Jul 25, 2013 :
> So, the cancel function should be used to flush any pending requests that
> haven't actually been sent yet. Looking at the 9p RDMA code, it looks like
> the thought was that this wasn't going to be possible. Regardless of
> removing unsent requests, the flush will still be sent and if the server
> processes it before the original request and sends a flush response back
> then we need to clear the posted buffer. This is what rdma_cancelled is
> supposed to be doing. So, the fix is to hook it into the structure -- but
> looking at the code it seems like we probably need to do something more to
> reclaim the buffer rather than just incrementing a counter.
>
> To be clear this has less to do with recovery and more to do with the
> proper implementation of 9p flush semantics. By and large, those semantics
> won't impact static file system users -- but if anyone is using the
> transport to access synthetic filesystems or files then they'll definitely
> want to have a properly implemented flush setup. The way to test this is
> to get a blocking read on a remote named pipe or fifo and then ^C it.
Ok, I knew about the concept of flush but didn't think a ^C would cause
an -ERESTARTSYS, so I didn't think of that.
That said, reading from, say, a fifo is an entirely local operation: the
client does a walk and a getattr, nothing 9p-wise for the read itself,
and clunks the fid when it's done with it.
As for the function needing a bit more work: there is a race, but for
"normal" requests I think it is about right - the answer lies in a
comment in rdma_request:
/* When an error occurs between posting the recv and the send,
 * there will be a receive context posted without a pending request.
 * Since there is no way to "un-post" it, we remember it and skip
 * post_recv() for the next request.
 * So here, see if we are this `next request' and need to absorb an
 * excess rc. If yes, then drop and free our own, and do not
 * post_recv().
 */
Basically, receive buffers are posted to a queue, and we can't take one
back out, so we just don't post the next one.
There is one problem though - if the server handles the original request
before getting the flush, the receive buffer will be consumed and we
won't post a new one, so we'll starve the reception queue.
I'm afraid I don't have any bright idea there...
While we are on reception buffer issues, there is another problem with
the queue of receive buffers, even without flush, in the following
scenario:
- post a buffer for tag 0, on a hanging request
- post a buffer for tag 1
- the reply for tag 1 comes in on the buffer posted for tag 0
- post another request with tag 1: its buffer is already in the queue,
and we don't know that we can post the buffer associated with tag 0 back.
I haven't found how to reproduce this reliably yet, but a dd with a
block size of 1MB and one with a block size of 10B in parallel brought
the mountpoint down (and the whole server was completely unavailable for
the duration of the dd - TCP sessions timed out, and I even got I/O
errors on the local disk :D)
Regards,
--
Dominique Martinet