Date:   Mon, 12 Dec 2016 17:15:55 -0800 (PST)
From:   Stefano Stabellini <sstabellini@...nel.org>
To:     Al Viro <viro@...IV.linux.org.uk>
cc:     Stefano Stabellini <sstabellini@...nel.org>,
        v9fs-developer@...ts.sourceforge.net, ericvh@...il.com,
        rminnich@...dia.gov, lucho@...kov.net, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/5] 9p: introduce async read requests

Hi Al,
thanks for looking at the patch.

On Sat, 10 Dec 2016, Al Viro wrote:
> On Thu, Dec 08, 2016 at 12:59:05PM -0800, Stefano Stabellini wrote:
> 
> 
> > +		} else {
> > +			req = p9_client_get_req(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
> > +			if (IS_ERR(req)) {
> > +				*err = PTR_ERR(req);
> > +				break;
> > +			}
> > +			req->rsize = iov_iter_get_pages_alloc(to, &req->pagevec, 
> > +					(size_t)rsize, &req->offset);
> > +			req->kiocb = iocb;
> > +			for (i = 0; i < req->rsize; i += PAGE_SIZE)
> > +				page_cache_get_speculative(req->pagevec[i/PAGE_SIZE]);
> > +			req->callback = p9_client_read_complete;
> > +
> > +			*err = clnt->trans_mod->request(clnt, req);
> > +			if (*err < 0) {
> > +				clnt->status = Disconnected;
> > +				release_pages(req->pagevec,
> > +						(req->rsize + PAGE_SIZE - 1) / PAGE_SIZE,
> > +						true);
> > +				kvfree(req->pagevec);
> > +				p9_free_req(clnt, req);
> > +				break;
> > +			}
> > +
> > +			*err = -EIOCBQUEUED;
> 
> IDGI.  AFAICS, your code will result in shitloads of short reads - every
> time when you give it a multi-iovec array, only the first one will be
> issued and the rest won't be even looked at.  Sure, it is technically
> legal, but I very much doubt that aio users will be happy with that.
> 
> What am I missing here?

There is a problem with short reads, but I don't think it is the one
you described, unless I made a mistake somewhere.

The code above will issue one request as large as possible (the size is
limited by clnt->msize, the maximum message size negotiated for the
channel). No matter how many segments are in the iov_iter, the request
will cover the maximum number of bytes allowed by the channel, typically
4K but possibly larger. So multi-iovec arrays should be handled
correctly up to msize. Please let me know if I am wrong; I am not an
expert on this. I tried, but couldn't actually get any multi-iovec
arrays in my tests.
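
For reference, the size calculation is roughly the following (a
simplified sketch using the generic 9p constants, not the exact patch):

  /* One request covers min(total bytes in the iov_iter, channel payload
   * size), regardless of how many segments the iov_iter has;
   * P9_IOHDRSZ accounts for the 9p read/write header. */
  size_t rsize = iov_iter_count(to);
  if (rsize > clnt->msize - P9_IOHDRSZ)
          rsize = clnt->msize - P9_IOHDRSZ;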


That said, reading the code again, there are indeed two possible causes
of short reads. The first is a response carrying a smaller byte count
than requested. Currently this case is not handled, and the short read
is forwarded up to the user. But I have written and successfully tested
a patch that issues a follow-up request from the completion function.
Example:
  p9_client_read, issue request, size 4K
  p9_client_read_complete, receive response, count 2K
  p9_client_read_complete, issue request size 4K-2K = 2K
  p9_client_read_complete, receive response size 2K
  p9_client_read_complete, call ki_complete
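
The follow-up logic would look roughly like this (the callback
signature, the done/total accounting fields and the resend helper are
illustrative, not the code I actually tested):

  static void p9_client_read_complete(struct p9_client *clnt,
                                      struct p9_req_t *req, int count)
  {
          req->done += count;               /* bytes received so far */
          if (count > 0 && req->done < req->total) {
                  /* short response: issue a follow-up TREAD for the
                   * remaining req->total - req->done bytes */
                  p9_client_resend_read(clnt, req);
                  return;
          }
          /* fully satisfied (or error): complete the aio request */
          req->kiocb->ki_complete(req->kiocb, req->done, 0);
  }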


The second possible cause of short reads is a request bigger than msize,
for example a 2MB read over a 4K channel. In this patch we simply issue
one request as big as msize and eventually return a smaller byte count
to the caller. One option is to simply fall back to sync IO in these
cases. Another solution is to use the same technique described above: we
issue the first request as big as msize, then, from the callback
function, we issue a follow-up request, and again, and again, until the
large read fully completes, as sketched below. This way, although we are
not issuing simultaneous requests for any single large read, we can at
least issue other aio requests in parallel for other reads or writes. It
should still perform better than sync IO.
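
Concretely, a 2MB read over a 4K channel would go roughly like this:
  p9_client_read, issue request, offset 0, size 4K
  p9_client_read_complete, receive response, count 4K
  p9_client_read_complete, issue request, offset 4K, size 4K
  ... (512 requests in total) ...
  p9_client_read_complete, receive final response, call ki_complete with 2MB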

I am thinking of implementing this second option in the next version of
the series.
