linux-kernel - Re: [PATCH 2/6] fuse: add support of async IO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <51767CD0.40507@parallels.com>
Date:	Tue, 23 Apr 2013 16:21:36 +0400
From:	"Maxim V. Patlasov" <mpatlasov@...allels.com>
To:	Miklos Szeredi <miklos@...redi.hu>
CC:	<dev@...allels.com>, <xemul@...allels.com>,
	<fuse-devel@...ts.sourceforge.net>, <bfoster@...hat.com>,
	<linux-kernel@...r.kernel.org>, <devel@...nvz.org>
Subject: Re: [PATCH 2/6] fuse: add support of async IO

Hi Miklos,

04/22/2013 08:34 PM, Miklos Szeredi пишет:
> On Fri, Dec 14, 2012 at 07:20:41PM +0400, Maxim V. Patlasov wrote:
>> The patch implements a framework to process an IO request asynchronously. The
>> idea is to associate several fuse requests with a single kiocb by means of
>> fuse_io_priv structure. The structure plays the same role for FUSE as 'struct
>> dio' for direct-io.c.
>>
>> The framework is supposed to be used like this:
>>   - someone (who wants to process an IO asynchronously) allocates fuse_io_priv
>>     and initializes it setting 'async' field to non-zero value.
>>   - as soon as fuse request is filled, it can be submitted (in non-blocking way)
>>     by fuse_async_req_send()
>>   - when all submitted requests are ACKed by userspace, io->reqs drops to zero
>>     triggering aio_complete()
>>
>> In case of IO initiated by libaio, aio_complete() will finish processing the
>> same way as in case of dio_complete() calling aio_complete(). But the
>> framework may be also used for internal FUSE use when initial IO request
>> was synchronous (from user perspective), but it's beneficial to process it
>> asynchronously. Then the caller should wait on kiocb explicitly and
>> aio_complete() will wake the caller up.
>>
>> Signed-off-by: Maxim Patlasov <mpatlasov@...allels.com>
>> ---
>>   fs/fuse/file.c   |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/fuse/fuse_i.h |   17 ++++++++++
>>   2 files changed, 109 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index 6685cb0..8dd931f 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -503,6 +503,98 @@ static void fuse_release_user_pages(struct fuse_req *req, int write)
>>   	}
>>   }
>>   
>> +/**
>> + * In case of short read, the caller sets 'pos' to the position of
>> + * actual end of fuse request in IO request. Otherwise, if bytes_requested
>> + * == bytes_transferred or rw == WRITE, the caller sets 'pos' to -1.
>> + *
>> + * An example:
>> + * User requested DIO read of 64K. It was splitted into two 32K fuse requests,
>> + * both submitted asynchronously. The first of them was ACKed by userspace as
>> + * fully completed (req->out.args[0].size == 32K) resulting in pos == -1. The
>> + * second request was ACKed as short, e.g. only 1K was read, resulting in
>> + * pos == 33K.
>> + *
>> + * Thus, when all fuse requests are completed, the minimal non-negative 'pos'
>> + * will be equal to the length of the longest contiguous fragment of
>> + * transferred data starting from the beginning of IO request.
>> + */
>> +static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
>> +{
>> +	int left;
>> +
>> +	spin_lock(&io->lock);
>> +	if (err)
>> +		io->err = io->err ? : err;
>> +	else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes))
>> +		io->bytes = pos;
>> +
>> +	left = --io->reqs;
>> +	spin_unlock(&io->lock);
>> +
>> +	if (!left) {
>> +		long res;
>> +
>> +		if (io->err)
>> +			res = io->err;
>> +		else if (io->bytes >= 0 && io->write)
>> +			res = -EIO;
>> +		else {
>> +			res = io->bytes < 0 ? io->size : io->bytes;
>> +
>> +			if (!is_sync_kiocb(io->iocb)) {
>> +				struct path *path = &io->iocb->ki_filp->f_path;
>> +				struct inode *inode = path->dentry->d_inode;
>> +				struct fuse_conn *fc = get_fuse_conn(inode);
>> +				struct fuse_inode *fi = get_fuse_inode(inode);
>> +
>> +				spin_lock(&fc->lock);
>> +				fi->attr_version = ++fc->attr_version;
>> +				spin_unlock(&fc->lock);
> Hmm, what is this?  Incrementing the attr version without setting any attributes
> doesn't make sense.

It makes sense at least for writes. __fuse_direct_write() always called 
fuse_write_update_size() and the latter always incremented attr_version, 
even if *ppos <= inode->i_size. I believed it was implemented in this 
way intentionally: if write succeeded, the file is changed on server, 
hence attrs requested from server early should be regarded as stale.

Adding async IO support to fuse, a case emerges when 
fuse_write_update_size() won't be called: incoming direct IO write is 
asynchronous (e.g. it came from libaio), it's not extending write, so 
it's allowable to process it by submitting fuse requests to background 
and return -EIOCBQUEUED without waiting for completions (see 4th patch 
of this patch-set). But in this case the file on server will be changed 
anyway. That's why I bump attr_version in fuse_aio_complete() -- to be 
consistent with the model we had before this patch-set.

The fact that I did the trick both for writes and reads was probably 
overlook. I'd suggest to fix it like this:

> -			if (!is_sync_kiocb(io->iocb)) {
> +			if (!is_sync_kiocb(io->iocb) && io->write) {

Thanks,
Maxim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/