linux-kernel - Re: [PATCH v3 1/1] FUSE: Allow non-extending parallel direct writes on the same file.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <77b28772-b210-6597-eedd-fe7e0b5b1db9@ddn.com>
Date:   Wed, 25 May 2022 21:06:37 +0200
From:   Bernd Schubert <bschubert@....com>
To:     Vivek Goyal <vgoyal@...hat.com>,
        Dharmendra Singh <dharamhans87@...il.com>
Cc:     miklos@...redi.hu, linux-fsdevel@...r.kernel.org,
        fuse-devel@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
        Dharmendra Singh <dsingh@....com>
Subject: Re: [PATCH v3 1/1] FUSE: Allow non-extending parallel direct writes
 on the same file.



On 5/25/22 20:41, Vivek Goyal wrote:
> On Fri, May 20, 2022 at 10:04:43AM +0530, Dharmendra Singh wrote:
>> From: Dharmendra Singh <dsingh@....com>
>>
>> In general, as of now, in FUSE, direct writes on the same file are
>> serialized over inode lock i.e we hold inode lock for the full duration
>> of the write request. I could not found in fuse code a comment which
>> clearly explains why this exclusive lock is taken for direct writes.
>> Our guess is some USER space fuse implementations might be relying
>> on this lock for seralization and also it protects for the issues
>> arising due to file size assumption or write failures.  This patch
>> relaxes this exclusive lock in some cases of direct writes.
>>
>> With these changes, we allows non-extending parallel direct writes
>> on the same file with the help of a flag called FOPEN_PARALLEL_WRITES.
>> If this flag is set on the file (flag is passed from libfuse to fuse
>> kernel as part of file open/create), we do not take exclusive lock instead
>> use shared lock so that all non-extending writes can run in parallel.
>>
>> Best practise would be to enable parallel direct writes of all kinds
>> including extending writes as well but we see some issues such as
>> when one write completes and other fails, how we should truncate(if
>> needed) the file if underlying file system does not support holes
>> (For file systems which supports holes, there might be a possibility
>> of enabling parallel writes for all cases).
>>
>> FUSE implementations which rely on this inode lock for serialisation
>> can continue to do so and this is default behaviour i.e no parallel
>> direct writes.
>>
>> Signed-off-by: Dharmendra Singh <dsingh@....com>
>> Signed-off-by: Bernd Schubert <bschubert@....com>
>> ---
>>   fs/fuse/file.c            | 33 ++++++++++++++++++++++++++++++---
>>   include/uapi/linux/fuse.h |  2 ++
>>   2 files changed, 32 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index 829094451774..1a93fd80a6ce 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -1541,14 +1541,37 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>   	return res;
>>   }
>>   
>> +static bool fuse_direct_write_extending_i_size(struct kiocb *iocb,
>> +					       struct iov_iter *iter)
>> +{
>> +	struct inode *inode = file_inode(iocb->ki_filp);
>> +
>> +	return (iocb->ki_flags & IOCB_APPEND ||
>> +		iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode));
>> +}
> 
> Hi Dharmendra,
> 
> I have a question. What makes i_size stable. This is being read outside
> the inode_lock(). Can it race with truncate. I mean we checked
> i_size and decided to take shared lock. In the mean time another thread
> truncated the file and now our decision to take shared lock is wrong
> as file will be extended due to direct write?

Oh right, good catch! I guess we need to take a shared lock first, read 
the size and if it is an extending lock need to unlock/switch to an 
excluding lock. Theoretically could be a loop, but I guess that would be 
overkill.


Thanks for your review!

Cheers,
Bernd