Message-Id: <20191205134007.605A44203F@d06av24.portsmouth.uk.ibm.com>
Date: Thu, 5 Dec 2019 19:10:06 +0530
From: Ritesh Harjani <riteshh@...ux.ibm.com>
To: Jan Kara <jack@...e.cz>
Cc: tytso@....edu, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org, mbobrowski@...browski.org,
joseph.qi@...ux.alibaba.com
Subject: Re: [PATCHv4 2/3] ext4: Start with shared i_rwsem in case of DIO
instead of exclusive
Hello Jan,
Thanks a lot for your reviews.
On 12/5/19 5:33 PM, Jan Kara wrote:
> On Thu 05-12-19 12:16:23, Ritesh Harjani wrote:
>> Earlier there was no shared lock in DIO read path. But this patch
>> (16c54688592ce: ext4: Allow parallel DIO reads)
>> simplified some of the locking mechanism while still allowing for parallel DIO
>> reads by adding shared lock in inode DIO read path.
>>
>> But this created problem with mixed read/write workload. It is due to the fact
>> that in DIO path, we first start with exclusive lock and only when we determine
>> that it is an overwrite IO, we downgrade the lock. This causes the problem, since
>> we still have shared locking in DIO reads.
>>
>> So, this patch tries to fix this issue by starting with shared lock and then
>> switching to exclusive lock only when required based on ext4_dio_write_checks().
>>
>> Other than that, it also simplifies below cases:-
>>
>> 1. Simplified ext4_unaligned_aio API to ext4_unaligned_io. Previous API was
>> abused in the sense that it was not really checking for AIO anywhere; it also
>> used to check for extending writes. So this API was renamed and simplified to
>> ext4_unaligned_io() which actually only checks if the IO is really unaligned.
>>
>> Now, in case of unaligned direct IO, iomap_dio_rw needs to do zeroing of partial
>> block and that will require serialization against other direct IOs in the same
>> block. So we take an exclusive inode lock for any unaligned DIO. In case of AIO
>> we also need to wait for any outstanding IOs to complete so that conversion from
>> unwritten to written is completed before anyone tries to map the overlapping block.
>> Hence we take exclusive inode lock and also wait for inode_dio_wait() for
>> unaligned DIO case. Please note since we are anyway taking an exclusive lock in
>> unaligned IO, inode_dio_wait() becomes a no-op in case of non-AIO DIO.
>>
>> 2. Added ext4_extending_io(). This checks if the IO is extending the file.
>>
>> 3. Added ext4_dio_write_checks(). In this we start with shared inode lock and
>> only switch to exclusive lock if required. So in most cases with aligned,
>> non-extending, dioread_nolock & overwrites, it tries to write with a shared
>> lock. If not, then we restart the operation in ext4_dio_write_checks(), after
>> acquiring exclusive lock.
>>
>> Signed-off-by: Ritesh Harjani <riteshh@...ux.ibm.com>
>
> Cool, the patch looks good to me. You can add:
>
> Reviewed-by: Jan Kara <jack@...e.cz>
great!
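As an aside on cases 1 and 2 of the description above, here is a minimal
userspace sketch of the two predicates. The names io_is_unaligned() and
io_is_extending() and their parameters are hypothetical, not the ext4
helpers. The sketch looks only at the file offset and length against an
assumed power-of-two block size, so it is an illustration of the idea and
not the patch's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: an IO is "unaligned" if its start offset or its
 * length is not a multiple of the block size. block_size is assumed to
 * be a non-zero power of two so that a mask test is valid.
 */
static bool io_is_unaligned(uint64_t pos, size_t len, uint64_t block_size)
{
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

	uint64_t mask = block_size - 1;

	/* Any low bit set in either value means a partial block is touched. */
	return ((pos | (uint64_t)len) & mask) != 0;
}

/*
 * Hypothetical model: an IO is "extending" if it ends past the current
 * end of file.
 */
static bool io_is_extending(uint64_t pos, size_t len, uint64_t file_size)
{
	return pos + (uint64_t)len > file_size;
}
```

In this model, only an IO for which both predicates are false would be a
candidate for staying on the shared lock.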
>
> Two small nits below:
>
>> -static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
>> +static ssize_t ext4_generic_write_checks(struct kiocb *iocb,
>> + struct iov_iter *from)
>> {
>> struct inode *inode = file_inode(iocb->ki_filp);
>> ssize_t ret;
>> @@ -228,11 +235,21 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
>> iov_iter_truncate(from, sbi->s_bitmap_maxbytes - iocb->ki_pos);
>> }
>>
>> + return iov_iter_count(from);
>> +}
>
> You return iov_iter_count() from ext4_generic_write_checks()...
>
>> +static ssize_t ext4_dio_write_checks(struct kiocb *iocb, struct iov_iter *from,
>> + bool *ilock_shared, bool *extend)
>> +{
>> + struct file *file = iocb->ki_filp;
>> + struct inode *inode = file_inode(file);
>> + loff_t offset;
>> + size_t count;
>> + ssize_t ret;
>> +
>> +restart:
>> + ret = ext4_generic_write_checks(iocb, from);
>> + if (ret <= 0)
>> + goto out;
>> +
>> + offset = iocb->ki_pos;
>> + count = iov_iter_count(from);
>
> But you don't use the returned count here and just call iov_iter_count()
> again (which is cheap anyway but still it's strange).
Yes. As you also said, iov_iter_count() is an inline function which only
reads from->count, so it comes at no cost.
But assigning an ssize_t value to a size_t is something I was
uncomfortable with. Although I agree that it should be completely fine
here, I was not convinced to use that instead of reading the count
directly via iov_iter_count(), for readability reasons.
But unless you feel otherwise, I could make those changes at the 2 places
which you mentioned.
>
>> + if (ext4_extending_io(inode, offset, count))
>> + *extend = true;
>> + /*
>> + * Determine whether the IO operation will overwrite allocated
>> + * and initialized blocks. If so, check to see whether it is
>> + * possible to take the dioread_nolock path.
>> + *
>> + * We need exclusive i_rwsem for changing security info
>> + * in file_modified().
>> + */
>> + if (*ilock_shared && (!IS_NOSEC(inode) || *extend ||
>> + !ext4_should_dioread_nolock(inode) ||
>> + !ext4_overwrite_io(inode, offset, count))) {
>> + inode_unlock_shared(inode);
>> + *ilock_shared = false;
>> + inode_lock(inode);
>> + goto restart;
>> + }
>> +
>> + ret = file_modified(file);
>> + if (ret < 0)
>> + goto out;
>> +
>> + return count;
>
> And then you return count from ext4_dio_write_checks() here...
ditto
>
>> - ret = ext4_write_checks(iocb, from);
>> - if (ret <= 0) {
>> - inode_unlock(inode);
>> + ret = ext4_dio_write_checks(iocb, from, &ilock_shared, &extend);
>> + if (ret <= 0)
>> return ret;
>> - }
>>
>> - /*
>> - * Unaligned asynchronous direct I/O must be serialized among each
>> - * other as the zeroing of partial blocks of two competing unaligned
>> - * asynchronous direct I/O writes can result in data corruption.
>> - */
>> offset = iocb->ki_pos;
>> count = iov_iter_count(from);
>
> And then again just don't use the value here...
ditto
>
> Honza
>