Message-ID: <4adbf8aa-e417-1997-c83d-90e7623f2916@huaweicloud.com>
Date: Mon, 6 May 2024 19:21:51 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Dave Chinner <david@...morbit.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, tytso@....edu,
adilger.kernel@...ger.ca, jack@...e.cz, ritesh.list@...il.com,
hch@...radead.org, djwong@...nel.org, willy@...radead.org,
zokeefe@...gle.com, yi.zhang@...wei.com, chengzhihao1@...wei.com,
yukuai3@...wei.com, wangkefeng.wang@...wei.com
Subject: Re: [RFC PATCH v4 24/34] ext4: implement buffered write iomap path
On 2024/5/1 16:11, Dave Chinner wrote:
> On Wed, Apr 10, 2024 at 10:29:38PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> Implement buffered write iomap path, use ext4_da_map_blocks() to map
>> delalloc extents and add ext4_iomap_get_blocks() to allocate blocks if
>> delalloc is disabled or free space is about to run out.
>>
>> Note that we always allocate unwritten extents for new blocks in the
>> iomap write path, this means that the allocation type is no longer
>> controlled by the dioread_nolock mount option. After that, we could
>> postpone the i_disksize updating to the writeback path, and drop journal
>> handle in the buffered delalloc write path completely.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>> ---
>> fs/ext4/ext4.h | 3 +
>> fs/ext4/file.c | 19 +++++-
>> fs/ext4/inode.c | 168 ++++++++++++++++++++++++++++++++++++++++++++++--
>> 3 files changed, 183 insertions(+), 7 deletions(-)
>>
[...]
>> +#define IOMAP_F_EXT4_DELALLOC IOMAP_F_PRIVATE
>> +
>> +static int __ext4_iomap_buffered_io_begin(struct inode *inode, loff_t offset,
>> loff_t length, unsigned int iomap_flags,
>> - struct iomap *iomap, struct iomap *srcmap)
>> + struct iomap *iomap, struct iomap *srcmap,
>> + bool delalloc)
>> {
>> - int ret;
>> + int ret, retries = 0;
>> struct ext4_map_blocks map;
>> u8 blkbits = inode->i_blkbits;
>>
>> @@ -3537,20 +3580,133 @@ static int ext4_iomap_buffered_io_begin(struct inode *inode, loff_t offset,
>> return -EINVAL;
>> if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
>> return -ERANGE;
>> -
>> +retry:
>> /* Calculate the first and last logical blocks respectively. */
>> map.m_lblk = offset >> blkbits;
>> map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits,
>> EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1;
>> + if (iomap_flags & IOMAP_WRITE) {
>> + if (delalloc)
>> + ret = ext4_da_map_blocks(inode, &map);
>> + else
>> + ret = ext4_iomap_get_blocks(inode, &map);
>>
>> - ret = ext4_map_blocks(NULL, inode, &map, 0);
>> + if (ret == -ENOSPC &&
>> + ext4_should_retry_alloc(inode->i_sb, &retries))
>> + goto retry;
>> + } else {
>> + ret = ext4_map_blocks(NULL, inode, &map, 0);
>> + }
>> if (ret < 0)
>> return ret;
>>
>> ext4_set_iomap(inode, iomap, &map, offset, length, iomap_flags);
>> + if (delalloc)
>> + iomap->flags |= IOMAP_F_EXT4_DELALLOC;
>> +
>> + return 0;
>> +}
>
> Why are you implementing both read and write mapping paths in
> the one function? The whole point of having separate ops vectors for
> read and write is that it allows a clean separation of the read and
> write mapping operations. i.e. there is no need to use "if (write)
> else {do read}" code constructs at all.
>
> You can even have a different delalloc mapping function so you don't
> need "if (delalloc) else {do nonda}" branches everywhere...
>
The current ->iomap_begin() for the ext4 buffered IO path
(i.e. __ext4_iomap_buffered_io_begin()) is simple: essentially only the
block mapping handlers differ between the read, da write and no da write
paths, while the parameter checks and inode status checks in the rest of
the function are the same. I also noticed that the ->iomap_begin() for
the direct IO path (i.e. ext4_iomap_begin()) is implemented in one
function. So I wanted to save some code for now; implementing them in
one function doesn't seem to make it too complicated, and I thought we
could split them if things change in the future.
But thinking about it again, splitting them now would make things
clearer, so that's also fine with me.
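For illustration, the split could be shaped roughly like the userspace
model below. This is only a sketch, not the ext4 code: every name in it
(toy_map, toy_prepare(), the toy_*_begin() handlers and the toy_*_ops
vectors) is hypothetical, and the handlers only record which path ran
instead of mapping real blocks. The shared range calculation stays in
one helper, and the delalloc decision is made once, when the ops vector
is picked.

```c
/* Userspace model only: all types and function names are hypothetical. */
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_READ, TOY_DA_WRITE, TOY_NODA_WRITE };

struct toy_map {
	long lblk;		/* first logical block of the range */
	long len;		/* number of blocks in the range */
	enum toy_kind kind;	/* which handler filled in the mapping */
};

/* Shared by all paths: validate and convert a byte range to blocks. */
static int toy_prepare(long offset, long length, unsigned int blkbits,
		       struct toy_map *map)
{
	assert(map != NULL);
	if (offset < 0 || length <= 0)
		return -1;
	map->lblk = offset >> blkbits;
	map->len = ((offset + length - 1) >> blkbits) - map->lblk + 1;
	return 0;
}

/* Read path: stands in for a plain lookup of existing blocks. */
static int toy_read_begin(long offset, long length, unsigned int blkbits,
			  struct toy_map *map)
{
	int ret = toy_prepare(offset, length, blkbits, map);

	if (ret)
		return ret;
	map->kind = TOY_READ;
	return 0;
}

/* Delalloc write path: stands in for a delayed-allocation mapping. */
static int toy_da_write_begin(long offset, long length,
			      unsigned int blkbits, struct toy_map *map)
{
	int ret = toy_prepare(offset, length, blkbits, map);

	if (ret)
		return ret;
	map->kind = TOY_DA_WRITE;
	return 0;
}

/* Non-delalloc write path: stands in for immediate allocation. */
static int toy_noda_write_begin(long offset, long length,
				unsigned int blkbits, struct toy_map *map)
{
	int ret = toy_prepare(offset, length, blkbits, map);

	if (ret)
		return ret;
	map->kind = TOY_NODA_WRITE;
	return 0;
}

struct toy_ops {
	int (*begin)(long offset, long length, unsigned int blkbits,
		     struct toy_map *map);
};

static const struct toy_ops toy_read_ops = { .begin = toy_read_begin };
static const struct toy_ops toy_da_write_ops = { .begin = toy_da_write_begin };
static const struct toy_ops toy_noda_write_ops = { .begin = toy_noda_write_begin };

/* The only delalloc branch: choose the write ops vector once. */
static const struct toy_ops *toy_pick_write_ops(int delalloc)
{
	return delalloc ? &toy_da_write_ops : &toy_noda_write_ops;
}
```

With this shape a caller would use toy_read_ops for reads and
toy_pick_write_ops(delalloc) for writes, so no handler needs an
"if (write)" or "if (delalloc)" branch of its own.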
Thanks,
Yi.