[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <874jhfy7i8.fsf@doe.com>
Date: Tue, 21 Nov 2023 11:26:15 +0530
From: Ritesh Harjani (IBM) <ritesh.list@...il.com>
To: Christoph Hellwig <hch@...radead.org>
Cc: linux-ext4@...r.kernel.org, Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org
Subject: Re: [RFC 2/3] ext2: Convert ext2 regular file buffered I/O to use iomap
Christoph Hellwig <hch@...radead.org> writes:
> On Tue, Nov 21, 2023 at 12:35:20AM +0530, Ritesh Harjani (IBM) wrote:
>> - mmap path of ext2 continues to use generic_file_vm_ops
>
> So this means there are not space reservations taken at write fault
> time.
Yes.
> While iomap was written with the assumption those exist, I can't
> instantly spot anything that relies on them - you are just a lot more
> likely to hit an -ENOSPC from ->map_blocks now.
Which is also true with existing code no? If the block reservation is
not done at the write fault, writeback is likely to fail due to ENOSPC?
> Maybe we should add
> an xfstests that covers the case where we use up more then the existing
> free space through writable mmaps to ensure it fully works (where works
> means at least as good as the old behavior)?
>
Sure, I can try write an fstests for the same.
Also I did find an old thread which tried implementing ->page_mkwrite in
ext2 [1]. However, it was rejected with following reason -
"""
Allocating
blocks on write fault has the unwelcome impact that the file layout is
likely going to be much worse (much more fragmented) - I remember getting
some reports about performance regressions from users back when I did a
similar change for ext3. And so I'm reluctant to change behavior of such
an old legacy filesystem as ext2...
"""
https://lore.kernel.org/linux-ext4/20210105175348.GE15080@quack2.suse.cz/
>> +static ssize_t ext2_buffered_write_iter(struct kiocb *iocb,
>> + struct iov_iter *from)
>> +{
>> + ssize_t ret = 0;
>> + struct inode *inode = file_inode(iocb->ki_filp);
>> +
>> + inode_lock(inode);
>> + ret = generic_write_checks(iocb, from);
>> + if (ret > 0)
>> + ret = iomap_file_buffered_write(iocb, from, &ext2_iomap_ops);
>> + inode_unlock(inode);
>> + if (ret > 0)
>> + ret = generic_write_sync(iocb, ret);
>> + return ret;
>> +}
>
> Not for now, but if we end up doing a bunch of conversation of trivial
> file systems, it might be worth to eventually add a wrapper for the
> above code with just the iomap_ops passed in. Only once we see a few
> duplicates, though.
>
Sure, make sense. Thanks!
I can try and check if the the wrapper helps.
>> +static int ext2_write_map_blocks(struct iomap_writepage_ctx *wpc,
>> + struct inode *inode, loff_t offset)
>> +{
>> + if (offset >= wpc->iomap.offset &&
>> + offset < wpc->iomap.offset + wpc->iomap.length)
>> + return 0;
>> +
>> + return ext2_iomap_begin(inode, offset, inode->i_sb->s_blocksize,
>> + IOMAP_WRITE, &wpc->iomap, NULL);
>> +}
>
> Looking at gfs2 this also might become a pattern. But I'm fine with
> waiting for now.
>
ok.
>> @@ -1372,7 +1428,7 @@ void ext2_set_file_ops(struct inode *inode)
>> if (IS_DAX(inode))
>> inode->i_mapping->a_ops = &ext2_dax_aops;
>> else
>> - inode->i_mapping->a_ops = &ext2_aops;
>> + inode->i_mapping->a_ops = &ext2_file_aops;
>> }
>
> Did yo run into issues in using the iomap based aops for the other uses
> of ext2_aops, or are just trying to address the users one at a time?
There are problems for e.g. for dir type in ext2. It uses the pagecache
for dir. It uses buffer_heads and attaches them to folio->private.
...it uses block_write_begin/block_write_end() calls.
Look for ext4_make_empty() -> ext4_prepare_chunk ->
block_write_begin().
Now during sync/writeback of the dirty pages (ext4_handle_dirsync()), we
might take a iomap writeback path (if using ext2_file_aops for dir)
which sees folio->private assuming it is "struct iomap_folio_state".
And bad things will happen...
Now we don't have an equivalent APIs in iomap for
block_write_begin()/end() which the users can call for. Hence, Jan
suggested to lets first convert ext2 regular file path to iomap as an RFC.
I shall now check for the dir type to see what changes we might require
in ext2/iomap code.
Thanks again for your review!
-ritesh
Powered by blists - more mailing lists