[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a2af4039-65e0-6bb5-337e-fc160ae94aee@huawei.com>
Date: Thu, 30 Nov 2023 16:22:32 +0800
From: Zhang Yi <yi.zhang@...wei.com>
To: Christoph Hellwig <hch@...radead.org>, Ritesh Harjani
<ritesh.list@...il.com>
CC: Jan Kara <jack@...e.cz>, <linux-ext4@...r.kernel.org>,
<linux-fsdevel@...r.kernel.org>
Subject: Re: [RFC 2/3] ext2: Convert ext2 regular file buffered I/O to use
iomap
On 2023/11/30 12:30, Christoph Hellwig wrote:
> On Thu, Nov 30, 2023 at 08:54:31AM +0530, Ritesh Harjani wrote:
>> So I took a look at this. Here is what I think -
>> So this is useful of-course when we have a large folio. Because
>> otherwise it's just one block at a time for each folio. This is not a
>> problem once FS buffered-io handling code moves to iomap (because we
>> can then enable large folio support to it).
>
> Yes.
>
>> However, this would still require us to pass a folio to ->map_blocks
>> call to determine the size of the folio (which I am not saying can't be
>> done but just stating my observations here).
>
> XFS currently maps based on the underlyig reservation (delalloc extent)
> and not the actual map size. This works because on-disk extents are
> allocated as unwritten extents, and only the actual written part is
> the converted. But if you only want to allocate blocks for the part
IIUC, I noticed a side effect of this map method, Let's think about a
special case, if we sync a partial range of a delalloc extent, and then
the system crashes before the remaining data write back. We could get a
file which has allocated blocks(unwritten) beyond EOF. I can reproduce
it on xfs.
# write 10 blocks, but only sync 3 blocks, i_size becomes 12K after IO complete
xfs_io -f -c "pwrite 0 40k" -c "sync_range 0 12k" /mnt/foo
# postpone the remaining data writeback
echo 20000 > /proc/sys/vm/dirty_writeback_centisecs
echo 20000 > /proc/sys/vm/dirty_expire_centisecs
# wait 35s to make sure xfs's cil log writeback
sleep 35
# triger system crash
echo c > /proc/sysrq-trigger
After system reboot, the 'foo' file's size is 12K but have 10
allocated blocks.
xfs_db> inode 131
core.size = 12288
core.nblocks = 10
...
0:[0,24,3,0]
1:[3,27,7,1]
I'm not expert on xfs, but I don't think this is the right result,
is it?
Thanks,
Yi.
> actually written you actually need to pass in the dirty range and not
> just use the whole folio. This would be the incremental patch to do
> that:
>
> http://git.infradead.org/users/hch/xfs.git/commitdiff/0007893015796ef2ba16bb8b98c4c9fb6e9e6752
>
> But unless your block allocator is very cheap doing what XFS does is
> probably going to work much better.
>
>> ...ok so here is the PoC for seq counter check for ext2. Next let me
>> try to see if we can lift this up from the FS side to iomap -
>
> This looks good to me from a very superficial view. Dave is the expert
> on this, though.
>
>
> .
>
Powered by blists - more mailing lists