[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1394ceeb-ce8c-4d0f-aec8-ba93bf1afb90@oracle.com>
Date: Wed, 18 Sep 2024 11:12:47 +0100
From: John Garry <john.g.garry@...cle.com>
To: Dave Chinner <david@...morbit.com>
Cc: Ritesh Harjani <ritesh.list@...il.com>, chandan.babu@...cle.com,
djwong@...nel.org, dchinner@...hat.com, hch@....de,
viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz,
linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, catherine.hoang@...cle.com,
martin.petersen@...cle.com
Subject: Re: [PATCH v4 00/14] forcealign for xfs
On 17/09/2024 23:27, Dave Chinner wrote:
>> # xfs_bmap -vvp mnt/file
>> mnt/file:
>> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
>> 0: [0..15]: 384..399 0 (384..399) 16 010000
>> 1: [16..31]: 400..415 0 (400..415) 16 000000
>> 2: [32..127]: 416..511 0 (416..511) 96 010000
>> 3: [128..255]: 256..383 0 (256..383) 128 000000
>> FLAG Values:
>> 0010000 Unwritten preallocated extent
>>
>> Here we have unaligned extents wrt extsize.
>>
>> The sub-alloc unit zeroing would solve that - is that what you would still
>> advocate (to solve that issue)?
> Yes, I thought that was already implemented for force-align with the
> DIO code via the extsize zero-around changes in the iomap code. Why
> isn't that zero-around code ensuring the correct extent layout here?
I just have not included the extsize zero-around changes here. They were
just grouped with the atomic writes support, as they were added
specifically for the atomic writes support. Indeed - to me at least - it
is strange that the DIO code changes are required for XFS forcealign
implementation. And, even if we use extsize zero-around changes for DIO
path, what about buffered IO?
BTW, I still have concern with this extsize zero-around change which I
was making:
xfs_iomap_write_unwritten()
{
unsigned int rounding;
/* when converting anything unwritten, we must be spanning an alloc
unit, so round up/down */
if (rounding > 1) {
offset_fsb = rounddown(rounding);
count_fsb = roundup(rounding);
}
...
do {
xfs_bmapi_write();
...
xfs_trans_commit();
} while ();
}
As mentioned elsewhere, it's a bit of a bodge (to do this rounding).
>
>>> FWIW, I also understand things are different if we are doing 128kB
>>> atomic writes on 16kB force aligned files. However, in this
>>> situation we are treating the 128kB atomic IO as eight individual
>>> 16kB atomic IOs that are physically contiguous.
>> Yes, if 16kB force aligned, userspace can only issue 16KB atomic writes.
> Right, but the eventual goal (given the statx parameters) is to be
> able to do 8x16kB sequential atomic writes as a single 128kB IO, yes?
No, if atomic write unit max is 16KB, then userspace can only issue a
single 16KB atomic write.
However, some things to consider:
a. the block layer may merge those 16KB atomic writes
b. userspace may also merge 16KB atomic writes and issue a larger atomic
write (if atomic write unit max is > 16KB)
I had been wondering if there is any value in a lib for helping with b.
>
>>>>> Again, this is different to the traditional RT file behaviour - it
>>>>> can use unwritten extents for sub-alloc-unit alignment unmaps
>>>>> because the RT device can align file offset to any physical offset,
>>>>> and issue unaligned sector sized IO without any restrictions. Forced
>>>>> alignment does not have this freedom, and when we extend forced
>>>>> alignment to RT files, it will not have the freedom to use
>>>>> unwritten extents for sub-alloc-unit unmapping, either.
>>>>>
>>>> So how do you think that we should actually implement
>>>> xfs_itruncate_extents_flags() properly for forcealign? Would it simply be
>>>> like:
>>>>
>>>> --- a/fs/xfs/xfs_inode.c
>>>> +++ b/fs/xfs/xfs_inode.c
>>>> @@ -1050,7 +1050,7 @@ xfs_itruncate_extents_flags(
>>>> WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
>>>> return 0;
>>>> }
>>>> + if (xfs_inode_has_forcealign(ip))
>>>> + first_unmap_block = xfs_inode_roundup_alloc_unit(ip,
>>>> first_unmap_block);
>>>> error = xfs_bunmapi_range(&tp, ip, flags, first_unmap_block,
>>> Yes, it would be something like that, except it would have to be
>>> done before first_unmap_block is verified.
>>>
>> ok, and are you still of the opinion that this does not apply to rtvol?
> The rtvol is*not* force-aligned. It -may- have some aligned
> allocation requirements that are similar (i.e. sb_rextsize > 1 fsb)
> but it does*not* force-align extents, written or unwritten.
>
> The moment we add force-align support to RT files (as is the plan),
> then the force-aligned inodes on the rtvol will need to behave as
> force aligned inodes, not "rtvol" inodes.
ok, fine
Thanks,
John
Powered by blists - more mailing lists