[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ad152fa0-0767-45cb-921e-c3e9f5eac110@oracle.com>
Date: Mon, 10 Mar 2025 12:10:44 +0000
From: John Garry <john.g.garry@...cle.com>
To: Dave Chinner <david@...morbit.com>
Cc: brauner@...nel.org, djwong@...nel.org, cem@...nel.org,
linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, ojaswin@...ux.ibm.com,
ritesh.list@...il.com, martin.petersen@...cle.com, tytso@....edu,
linux-ext4@...r.kernel.org
Subject: Re: [PATCH v4 12/12] xfs: Allow block allocator to take an alignment
hint
On 09/03/2025 22:03, Dave Chinner wrote:
> On Mon, Mar 03, 2025 at 05:11:20PM +0000, John Garry wrote:
>> When issuing an atomic write by the CoW method, give the block allocator a
>> hint to align to the extszhint.
>>
>> This means that we have a better chance to issuing the atomic write via
>> HW offload next time.
>>
>> It does mean that the inode extszhint should be set appropriately for the
>> expected atomic write size.
>>
>> Reviewed-by: "Darrick J. Wong" <djwong@...nel.org>
>> Signed-off-by: John Garry <john.g.garry@...cle.com>
>> ---
>> fs/xfs/libxfs/xfs_bmap.c | 7 ++++++-
>> fs/xfs/libxfs/xfs_bmap.h | 6 +++++-
>> fs/xfs/xfs_reflink.c | 8 ++++++--
>> 3 files changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index 0ef19f1469ec..9bfdfb7cdcae 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -3454,6 +3454,12 @@ xfs_bmap_compute_alignments(
>> align = xfs_get_cowextsz_hint(ap->ip);
>> else if (ap->datatype & XFS_ALLOC_USERDATA)
>> align = xfs_get_extsz_hint(ap->ip);
>> +
>> + if (align > 1 && ap->flags & XFS_BMAPI_EXTSZALIGN)
>
> needs () around the & logic.
ok
>
> if (align > 1 && (ap->flags & XFS_BMAPI_EXTSZALIGN))
>
>> + args->alignment = align;
>> + else
>> + args->alignment = 1;
>
> When is args->alignment not already initialised to 1?
>
>> +
>> if (align) {
>> if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
>> ap->eof, 0, ap->conv, &ap->offset,
>> @@ -3782,7 +3788,6 @@ xfs_bmap_btalloc(
>> .wasdel = ap->wasdel,
>> .resv = XFS_AG_RESV_NONE,
>> .datatype = ap->datatype,
>> - .alignment = 1,
>> .minalignslop = 0,
>> };
>
> Oh, you removed the initialisation to 1, so now we have the
> possibility of getting args->alignment = 0 anywhere in the
> allocation stack?
>
> FWIW, we've been trying to get rid of that case - args->alignment should
> always be 1 if no alignment is necessary so we don't ahve to special
> case alignment of 0 (meaning no alignemnt) anywhere. This seems
> like a step backwards from that perspective...
As I recall, doing this was a suggestion when developing the forcealign
support (as it had similar logic).
Anyway, I can leave the init to 1 in xfs_bmap_btalloc()
>
>
>
>> xfs_fileoff_t orig_offset;
>> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
>> index 4b721d935994..e6baa81e20d8 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.h
>> +++ b/fs/xfs/libxfs/xfs_bmap.h
>> @@ -87,6 +87,9 @@ struct xfs_bmalloca {
>> /* Do not update the rmap btree. Used for reconstructing bmbt from rmapbt. */
>> #define XFS_BMAPI_NORMAP (1u << 10)
>>
>> +/* Try to align allocations to the extent size hint */
>> +#define XFS_BMAPI_EXTSZALIGN (1u << 11)
>
> Don't we already do that?
>
> Or is this doing something subtle and non-obvious like overriding
> stripe width alignment for large atomic writes?
>
stripe alignment only comes into play for eof allocation.
args->alignment is used in xfs_alloc_compute_aligned() to actually align
the start bno.
If I don't have this, then we can get this ping-pong affect when
overwriting atomically the same region:
# dd if=/dev/zero of=mnt/file bs=1M count=10 conv=fsync
# xfs_bmap -vp mnt/file
mnt/file:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..20479]: 192..20671 0 (192..20671) 20480 000000
# /xfs_io -d -C "pwrite -b 64k -V 1 -A -D 0 64k" mnt/file
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0525 sec (1.190 MiB/sec and 19.0425 ops/sec)
# xfs_bmap -vp mnt/file
mnt/file:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..127]: 20672..20799 0 (20672..20799) 128 000000
1: [128..20479]: 320..20671 0 (320..20671) 20352 000000
# /xfs_io -d -C "pwrite -b 64k -V 1 -A -D 0 64k" mnt/file
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0524 sec (1.191 MiB/sec and 19.0581 ops/sec)
# xfs_bmap -vp mnt/file
mnt/file:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..20479]: 192..20671 0 (192..20671) 20480 000000
# /xfs_io -d -C "pwrite -b 64k -V 1 -A -D 0 64k" mnt/file
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0524 sec (1.191 MiB/sec and 19.0611 ops/sec)
# xfs_bmap -vp mnt/file
mnt/file:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..127]: 20672..20799 0 (20672..20799) 128 000000
1: [128..20479]: 320..20671 0 (320..20671) 20352 000000
We are never getting aligned extents wrt write length, and so have to
fall back to the SW-based atomic write always. That is not what we want.
Thanks,
John
Powered by blists - more mailing lists