[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <de3f6e25-851a-4ed7-9511-397270785794@oracle.com>
Date: Tue, 18 Mar 2025 17:44:46 +0000
From: John Garry <john.g.garry@...cle.com>
To: Christoph Hellwig <hch@....de>
Cc: brauner@...nel.org, djwong@...nel.org, cem@...nel.org, dchinner@...hat.com,
linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, ojaswin@...ux.ibm.com,
ritesh.list@...il.com, martin.petersen@...cle.com, tytso@....edu,
linux-ext4@...r.kernel.org
Subject: Re: [PATCH v6 10/13] xfs: iomap COW-based atomic write support
On 18/03/2025 08:32, Christoph Hellwig wrote:
> On Tue, Mar 18, 2025 at 08:22:53AM +0000, John Garry wrote:
>>> Is xfs_reflink_allocate_cow even the right helper to use? We know we
>>> absolutely want a a COW fork extent, we know there can't be delalloc
>>> extent to convert as we flushed dirty data, so most of the logic in it
>>> is pointless.
>>
>> Well xfs_reflink_allocate_cow gives us what we want when we set some flag
>> (XFS_REFLINK_FORCE_COW).
>>
>> Are you hinting at a dedicated helper? Note that
>> xfs_reflink_fill_cow_hole() also handles the XFS_REFLINK_FORCE_COW flag.
>
> We might not even need much of a helper. This all comes from my
> experience with the zoned code that always writes out of place as well.
> I initially tried to reuse the existing iomap_begin methods and all
> these helpers, but in the end it turned out to much, much simpler to
> just open code the logic. Now the atomic cow code will be a little
> more complex in some aspect (unwritten extents, speculative
> preallocations), but also simpler in others (no need to support buffered
> I/O including zeroing and sub-block writes that require the ugly
> srcmap based read-modify-write), but I suspect the overall trade-off
> will be similar.
>
> After all the high-level logic for the atomic COW iomap_begin really
> should just be:
>
> - take the ilock
> - check the COW fork if there is an extent for the start of the range.
> - If yes:
> - if the extent is unwritten, convert the part overlapping with
> the range to written
I was wondering when it could be written, and then I figured it could be
if we had this sort of scenario:
- initially we have a mixed of shared and unshared file, like:
mnt/file:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 192..199 0 (192..199) 8 100000
1: [8..15]: 32840..32847 0 (32840..32847) 8 000000
2: [16..20479]: 208..20671 0 (208..20671) 20464 100000
FLAG Values:
0100000 Shared extent
0010000 Unwritten preallocated extent
0001000 Doesn't begin on stripe unit
0000100 Doesn't end on stripe unit
0000010 Doesn't begin on stripe width
0000001 Doesn't end on stripe width
- try atomic write of size 32K @ offset 0
- we first try HW atomics method and call
xfs_direct_write_iomap_begin() -> xfs_reflink_allocate_cow()
- CoW fork mapping is no good for atomics (too short), so try CoW
atomic method
- in CoW atomic method, we find that pre-alloc'ed CoW fork extent (which
was converted to written already)
> - return the extent
> - If no:
> - allocate a new extent covering the range in the COW fork
>
> Doing this without going down multiple levels of helpers designed for
> an entirely different use case will probably simplify things
> significantly.
Please suggest any further modifications to the following attempt. I
have XFS_REFLINK_FORCE_COW still being passed to
xfs_reflink_fill_cow_hole(), but xfs_reflink_fill_cow_hole() is quite a
large function and I am not sure if I want to duplicate lots of it.
static int
xfs_atomic_write_cow_iomap_begin(
struct inode *inode,
loff_t offset,
loff_t length,
unsigned flags,
struct iomap *iomap,
struct iomap *srcmap)
{
ASSERT(flags & IOMAP_WRITE);
ASSERT(flags & IOMAP_DIRECT);
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length);
xfs_fileoff_t count_fsb = end_fsb - offset_fsb;
struct xfs_bmbt_irec imap = {
.br_startoff = offset_fsb,
.br_startblock = HOLESTARTBLOCK,
.br_blockcount = end_fsb - offset_fsb,
.br_state = XFS_EXT_UNWRITTEN,
};
struct xfs_bmbt_irec cmap;
bool shared = false;
unsigned int lockmode = XFS_ILOCK_EXCL;
u64 seq;
struct xfs_iext_cursor icur;
if (xfs_is_shutdown(mp))
return -EIO;
if (!xfs_has_reflink(mp))
return -EINVAL;
if (!ip->i_cowfp) {
ASSERT(!xfs_is_reflink_inode(ip));
xfs_ifork_init_cow(ip);
}
error = xfs_ilock_for_iomap(ip, flags, &lockmode);
if (error)
return error;
if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap)
&& cmap.br_startoff <= offset_fsb) {
if (cmap.br_state == XFS_EXT_UNWRITTEN) {
xfs_trim_extent(&cmap, offset_fsb, count_fsb);
error = xfs_reflink_convert_unwritten(ip, &imap, &cmap,
XFS_REFLINK_CONVERT_UNWRITTEN);
if (error)
goto out_unlock;
}
} else {
error = xfs_reflink_fill_cow_hole(ip, &imap, &cmap, &shared,
&lockmode, XFS_REFLINK_CONVERT_UNWRITTEN |
XFS_REFLINK_FORCE_COW | XFS_REFLINK_ALLOC_EXTSZALIGN);
if (error)
goto out_unlock;
}
finish:
... // as before
}
xfs_reflink_convert_unwritten() and others are private to reflink.c, so
this function should also prob live there.
thanks
Powered by blists - more mailing lists