linux-kernel - Re: [PATCH v6 10/13] xfs: iomap COW-based atomic write support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <de3f6e25-851a-4ed7-9511-397270785794@oracle.com>
Date: Tue, 18 Mar 2025 17:44:46 +0000
From: John Garry <john.g.garry@...cle.com>
To: Christoph Hellwig <hch@....de>
Cc: brauner@...nel.org, djwong@...nel.org, cem@...nel.org, dchinner@...hat.com,
        linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, ojaswin@...ux.ibm.com,
        ritesh.list@...il.com, martin.petersen@...cle.com, tytso@....edu,
        linux-ext4@...r.kernel.org
Subject: Re: [PATCH v6 10/13] xfs: iomap COW-based atomic write support

On 18/03/2025 08:32, Christoph Hellwig wrote:
> On Tue, Mar 18, 2025 at 08:22:53AM +0000, John Garry wrote:
>>> Is xfs_reflink_allocate_cow even the right helper to use?  We know we
>>> absolutely want a a COW fork extent, we know there can't be delalloc
>>> extent to convert as we flushed dirty data, so most of the logic in it
>>> is pointless.
>>
>> Well xfs_reflink_allocate_cow gives us what we want when we set some flag
>> (XFS_REFLINK_FORCE_COW).
>>
>> Are you hinting at a dedicated helper? Note that
>> xfs_reflink_fill_cow_hole() also handles the XFS_REFLINK_FORCE_COW flag.
> 
> We might not even need much of a helper.  This all comes from my
> experience with the zoned code that always writes out of place as well.
> I initially tried to reuse the existing iomap_begin methods and all
> these helpers, but in the end it turned out to much, much simpler to
> just open code the logic.  Now the atomic cow code will be a little
> more complex in some aspect (unwritten extents, speculative
> preallocations), but also simpler in others (no need to support buffered
> I/O including zeroing and sub-block writes that require the ugly
> srcmap based read-modify-write), but I suspect the overall trade-off
> will be similar.
> 
> After all the high-level logic for the atomic COW iomap_begin really
> should just be:
> 
>   - take the ilock
>   - check the COW fork if there is an extent for the start of the range.
>   - If yes:
>       - if the extent is unwritten, convert the part overlapping with
>         the range to written

I was wondering when it could be written, and then I figured it could be 
if we had this sort of scenario:
- initially we have a mixed of shared and unshared file, like:

mnt/file:
EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..7]:          192..199          0 (192..199)           8 100000
   1: [8..15]:         32840..32847      0 (32840..32847)       8 000000
   2: [16..20479]:     208..20671        0 (208..20671)     20464 100000
FLAG Values:
    0100000 Shared extent
    0010000 Unwritten preallocated extent
    0001000 Doesn't begin on stripe unit
    0000100 Doesn't end   on stripe unit
    0000010 Doesn't begin on stripe width
    0000001 Doesn't end   on stripe width

- try atomic write of size 32K @ offset 0
  	- we first try HW atomics method and call 
xfs_direct_write_iomap_begin() -> xfs_reflink_allocate_cow()
	- CoW fork mapping is no good for atomics (too short), so try CoW 
atomic method
- in CoW atomic method, we find that pre-alloc'ed CoW fork extent (which 
was converted to written already)

>       - return the extent
>   - If no:
>       - allocate a new extent covering the range in the COW fork
> 
> Doing this without going down multiple levels of helpers designed for
> an entirely different use case will probably simplify things
> significantly.

Please suggest any further modifications to the following attempt. I 
have XFS_REFLINK_FORCE_COW still being passed to 
xfs_reflink_fill_cow_hole(), but xfs_reflink_fill_cow_hole() is quite a 
large function and I am not sure if I want to duplicate lots of it.

static int
xfs_atomic_write_cow_iomap_begin(
	struct inode		*inode,
	loff_t			offset,
	loff_t			length,
	unsigned		flags,
	struct iomap		*iomap,
	struct iomap		*srcmap)
{
	ASSERT(flags & IOMAP_WRITE);
	ASSERT(flags & IOMAP_DIRECT);

	struct xfs_inode	*ip = XFS_I(inode);
	struct xfs_mount	*mp = ip->i_mount;
	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
	xfs_fileoff_t		count_fsb = end_fsb - offset_fsb;
	struct xfs_bmbt_irec	imap = {
		.br_startoff = offset_fsb,
		.br_startblock = HOLESTARTBLOCK,
		.br_blockcount = end_fsb - offset_fsb,
		.br_state = XFS_EXT_UNWRITTEN,
	};
	struct xfs_bmbt_irec	cmap;
	bool			shared = false;
	unsigned int		lockmode = XFS_ILOCK_EXCL;
	u64			seq;
	struct xfs_iext_cursor	icur;

	if (xfs_is_shutdown(mp))
		return -EIO;

	if (!xfs_has_reflink(mp))
		return -EINVAL;

	if (!ip->i_cowfp) {
		ASSERT(!xfs_is_reflink_inode(ip));
		xfs_ifork_init_cow(ip);
	}

	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
	if (error)
		return error;

	if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap) 
&& cmap.br_startoff <= offset_fsb) {
		if (cmap.br_state == XFS_EXT_UNWRITTEN) {
			xfs_trim_extent(&cmap, offset_fsb, count_fsb);
			
			error = xfs_reflink_convert_unwritten(ip, &imap, &cmap, 
XFS_REFLINK_CONVERT_UNWRITTEN);
			if (error)
				goto out_unlock;
		}
	} else {
		error = xfs_reflink_fill_cow_hole(ip, &imap, &cmap, &shared,
				&lockmode, XFS_REFLINK_CONVERT_UNWRITTEN |
				XFS_REFLINK_FORCE_COW | XFS_REFLINK_ALLOC_EXTSZALIGN);
		if (error)
			goto out_unlock;
	}

finish:
	... // as before
}

xfs_reflink_convert_unwritten() and others are private to reflink.c, so 
this function should also prob live there.

thanks