linux-kernel - Re: [PATCH RFC 08/10] xfs: Commit CoW-based atomic writes atomically

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ce6ff351-9bcd-418d-a1b9-081b3ba05493@oracle.com>
Date: Fri, 7 Feb 2025 11:52:30 +0000
From: John Garry <john.g.garry@...cle.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: brauner@...nel.org, cem@...nel.org, dchinner@...hat.com, hch@....de,
        linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, ojaswin@...ux.ibm.com,
        ritesh.list@...il.com, martin.petersen@...cle.com
Subject: Re: [PATCH RFC 08/10] xfs: Commit CoW-based atomic writes atomically

On 06/02/2025 21:50, Darrick J. Wong wrote:
> On Thu, Feb 06, 2025 at 10:27:45AM +0000, John Garry wrote:
>> On 05/02/2025 19:47, Darrick J. Wong wrote:
>>> On Tue, Feb 04, 2025 at 12:01:25PM +0000, John Garry wrote:
>>>> When completing a CoW-based write, each extent range mapping update is
>>>> covered by a separate transaction.
>>>>
>>>> For a CoW-based atomic write, all mappings must be changed at once, so
>>>> change to use a single transaction.
>>>>
>>>> Signed-off-by: John Garry <john.g.garry@...cle.com>
>>>> ---
>>>>    fs/xfs/xfs_file.c    |  5 ++++-
>>>>    fs/xfs/xfs_reflink.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
>>>>    fs/xfs/xfs_reflink.h |  3 +++
>>>>    3 files changed, 55 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>>>> index 12af5cdc3094..170d7891f90d 100644
>>>> --- a/fs/xfs/xfs_file.c
>>>> +++ b/fs/xfs/xfs_file.c
>>>> @@ -527,7 +527,10 @@ xfs_dio_write_end_io(
>>>>    	nofs_flag = memalloc_nofs_save();
>>>>    	if (flags & IOMAP_DIO_COW) {
>>>> -		error = xfs_reflink_end_cow(ip, offset, size);
>>>> +		if (iocb->ki_flags & IOCB_ATOMIC)
>>>> +			error = xfs_reflink_end_atomic_cow(ip, offset, size);
>>>> +		else
>>>> +			error = xfs_reflink_end_cow(ip, offset, size);
>>>>    		if (error)
>>>>    			goto out;
>>>>    	}
>>>> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
>>>> index dbce333b60eb..60c986300faa 100644
>>>> --- a/fs/xfs/xfs_reflink.c
>>>> +++ b/fs/xfs/xfs_reflink.c
>>>> @@ -990,6 +990,54 @@ xfs_reflink_end_cow(
>>>>    		trace_xfs_reflink_end_cow_error(ip, error, _RET_IP_);
>>>>    	return error;
>>>>    }
>>>> +int
>>>> +xfs_reflink_end_atomic_cow(
>>>> +	struct xfs_inode		*ip,
>>>> +	xfs_off_t			offset,
>>>> +	xfs_off_t			count)
>>>> +{
>>>> +	xfs_fileoff_t			offset_fsb;
>>>> +	xfs_fileoff_t			end_fsb;
>>>> +	int				error = 0;
>>>> +	struct xfs_mount		*mp = ip->i_mount;
>>>> +	struct xfs_trans		*tp;
>>>> +	unsigned int			resblks;
>>>> +	bool				commit = false;
>>>> +
>>>> +	trace_xfs_reflink_end_cow(ip, offset, count);
>>>> +
>>>> +	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
>>>> +	end_fsb = XFS_B_TO_FSB(ip->i_mount, offset + count);
>>>> +
>>>> +	resblks = XFS_NEXTENTADD_SPACE_RES(ip->i_mount,
>>>> +				(unsigned int)(end_fsb - offset_fsb),
>>>> +				XFS_DATA_FORK);
>>>> +
>>>> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0,
>>>
>>> xfs gained reflink support for realtime volumes in 6.14-rc1, so you now
>>> have to calculate for that in here too.
>>>
>>>> +			XFS_TRANS_RESERVE, &tp);
>>>> +	if (error)
>>>> +		return error;
>>>> +
>>>> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
>>>> +	xfs_trans_ijoin(tp, ip, 0);
>>>> +
>>>> +	while (end_fsb > offset_fsb && !error)
>>>> +		error = xfs_reflink_end_cow_extent_locked(ip, &offset_fsb,
>>>> +						end_fsb, tp, &commit);
>>>
>>> Hmm.  Attaching intent items to a transaction consumes space in that
>>> transaction, so we probably ought to limit the amount that we try to do
>>> here.  Do you know what that limit is?  I don't,
>>
>> nor do I ...
>>
>>> but it's roughly
>>> tr_logres divided by the average size of a log intent item.
>>
>> So you have a ballpark figure on the average size of a log intent item, or
>> an idea on how to get it?
> 
> You could add up the size of struct
> xfs_{bui,rmap,refcount,efi}_log_format structures and add 20%, that will
> give you a ballpark figure of the worst case per-block requirements.
> 
> My guess is that 64 blocks is ok provided resblks is big enough.  But I
> guess we could estimate it (very conservatively) dynamically too.
> 
> (also note tr_itruncate declares more logres)

64 blocks gives quite a large awu max, so that would be ok

> 
>>> This means we need to restrict the size of an untorn write to a
>>> double-digit number of fsblocks for safety.
>>
>> Sure, but won't we also still be liable to suffer the same issue which was
>> fixed in commit d6f215f359637?
> 
> Yeah, come to think of it, you need to reserve the worst case space
> reservation, i.e. each of the blocks between offset_fsb and end_fsb
> becomes a separate btree update.
> 
> 	resblks = (end_fsb - offset_fsb) *
> 			XFS_NEXTENTADD_SPACE_RES(mp, 1, XFS_DATA_FORK);
> 
>

ok

Thanks,
John