Message-ID: <7d9585df-9a1c-42f7-99ca-084dd47ea3ae@oracle.com>
Date: Mon, 17 Mar 2025 09:36:13 +0000
From: John Garry <john.g.garry@...cle.com>
To: Christoph Hellwig <hch@....de>
Cc: brauner@...nel.org, djwong@...nel.org, cem@...nel.org, dchinner@...hat.com,
linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, ojaswin@...ux.ibm.com,
ritesh.list@...il.com, martin.petersen@...cle.com, tytso@....edu,
linux-ext4@...r.kernel.org
Subject: Re: [PATCH v6 11/13] xfs: add xfs_file_dio_write_atomic()
On 17/03/2025 06:41, Christoph Hellwig wrote:
> On Thu, Mar 13, 2025 at 05:13:08PM +0000, John Garry wrote:
>> + * REQ_ATOMIC-based is the preferred method, and is attempted first. If this
>> + * method fails due to REQ_ATOMIC-related constraints, then we retry with the
>> + * COW-based method. The REQ_ATOMIC-based method typically will fail if the
>> + * write spans multiple extents or the disk blocks are misaligned.
>
> It is only preferred if actually supported by the underlying hardware.
> If it isn't it really shouldn't even be tried, as that is just a waste
> of cycles.
We should not even call this function if atomics are not supported by the
HW - please see the IOCB_ATOMIC checks in xfs_file_write_iter(). So maybe
I will mention that the caller must ensure atomics are supported for the
write size.
>
> Also a lot of comment should probably be near the code not on top
> of the function as that's where people would look for them.
sure, if you prefer
>
>> +static noinline ssize_t
>> +xfs_file_dio_write_atomic(
>> + struct xfs_inode *ip,
>> + struct kiocb *iocb,
>> + struct iov_iter *from)
>> +{
>> + unsigned int iolock = XFS_IOLOCK_SHARED;
>> + unsigned int dio_flags = 0;
>> + const struct iomap_ops *dops = &xfs_direct_write_iomap_ops;
>> + ssize_t ret;
>> +
>> +retry:
>> + ret = xfs_ilock_iocb_for_write(iocb, &iolock);
>> + if (ret)
>> + return ret;
>> +
>> + ret = xfs_file_write_checks(iocb, from, &iolock, NULL);
>> + if (ret)
>> + goto out_unlock;
>> +
>> + if (dio_flags & IOMAP_DIO_FORCE_WAIT)
>> + inode_dio_wait(VFS_I(ip));
>> +
>> + trace_xfs_file_direct_write(iocb, from);
>> + ret = iomap_dio_rw(iocb, from, dops, &xfs_dio_write_ops,
>> + dio_flags, NULL, 0);
>
> The normal direct I/O path downgrades the iolock to shared before
> doing the I/O here. Why isn't that done here?
OK, I can do that. But we still always require the exclusive lock for the
CoW-based method.
>
>> + if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT) &&
>> + dops == &xfs_direct_write_iomap_ops) {
>
> This should probably explain the unusual use of EGAIN. Although I
> still feel that picking a different error code for the fallback would
> be much more maintainable.
I could try another error code - can you suggest one? Is it going to be
something unrelated to the storage stack, like EREMOTEIO?
>
>> + xfs_iunlock(ip, iolock);
>> + dio_flags = IOMAP_DIO_FORCE_WAIT;
>
> I notice the top of function comment mentions the IOMAP_DIO_FORCE_WAIT
> flag. Maybe use the chance to write a full sentence here or where
> it is checked to explain the logic a bit better?
ok, fine
>
>> * Handle block unaligned direct I/O writes
>> *
>> @@ -840,6 +909,10 @@ xfs_file_dio_write(
>> return xfs_file_dio_write_unaligned(ip, iocb, from);
>> if (xfs_is_zoned_inode(ip))
>> return xfs_file_dio_write_zoned(ip, iocb, from);
>> +
>> + if (iocb->ki_flags & IOCB_ATOMIC)
>> + return xfs_file_dio_write_atomic(ip, iocb, from);
>> +
>
> Either keep a space between all the conditional calls or none. If in
> doubt, just stick to the existing style.
Sure
>