Message-Id: <20250313171310.1886394-12-john.g.garry@oracle.com>
Date: Thu, 13 Mar 2025 17:13:08 +0000
From: John Garry <john.g.garry@...cle.com>
To: brauner@...nel.org, djwong@...nel.org, cem@...nel.org, dchinner@...hat.com,
hch@....de
Cc: linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, ojaswin@...ux.ibm.com,
ritesh.list@...il.com, martin.petersen@...cle.com, tytso@....edu,
linux-ext4@...r.kernel.org, John Garry <john.g.garry@...cle.com>
Subject: [PATCH v6 11/13] xfs: add xfs_file_dio_write_atomic()
Add xfs_file_dio_write_atomic() for dedicated handling of atomic writes.
In case of -EAGAIN being returned from iomap_dio_rw(), reissue the write
in CoW-based atomic write mode.
For CoW-based mode, ensure that we have no outstanding IOs which we
may trample on.
Reviewed-by: Darrick J. Wong <djwong@...nel.org>
Signed-off-by: John Garry <john.g.garry@...cle.com>
---
fs/xfs/xfs_file.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)
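
The retry flow described in the commit message can be modelled in userspace
to make the control flow easier to follow. The following is only a sketch and
is not kernel code: every name in it (atomic_write_model(), mock_submit(),
struct attempt_log, and the two enums) is a hypothetical stand-in, and the
mock assumes that the REQ_ATOMIC attempt fails with -EAGAIN whenever the
range is not a single aligned extent.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the iolock mode and the iomap ops in use. */
enum lock_mode { LOCK_SHARED, LOCK_EXCL };
enum write_mode { MODE_REQ_ATOMIC, MODE_COW };

/* Records what the model did, so that the flow can be inspected. */
struct attempt_log {
	int		attempts;
	enum lock_mode	last_lock;
	enum write_mode	last_mode;
	bool		waited_for_dio;
};

/*
 * Mock of the I/O submission step. Assumption: REQ_ATOMIC mode returns
 * -EAGAIN unless the range is a single aligned extent; COW mode succeeds.
 */
static long
mock_submit(enum write_mode mode, bool single_aligned_extent, long len)
{
	if (mode == MODE_REQ_ATOMIC && !single_aligned_extent)
		return -EAGAIN;
	return len;
}

/* Model of the two-pass flow: REQ_ATOMIC first, then COW on -EAGAIN. */
static long
atomic_write_model(bool nowait, bool single_aligned_extent, long len,
		   struct attempt_log *log)
{
	enum lock_mode	lock = LOCK_SHARED;
	enum write_mode	mode = MODE_REQ_ATOMIC;
	bool		force_wait = false;
	long		ret;

	assert(log != NULL);
retry:
	log->attempts++;
	log->last_lock = lock;
	log->last_mode = mode;

	/* Second pass only: stands in for draining outstanding direct I/O. */
	if (force_wait)
		log->waited_for_dio = true;

	ret = mock_submit(mode, single_aligned_extent, len);

	/* Fall back once, and never when the caller asked not to block. */
	if (ret == -EAGAIN && !nowait && mode == MODE_REQ_ATOMIC) {
		force_wait = true;
		mode = MODE_COW;
		lock = LOCK_EXCL;
		goto retry;
	}
	return ret;
}
```

In this model, a blocking write to a range that is not a single aligned
extent takes two passes and ends in COW mode under the exclusive lock, while
a nowait write to the same range returns -EAGAIN after one pass. The check on
the current mode is what keeps the fallback from looping.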
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 7a56ddb86fd2..029684b54dda 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -725,6 +725,75 @@ xfs_file_dio_write_zoned(
return ret;
}
+/*
+ * Handle block atomic writes
+ *
+ * Two methods of atomic writes are supported:
+ * - REQ_ATOMIC-based, which would typically use some form of HW offload in the
+ * disk
+ * - COW-based, which uses a COW fork as a staging extent for data updates
+ * before atomically updating extent mappings for the range being written
+ *
+ * REQ_ATOMIC-based is the preferred method, and is attempted first. If this
+ * method fails due to REQ_ATOMIC-related constraints, then we retry with the
+ * COW-based method. The REQ_ATOMIC-based method typically will fail if the
+ * write spans multiple extents or the disk blocks are misaligned.
+ *
+ * Similar to xfs_file_dio_write_unaligned(), the retry mechanism is based on
+ * the ->iomap_begin method returning -EAGAIN, which would be when the
+ * REQ_ATOMIC-based write is not possible. In the case of IOCB_NOWAIT being set,
+ * then we will not retry with the COW-based method, and instead pass that
+ * error code back to the caller immediately.
+ *
+ * REQ_ATOMIC-based atomic writes behave such that a racing read which overlaps
+ * with the range being atomically written will see all or none of the old data.
+ * Emulate this behaviour for COW-based atomic writes by using
+ * IOMAP_DIO_FORCE_WAIT and inode_dio_wait() to drain active reads. This also
+ * locks out racing writes, which could trample on the COW fork extent.
+ */
+
+static noinline ssize_t
+xfs_file_dio_write_atomic(
+	struct xfs_inode	*ip,
+	struct kiocb		*iocb,
+	struct iov_iter		*from)
+{
+	unsigned int		iolock = XFS_IOLOCK_SHARED;
+	unsigned int		dio_flags = 0;
+	const struct iomap_ops	*dops = &xfs_direct_write_iomap_ops;
+	ssize_t			ret;
+
+retry:
+	ret = xfs_ilock_iocb_for_write(iocb, &iolock);
+	if (ret)
+		return ret;
+
+	ret = xfs_file_write_checks(iocb, from, &iolock, NULL);
+	if (ret)
+		goto out_unlock;
+
+	if (dio_flags & IOMAP_DIO_FORCE_WAIT)
+		inode_dio_wait(VFS_I(ip));
+
+	trace_xfs_file_direct_write(iocb, from);
+	ret = iomap_dio_rw(iocb, from, dops, &xfs_dio_write_ops,
+			dio_flags, NULL, 0);
+
+	if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT) &&
+	    dops == &xfs_direct_write_iomap_ops) {
+		xfs_iunlock(ip, iolock);
+		dio_flags = IOMAP_DIO_FORCE_WAIT;
+		dops = &xfs_atomic_write_cow_iomap_ops;
+		iolock = XFS_IOLOCK_EXCL;
+		goto retry;
+	}
+
+out_unlock:
+	if (iolock)
+		xfs_iunlock(ip, iolock);
+	return ret;
+}
+
/*
* Handle block unaligned direct I/O writes
*
@@ -840,6 +909,10 @@ xfs_file_dio_write(
 		return xfs_file_dio_write_unaligned(ip, iocb, from);
 	if (xfs_is_zoned_inode(ip))
 		return xfs_file_dio_write_zoned(ip, iocb, from);
+
+	if (iocb->ki_flags & IOCB_ATOMIC)
+		return xfs_file_dio_write_atomic(ip, iocb, from);
+
 	return xfs_file_dio_write_aligned(ip, iocb, from,
 			&xfs_direct_write_iomap_ops, &xfs_dio_write_ops, NULL);
}
--
2.31.1