Message-ID: <fe6a98a8-5f53-47d4-8ae3-08e47dd383c2@oracle.com>
Date: Thu, 7 Dec 2023 12:43:12 +0000
From: John Garry <john.g.garry@...cle.com>
To: Dave Chinner <david@...morbit.com>
Cc: Ojaswin Mujoo <ojaswin@...ux.ibm.com>, linux-ext4@...r.kernel.org,
Theodore Ts'o <tytso@....edu>, Ritesh Harjani <ritesh.list@...il.com>,
linux-kernel@...r.kernel.org, "Darrick J . Wong" <djwong@...nel.org>,
linux-block@...r.kernel.org, linux-xfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, dchinner@...hat.com
Subject: Re: [RFC 1/7] iomap: Don't fall back to buffered write if the write
is atomic
On 01/12/2023 22:07, Dave Chinner wrote:
> RWF_ATOMIC is no different to RWF_NOWAIT. The API doesn't decide
> what can be supported - the filesystems themselves decide what part
> of the API they can support and implement those pieces.
>
> To go back to RWF_NOWAIT, for a long time we (XFS) only supported
> RWF_NOWAIT on DIO, and buffered reads and writes were given
> -EOPNOTSUPP by the filesystem. Then other filesystems started
> supporting DIO with RWF_NOWAIT. Then buffered read support was added
> to the page cache and XFS, and as other filesystems were converted
> they removed the RWF_NOWAIT exclusion check from their read IO
> paths.
>
> We are now in the same place with buffered write support for
> RWF_NOWAIT. XFS, the page cache and iomap allow buffered writes w/
> RWF_NOWAIT, but ext4, btrfs and f2fs still all return -EOPNOTSUPP
> because they don't support non-blocking buffered writes yet.
>
> This is the same model we should be applying with RWF_ATOMIC - we
> know that over time we'll be able to expand support for atomic
> writes across both direct and buffered IO, so we should not be
> restricting the API or infrastructure to only allow RWF_ATOMIC w/
> DIO. Just have the filesystems reject RWF_ATOMIC w/ -EOPNOTSUPP if
> they don't support it, and for those that do it is conditional on
> whether the filesystem supports it for the given type of IO being
> done.
>
> Seriously - an application can easily probe for RWF_ATOMIC support
> without needing information to be directly exposed in statx() - just
> open a O_TMPFILE, issue the type of RWF_ATOMIC IO you require to be
> supported, and if it returns -EOPNOTSUPP then you can't use
> RWF_ATOMIC optimisations in the application....
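For reference, the probe described above could look roughly like the userspace sketch below. It is only an illustration under stated assumptions: probe_rwf_atomic() is a hypothetical helper name, and the fallback RWF_ATOMIC value is the one later merged into the mainline uapi headers, which may not match the bit used by this RFC series. The caller passes the open flags for the type of IO it cares about (0 for buffered, O_DIRECT for direct IO), and len is assumed to be a power of two.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Assumption: this is the value later merged into mainline uapi headers;
 * the RFC under discussion may use a different bit.
 */
#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040
#endif

/*
 * Hypothetical helper, not an existing API.
 * Returns 1 if an RWF_ATOMIC write of 'len' bytes was accepted,
 * 0 if it was rejected with EOPNOTSUPP, and -1 on any other failure.
 * 'len' must be a power of two (it is also used as the buffer alignment).
 */
static int probe_rwf_atomic(const char *dir, int extra_open_flags, size_t len)
{
	struct iovec iov;
	void *buf = NULL;
	ssize_t ret;
	int fd, rc, saved_errno;

	/* Unnamed temporary file on the filesystem being probed. */
	fd = open(dir, O_TMPFILE | O_RDWR | extra_open_flags, 0600);
	if (fd < 0)
		return -1;

	/* Align the buffer to its length so that O_DIRECT probes work too. */
	rc = posix_memalign(&buf, len, len);
	if (rc) {
		close(fd);
		errno = rc;
		return -1;
	}
	memset(buf, 0, len);

	iov.iov_base = buf;
	iov.iov_len = len;
	ret = pwritev2(fd, &iov, 1, 0, RWF_ATOMIC);
	saved_errno = errno;

	free(buf);
	close(fd);

	if (ret == (ssize_t)len)
		return 1;
	if (ret < 0 && saved_errno == EOPNOTSUPP)
		return 0;
	errno = saved_errno;
	return -1;
}
```

A caller would run something like probe_rwf_atomic("/mnt/data", O_DIRECT, 4096) once at startup (the path is only an example) and fall back to its non-atomic write path on a 0 return. A -1 return means the probe itself failed, for example because the filesystem does not support O_TMPFILE, and says nothing about RWF_ATOMIC support.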
Hi Dave,
For rejecting RWF_ATOMIC when not supported for a file, how about
something like this:
--->8----
diff --git a/block/fops.c b/block/fops.c
index 273bd8f5a370..d9563ef29dde 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -639,6 +637,9 @@ static int blkdev_open(struct inode *inode, struct file *filp)
if (IS_ERR(handle))
return PTR_ERR(handle);
+ if (queue_atomic_write_unit_max_bytes(bdev_get_queue(handle->bdev)))
+ filp->f_mode |= FMODE_CAN_ATOMIC_WRITE;
+
if (bdev_nowait(handle->bdev))
filp->f_mode |= FMODE_NOWAIT;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4256ec184461..d725c194243c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -185,6 +185,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
/* File supports async nowait buffered writes */
#define FMODE_BUF_WASYNC ((__force fmode_t)0x80000000)
+/* File supports atomic writes */
+#define FMODE_CAN_ATOMIC_WRITE ((__force fmode_t)0x100000000)
+
/*
* Attribute flags. These should be or-ed together to figure out what
* has been changed!
@@ -3266,6 +3269,10 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
return -EOPNOTSUPP;
kiocb_flags |= IOCB_NOIO;
}
+ if (flags & RWF_ATOMIC) {
+ if (!(ki->ki_filp->f_mode & FMODE_CAN_ATOMIC_WRITE))
+ return -EOPNOTSUPP;
+ }
kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
if (flags & RWF_SYNC)
kiocb_flags |= IOCB_DSYNC;
diff --git a/include/linux/types.h b/include/linux/types.h
index 253168bb3fe1..49c754fde1d6 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -153,7 +153,7 @@ typedef u32 dma_addr_t;
typedef unsigned int __bitwise gfp_t;
typedef unsigned int __bitwise slab_flags_t;
-typedef unsigned int __bitwise fmode_t;
+typedef unsigned long __bitwise fmode_t;
#ifdef CONFIG_PHYS_ADDR_T_64BIT
typedef u64 phys_addr_t;
----8<------
My concern is that we need to increase fmode_t in size as all available
32 bits are used up.
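To make the width problem concrete, here is a small userspace sketch. The DEMO_ names and helpers are illustrative only, with the values copied from the diff above. It checks whether a flag value fits in a type of a given width. Note that unsigned long is only 64 bits wide on 64-bit builds, so on 32-bit builds the new flag would presumably still not fit.

```c
#include <limits.h>

/* Values copied from the diff above; the DEMO_ names are illustrative only. */
#define DEMO_FMODE_BUF_WASYNC		0x80000000ULL	/* bit 31 */
#define DEMO_FMODE_CAN_ATOMIC_WRITE	0x100000000ULL	/* bit 32 */

/* Returns 1 if every set bit of 'flag' fits in an unsigned type 'bits' wide. */
static int flag_fits(unsigned long long flag, unsigned int bits)
{
	if (bits >= sizeof(flag) * CHAR_BIT)
		return 1;
	return (flag >> bits) == 0;
}

/* Width of unsigned long on the platform this is compiled for. */
static unsigned int ulong_bits(void)
{
	return (unsigned int)(sizeof(unsigned long) * CHAR_BIT);
}
```

With these helpers, flag_fits(DEMO_FMODE_BUF_WASYNC, 32) is true and flag_fits(DEMO_FMODE_CAN_ATOMIC_WRITE, 32) is false, while flag_fits(DEMO_FMODE_CAN_ATOMIC_WRITE, ulong_bits()) depends on the build.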
Thanks,
John