[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7da93082-2985-85f4-7688-a082728de0a5@oracle.com>
Date: Tue, 24 Oct 2023 13:35:33 +0100
From: John Garry <john.g.garry@...cle.com>
To: "Darrick J. Wong" <djwong@...nel.org>,
Dave Chinner <david@...morbit.com>
Cc: linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
martin.petersen@...cle.com, himanshu.madhani@...cle.com
Subject: Re: [PATCH 2/4] readv.2: Document RWF_ATOMIC flag
On 09/10/2023 22:05, Darrick J. Wong wrote:
>>> If the file range is a sparse hole, the directio setup will allocate
>>> space and create an unwritten mapping before issuing the write bio. The
>>> rest of the process works the same as preallocations and has the same
>>> behaviors.
>>>
>>> If the file range is allocated and was previously written, the write is
>>> issued and that's all that's needed from the fs. After a crash, reads
>>> of the storage device produce the old contents or the new contents.
>> This is exactly what I explained when reviewing the code that
>> rejected RWF_ATOMIC without O_DSYNC on metadata dirty inodes.
> I'm glad we agree. 😄
>
> John, when you're back from vacation, can we get rid of this language
> and all those checks under _is_dsync() in the iomap patch?
>
> (That code is 100% the result of me handwaving and bellyaching 6 months
> ago when the team was trying to get all the atomic writes bits working
> prior to LSF and I was too burned out to think the xfs part through.
> As a result, I decided that we'd only support strict overwrites for the
> first iteration.)
So this following additive code in iomap_dio_bio_iter() should be dropped:
----8<-----
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -275,10 +275,11 @@ static inline blk_opf_t
iomap_dio_bio_opflags(struct iomap_dio *dio,
static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
struct iomap_dio *dio)
{
...
@@ -292,6 +293,13 @@ static loff_t iomap_dio_bio_iter(const struct
iomap_iter *iter,
!bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
return -EINVAL;
+ if (atomic_write && !iocb_is_dsync(dio->iocb)) {
+ if (iomap->flags & IOMAP_F_DIRTY)
+ return -EIO;
+ if (iomap->type != IOMAP_MAPPED)
+ return -EIO;
+ }
+
---->8-----
ok?
>
>>> Summarizing:
>>>
>>> An (ATOMIC|SYNC) request provides the strongest guarantees (data
>>> will not be torn, and all file metadata updates are persisted before
>>> the write is returned to userspace. Programs see either the old data or
>>> the new data, even if there's a crash.
>>>
>>> (ATOMIC|DSYNC) is less strong -- data will not be torn, and any file
>>> updates for just that region are persisted before the write is returned.
>>>
>>> (ATOMIC) is the least strong -- data will not be torn. Neither the
>>> filesystem nor the device make guarantees that anything ended up on
>>> stable storage, but if it does, programs see either the old data or the
>>> new data.
>> Yup, that makes sense to me.
> Perhaps this ^^ is what we should be documenting here.
>
>>> Maybe we should rename the whole UAPI s/atomic/untorn/...
>> Perhaps, though "torn writes" is nomenclature that nobody outside
>> storage and filesystem developers really knows about. All I ever
>> hear from userspace developers is "we want atomic/all-or-nothing
>> data writes"...
> Fair 'enuf.
Thanks,
John
Powered by blists - more mailing lists