Message-ID: <7da93082-2985-85f4-7688-a082728de0a5@oracle.com>
Date:   Tue, 24 Oct 2023 13:35:33 +0100
From:   John Garry <john.g.garry@...cle.com>
To:     "Darrick J. Wong" <djwong@...nel.org>,
        Dave Chinner <david@...morbit.com>
Cc:     linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
        martin.petersen@...cle.com, himanshu.madhani@...cle.com
Subject: Re: [PATCH 2/4] readv.2: Document RWF_ATOMIC flag

On 09/10/2023 22:05, Darrick J. Wong wrote:
>>> If the file range is a sparse hole, the directio setup will allocate
>>> space and create an unwritten mapping before issuing the write bio.  The
>>> rest of the process works the same as preallocations and has the same
>>> behaviors.
>>>
>>> If the file range is allocated and was previously written, the write is
>>> issued and that's all that's needed from the fs.  After a crash, reads
>>> of the storage device produce the old contents or the new contents.
>> This is exactly what I explained when reviewing the code that
>> rejected RWF_ATOMIC without O_DSYNC on metadata dirty inodes.
> I'm glad we agree. 😄
> 
> John, when you're back from vacation, can we get rid of this language
> and all those checks under _is_dsync() in the iomap patch?
> 
> (That code is 100% the result of me handwaving and bellyaching 6 months
> ago when the team was trying to get all the atomic writes bits working
> prior to LSF and I was too burned out to think the xfs part through.
> As a result, I decided that we'd only support strict overwrites for the
> first iteration.)

So the following code added to iomap_dio_bio_iter() should be dropped:

----8<-----

--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -275,10 +275,11 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
  static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
  		struct iomap_dio *dio)
  {

...

@@ -292,6 +293,13 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
  	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
  		return -EINVAL;

+	if (atomic_write && !iocb_is_dsync(dio->iocb)) {
+		if (iomap->flags & IOMAP_F_DIRTY)
+			return -EIO;
+		if (iomap->type != IOMAP_MAPPED)
+			return -EIO;
+	}
+

---->8-----

ok?
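
For reference, here is a small userspace model of what that hunk rejects.
It is only a sketch, not kernel code: the enum, the boolean parameters and
the function name are made up for illustration and stand in for
iomap->type, IOMAP_F_DIRTY and iocb_is_dsync().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the iomap mapping types. */
enum map_type { MAP_HOLE, MAP_UNWRITTEN, MAP_MAPPED };

/*
 * Model of the check being dropped: a non-dsync atomic write is only
 * accepted as a strict overwrite, i.e. the range is already mapped and
 * the inode has no dirty metadata.
 */
int old_atomic_check(bool atomic_write, bool dsync, bool dirty,
		     enum map_type type)
{
	if (atomic_write && !dsync) {
		if (dirty)
			return -EIO;
		if (type != MAP_MAPPED)
			return -EIO;
	}
	return 0;
}
```

With the hunk gone, the hole and unwritten cases would no longer fail with
-EIO for a plain RWF_ATOMIC write.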

> 
>>> Summarizing:
>>>
>>> An (ATOMIC|SYNC) request provides the strongest guarantees: data
>>> will not be torn, and all file metadata updates are persisted before
>>> the write is returned to userspace.  Programs see either the old data or
>>> the new data, even if there's a crash.
>>>
>>> (ATOMIC|DSYNC) is less strong -- data will not be torn, and any file
>>> updates for just that region are persisted before the write is returned.
>>>
>>> (ATOMIC) is the least strong -- data will not be torn.  Neither the
>>> filesystem nor the device makes guarantees that anything ended up on
>>> stable storage, but if it does, programs see either the old data or the
>>> new data.
>> Yup, that makes sense to me.
> Perhaps this ^^ is what we should be documenting here.
> 
>>> Maybe we should rename the whole UAPI s/atomic/untorn/...
>> Perhaps, though "torn writes" is nomenclature that nobody outside
>> storage and filesystem developers really knows about. All I ever
>> hear from userspace developers is "we want atomic/all-or-nothing
>> data writes"...
> Fair 'enuf.
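
If we do document the three levels summarized above, a userspace
illustration might look roughly like the sketch below. The enum and helper
names are hypothetical. The RWF_ATOMIC fallback value is an assumption
taken from the mainline uapi header, and the series under discussion may
use a different one.

```c
#define _GNU_SOURCE	/* exposes pwritev2() and the RWF_* flags in glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Assumption: fallback for headers that predate RWF_ATOMIC. */
#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040
#endif

/* Hypothetical names for the three guarantee levels. */
enum persist_level { PERSIST_NONE, PERSIST_DATA, PERSIST_ALL };

/* Map a guarantee level to the pwritev2() flags from the summary. */
int untorn_flags(enum persist_level level)
{
	switch (level) {
	case PERSIST_ALL:	/* untorn, all file metadata persisted */
		return RWF_ATOMIC | RWF_SYNC;
	case PERSIST_DATA:	/* untorn, updates for this region persisted */
		return RWF_ATOMIC | RWF_DSYNC;
	default:		/* untorn only, no persistence guarantee */
		return RWF_ATOMIC;
	}
}

/* Issue one untorn write of a single buffer at the given offset. */
ssize_t untorn_pwrite(int fd, const void *buf, size_t len, off_t off,
		      enum persist_level level)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	assert(buf != NULL && len > 0);
	return pwritev2(fd, &iov, 1, off, untorn_flags(level));
}
```

A caller would presumably open the file with O_DIRECT and pass a suitably
sized and aligned buffer, e.g. untorn_pwrite(fd, buf, 4096, 0,
PERSIST_DATA), and should expect an error on kernels or files that do not
support the flag.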


Thanks,
John