linux-kernel - Re: [PATCH v2 2/7] iomap: Add zero unwritten mappings dio support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1edc69f3-50d5-440a-b751-be02fd7e0834@oracle.com>
Date: Fri, 13 Dec 2024 10:43:19 +0000
From: John Garry <john.g.garry@...cle.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: brauner@...nel.org, cem@...nel.org, dchinner@...hat.com, hch@....de,
        ritesh.list@...il.com, linux-xfs@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        martin.petersen@...cle.com
Subject: Re: [PATCH v2 2/7] iomap: Add zero unwritten mappings dio support


>> However, I still think that we should be able to atomic write mixed extents,
>> even though it is a pain to implement. To that end, I could be convinced
>> again that we don't require it...
> 
> Well... if you /did/ add a few entries to include/uapi/linux/fs.h for
> ways that an untorn write can fail, then we could define the programming
> interface as so:
> 
> "If you receive -EBADMAP, then call fallocate(FALLOC_FL_MAKE_OVERWRITE)
> to force all the mappings to pure overwrites."
> 
> ...since there have been a few people who have asked about that ability
> so that they can write+fdatasync without so much overhead from file
> metadata updates.

ok, I see.

All this does seem more complicated in terms of implementation and user 
experience than what I have in this series. But if you think that there 
is value in FALLOC_FL_MAKE_OVERWRITE for other scenarios, then maybe it 
could be good, though.

> 
>>>
>>> Instead here we are adding a bunch of complexity, and not even all that
>>> well:
>>>
>>>> Signed-off-by: John Garry <john.g.garry@...cle.com>
>>>> ---
>>>>    fs/iomap/direct-io.c  | 76 +++++++++++++++++++++++++++++++++++++++++++
>>>>    include/linux/iomap.h |  3 ++
>>>>    2 files changed, 79 insertions(+)
>>>>
>>>> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
>>>> index 23fdad16e6a8..18c888f0c11f 100644
>>>> --- a/fs/iomap/direct-io.c
>>>> +++ b/fs/iomap/direct-io.c
>>>> @@ -805,6 +805,82 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(iomap_dio_rw);
>>>> +static loff_t
>>>> +iomap_dio_zero_unwritten_iter(struct iomap_iter *iter, struct iomap_dio *dio)
>>>> +{
>>>> +	const struct iomap *iomap = &iter->iomap;
>>>> +	loff_t length = iomap_length(iter);
>>>> +	loff_t pos = iter->pos;
>>>> +
>>>> +	if (iomap->type == IOMAP_UNWRITTEN) {
>>>> +		int ret;
>>>> +
>>>> +		dio->flags |= IOMAP_DIO_UNWRITTEN;
>>>> +		ret = iomap_dio_zero(iter, dio, pos, length);
>>>
>>> Shouldn't this be detecting the particular case that the mapping for the
>>> kiocb is in mixed state and only zeroing in that case?  This just
>>> targets every unwritten extent, even if the unwritten extent covered the
>>> entire range that is being written.
>>
>> Right, so I did touch on this in the final comment in patch 4/7 commit log.
>>
>> Why I did it this way? I did not think that it made much difference, since
>> this zeroing would be generally a one-off and did not merit even more
>> complexity to implement.
> 
> The trouble is, if you fallocate the whole file and then write an
> aligned 64k block, this will write zeroes to the block, update the
> mapping, and only then issue the untorn write.  Sure that's a one time
> performance hit, but probably not a welcome one.

ok, I can try to improve on this. It might get considerably more 
complicated...

> 
>>> It doesn't handle COW, it doesn't
>>
>> Do we want to atomic write COW?
> 
> I don't see why not -- if there's a single COW mapping for the whole
> untorn write, then the data gets written to the media in an untorn
> fashion, and the remap is a single transaction.

I tested atomic write on COW and it works ok, but the behavior is odd to me.

If I attempt to atomic write a single block in shared extent, then we 
have this callchain: xfs_file_dio_write_atomic() -> 
iomap_dio_rw(IOMAP_DIO_OVERWRITE_ONLY) -> ... 
xfs_direct_write_iomap_begin() -> xfs_reflink_allocate_cow() and we 
alloc a new extent.

And so xfs_file_dio_write_atomic() -> 
iomap_dio_rw(IOMAP_DIO_OVERWRITE_ONLY) does not return -EAGAIN and we 
don't even attempt to zero.

I just wonder why IOMAP_DIO_OVERWRITE_ONLY is not honoured here, as 
xfs_reflink_allocate_cow() -> xfs_reflink_fill_cow_hole() -> 
xfs_bmap_write() can alloc new blocks.

I am not too concerned about atomic writing mixed extents which includes 
COW extents, as atomic writing mixed extents is based on "big alloc" and 
that does not enable reflink (yet).

> 
>>> handle holes, etc.
>>
>> I did test hole, and it seemed to work. However I noticed that for a hole
>> region we get IOMAP_UNWRITTEN type, like:
> 
> Oh right, that's ->iomap_begin allocating an unwritten extent into the
> hole, because you're not allowed to specify a hole for the destination
> of a write.  I withdraw that part of the comment.


> 

Thanks,
John