[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1edc69f3-50d5-440a-b751-be02fd7e0834@oracle.com>
Date: Fri, 13 Dec 2024 10:43:19 +0000
From: John Garry <john.g.garry@...cle.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: brauner@...nel.org, cem@...nel.org, dchinner@...hat.com, hch@....de,
ritesh.list@...il.com, linux-xfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
martin.petersen@...cle.com
Subject: Re: [PATCH v2 2/7] iomap: Add zero unwritten mappings dio support
>> However, I still think that we should be able to atomic write mixed extents,
>> even though it is a pain to implement. To that end, I could be convinced
>> again that we don't require it...
>
> Well... if you /did/ add a few entries to include/uapi/linux/fs.h for
> ways that an untorn write can fail, then we could define the programming
> interface as so:
>
> "If you receive -EBADMAP, then call fallocate(FALLOC_FL_MAKE_OVERWRITE)
> to force all the mappings to pure overwrites."
>
> ...since there have been a few people who have asked about that ability
> so that they can write+fdatasync without so much overhead from file
> metadata updates.
ok, I see.
All this does seem more complicated in terms of implementation and user
experience than what I have in this series. But if you think that there
is value in FALLOC_FL_MAKE_OVERWRITE for other scenarios, then maybe it
could be good, though.
>
>>>
>>> Instead here we are adding a bunch of complexity, and not even all that
>>> well:
>>>
>>>> Signed-off-by: John Garry <john.g.garry@...cle.com>
>>>> ---
>>>> fs/iomap/direct-io.c | 76 +++++++++++++++++++++++++++++++++++++++++++
>>>> include/linux/iomap.h | 3 ++
>>>> 2 files changed, 79 insertions(+)
>>>>
>>>> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
>>>> index 23fdad16e6a8..18c888f0c11f 100644
>>>> --- a/fs/iomap/direct-io.c
>>>> +++ b/fs/iomap/direct-io.c
>>>> @@ -805,6 +805,82 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>>>> }
>>>> EXPORT_SYMBOL_GPL(iomap_dio_rw);
>>>> +static loff_t
>>>> +iomap_dio_zero_unwritten_iter(struct iomap_iter *iter, struct iomap_dio *dio)
>>>> +{
>>>> + const struct iomap *iomap = &iter->iomap;
>>>> + loff_t length = iomap_length(iter);
>>>> + loff_t pos = iter->pos;
>>>> +
>>>> + if (iomap->type == IOMAP_UNWRITTEN) {
>>>> + int ret;
>>>> +
>>>> + dio->flags |= IOMAP_DIO_UNWRITTEN;
>>>> + ret = iomap_dio_zero(iter, dio, pos, length);
>>>
>>> Shouldn't this be detecting the particular case that the mapping for the
>>> kiocb is in mixed state and only zeroing in that case? This just
>>> targets every unwritten extent, even if the unwritten extent covered the
>>> entire range that is being written.
>>
>> Right, so I did touch on this in the final comment in patch 4/7 commit log.
>>
>> Why I did it this way? I did not think that it made much difference, since
>> this zeroing would be generally a one-off and did not merit even more
>> complexity to implement.
>
> The trouble is, if you fallocate the whole file and then write an
> aligned 64k block, this will write zeroes to the block, update the
> mapping, and only then issue the untorn write. Sure that's a one time
> performance hit, but probably not a welcome one.
ok, I can try to improve on this. It might get considerably more
complicated...
>
>>> It doesn't handle COW, it doesn't
>>
>> Do we want to atomic write COW?
>
> I don't see why not -- if there's a single COW mapping for the whole
> untorn write, then the data gets written to the media in an untorn
> fashion, and the remap is a single transaction.
I tested atomic write on COW and it works ok, but the behavior is odd to me.
If I attempt to atomic write a single block in shared extent, then we
have this callchain: xfs_file_dio_write_atomic() ->
iomap_dio_rw(IOMAP_DIO_OVERWRITE_ONLY) -> ...
xfs_direct_write_iomap_begin() -> xfs_reflink_allocate_cow() and we
alloc a new extent.
And so xfs_file_dio_write_atomic() ->
iomap_dio_rw(IOMAP_DIO_OVERWRITE_ONLY) does not return -EAGAIN and we
don't even attempt to zero.
I just wonder why IOMAP_DIO_OVERWRITE_ONLY is not honoured here, as
xfs_reflink_allocate_cow() -> xfs_reflink_fill_cow_hole() ->
xfs_bmap_write() can alloc new blocks.
I am not too concerned about atomic writing mixed extents which includes
COW extents, as atomic writing mixed extents is based on "big alloc" and
that does not enable reflink (yet).
>
>>> handle holes, etc.
>>
>> I did test hole, and it seemed to work. However I noticed that for a hole
>> region we get IOMAP_UNWRITTEN type, like:
>
> Oh right, that's ->iomap_begin allocating an unwritten extent into the
> hole, because you're not allowed to specify a hole for the destination
> of a write. I withdraw that part of the comment.
>
Thanks,
John
Powered by blists - more mailing lists