[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <58ebc5a8-941b-4c3d-a3b2-3985d7eeea30@kernel.dk>
Date: Tue, 12 Nov 2024 11:47:57 -0700
From: Jens Axboe <axboe@...nel.dk>
To: Brian Foster <bfoster@...hat.com>
Cc: linux-mm@...ck.org, linux-fsdevel@...r.kernel.org, hannes@...xchg.org,
clm@...a.com, linux-kernel@...r.kernel.org, willy@...radead.org,
kirill@...temov.name, linux-btrfs@...r.kernel.org,
linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [PATCH 12/16] ext4: add RWF_UNCACHED write support
On 11/12/24 11:11 AM, Brian Foster wrote:
> On Tue, Nov 12, 2024 at 10:13:12AM -0700, Jens Axboe wrote:
>> On 11/12/24 9:36 AM, Brian Foster wrote:
>>> On Mon, Nov 11, 2024 at 04:37:39PM -0700, Jens Axboe wrote:
>>>> IOCB_UNCACHED IO needs to prune writeback regions on IO completion,
>>>> and hence need the worker punt that ext4 also does for unwritten
>>>> extents. Add an io_end flag to manage that.
>>>>
>>>> If foliop is set to foliop_uncached in ext4_write_begin(), then set
>>>> FGP_UNCACHED so that __filemap_get_folio() will mark newly created
>>>> folios as uncached. That in turn will make writeback completion drop
>>>> these ranges from the page cache.
>>>>
>>>> Now that ext4 supports both uncached reads and writes, add the fop_flag
>>>> FOP_UNCACHED to enable it.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@...nel.dk>
>>>> ---
>>>> fs/ext4/ext4.h | 1 +
>>>> fs/ext4/file.c | 2 +-
>>>> fs/ext4/inline.c | 7 ++++++-
>>>> fs/ext4/inode.c | 18 ++++++++++++++++--
>>>> fs/ext4/page-io.c | 28 ++++++++++++++++------------
>>>> 5 files changed, 40 insertions(+), 16 deletions(-)
>>>>
>>> ...
>>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>>> index 54bdd4884fe6..afae3ab64c9e 100644
>>>> --- a/fs/ext4/inode.c
>>>> +++ b/fs/ext4/inode.c
>>>> @@ -1138,6 +1138,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
>>>> int ret, needed_blocks;
>>>> handle_t *handle;
>>>> int retries = 0;
>>>> + fgf_t fgp_flags;
>>>> struct folio *folio;
>>>> pgoff_t index;
>>>> unsigned from, to;
>>>> @@ -1164,6 +1165,15 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
>>>> return 0;
>>>> }
>>>>
>>>> + /*
>>>> + * Set FGP_WRITEBEGIN, and FGP_UNCACHED if foliop contains
>>>> + * foliop_uncached. That's how generic_perform_write() informs us
>>>> + * that this is an uncached write.
>>>> + */
>>>> + fgp_flags = FGP_WRITEBEGIN;
>>>> + if (*foliop == foliop_uncached)
>>>> + fgp_flags |= FGP_UNCACHED;
>>>> +
>>>> /*
>>>> * __filemap_get_folio() can take a long time if the
>>>> * system is thrashing due to memory pressure, or if the folio
>>>> @@ -1172,7 +1182,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
>>>> * the folio (if needed) without using GFP_NOFS.
>>>> */
>>>> retry_grab:
>>>> - folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
>>>> + folio = __filemap_get_folio(mapping, index, fgp_flags,
>>>> mapping_gfp_mask(mapping));
>>>> if (IS_ERR(folio))
>>>> return PTR_ERR(folio);
>>>
>>> JFYI, I notice that ext4 cycles the folio lock here in this path and
>>> thus follows up with a couple checks presumably to accommodate that. One
>>> is whether i_mapping has changed, which I assume means uncached state
>>> would have been handled/cleared externally somewhere..? I.e., if an
>>> uncached folio is somehow truncated/freed without ever having been
>>> written back?
>>>
>>> The next is a folio_wait_stable() call "in case writeback began ..."
>>> It's not immediately clear to me if that is possible here, but taking
>>> that at face value, is it an issue if we were to create an uncached
>>> folio, drop the folio lock, then have some other task dirty and
>>> writeback the folio (due to a sync write or something), then have
>>> writeback completion invalidate the folio before we relock it here?
>>
>> I don't either of those are an issue. The UNCACHED flag will only be set
>> on a newly created folio, it does not get inherited for folios that
>> already exist.
>>
>
> Right.. but what I was wondering for that latter case is if the folio is
> created here by ext4, so uncached is set before it is unlocked.
>
> On second look I guess the uncached completion invalidation should clear
> mapping and thus trigger the retry logic here. That seems reasonable
> enough, but is it still possible to race with writeback?
>
> Maybe this is a better way to ask.. what happens if a write completes to
> an uncached folio that is already under writeback? For example, uncached
> write 1 completes, submits for writeback and returns to userspace. Then
> write 2 begins and redirties the same folio before the uncached
> writeback completes.
>
> If I follow correctly, if write 2 is also uncached, it eventually blocks
> in writeback submission (folio_prepare_writeback() ->
> folio_wait_writeback()). It looks like folio lock is held there, so
> presumably that would bypass the completion time invalidation in
> folio_end_uncached(). But what if write 2 was not uncached or perhaps
> writeback completion won the race for folio lock vs. the write side
> (between locking the folio for dirtying and later for writeback
> submission)? Does anything prevent invalidation of the folio before the
> second write is submitted for writeback?
>
> IOW, I'm wondering if the uncached completion time invalidation also
> needs a folio dirty check..?
Ah ok, I see what you mean. If the folio is dirty, the unmapping will
fail. But I guess with the recent change, we'll actually unmap it first.
I'll add the folio dirty check, thanks!
--
Jens Axboe
Powered by blists - more mailing lists