Message-ID: <87h649trof.fsf@gmail.com>
Date: Tue, 04 Mar 2025 08:42:32 +0530
From: Ritesh Harjani (IBM) <ritesh.list@...il.com>
To: Jens Axboe <axboe@...nel.dk>, "Darrick J . Wong" <djwong@...nel.org>,
Cc: hannes@...xchg.org, clm@...a.com, linux-kernel@...r.kernel.org, willy@...radead.org, kirill@...temov.name, bfoster@...hat.com, Jens Axboe <axboe@...nel.dk>, linux-mm@...ck.org, linux-fsdevel@...r.kernel.org, linux-xfs@...r.kernel.org, fstests@...r.kernel.org, Ritesh Harjani (IBM) <ritesh.list@...il.com>
Subject: Re: [PATCH 09/12] mm/filemap: drop streaming/uncached pages when writeback completes
Jens Axboe <axboe@...nel.dk> writes:
> If the folio is marked as streaming, drop pages when writeback completes.
> Intended to be used with RWF_DONTCACHE, to avoid needing sync writes for
> uncached IO.
>
> Signed-off-by: Jens Axboe <axboe@...nel.dk>
> ---
> mm/filemap.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index dd563208d09d..aa0b3af6533d 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1599,6 +1599,27 @@ int folio_wait_private_2_killable(struct folio *folio)
> }
> EXPORT_SYMBOL(folio_wait_private_2_killable);
>
> +/*
> + * If folio was marked as dropbehind, then pages should be dropped when writeback
> + * completes. Do that now. If we fail, it's likely because of a big folio -
> + * just reset dropbehind for that case and latter completions should invalidate.
> + */
> +static void folio_end_dropbehind_write(struct folio *folio)
> +{
> +	/*
> +	 * Hitting !in_task() should not happen off RWF_DONTCACHE writeback,
> +	 * but can happen if normal writeback just happens to find dirty folios
> +	 * that were created as part of uncached writeback, and that writeback
> +	 * would otherwise not need non-IRQ handling. Just skip the
> +	 * invalidation in that case.
> +	 */
> +	if (in_task() && folio_trylock(folio)) {
> +		if (folio->mapping)
> +			folio_unmap_invalidate(folio->mapping, folio, 0);
> +		folio_unlock(folio);
> +	}
> +}
> +
Hi Jens,
I want to confirm my understanding of the above function, where we call
folio_unmap_invalidate() only when in_task() is true.
Writeback completion almost always runs in softirq context, right? Do you
know of cases where it runs in process context (in_task())? From the
filesystem side, one case where the completion can run in process context
is when bio->bi_end_io is hijacked by the filesystem, e.g.
	/* send ioends that might require a transaction to the completion wq */
	if (xfs_ioend_is_append(ioend) ||
	    (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED)))
		ioend->io_bio.bi_end_io = xfs_end_bio;
	if (status)
		return status;
	submit_bio(&ioend->io_bio);
In this case xfs_end_bio() is called to complete the ioends, and it
queues the completion to a workqueue (since this requires transaction
processing), which means in_task() will be true when the folios are ended.
That means the user-visible effects of the in_task() check before
calling folio_unmap_invalidate() are:
1. If an append write, a write to an unwritten extent, or a CoW write is
done using RWF_DONTCACHE on a file, the page cache folios are freed
after the I/O completes (as expected).
2. However, if RWF_DONTCACHE writes are done to existing written extents
(i.e. overwrites), the page cache folios are not freed after the I/O
completes (because the completions run in softirq context).
Is this understanding correct? Is this expected behavior, and is it
documented anywhere, such as a man page or the kernel Documentation?
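To make the question concrete, here is a rough user-space model of my understanding above. It is only a sketch and not kernel code: the names model_ioend, MODEL_IOEND_UNWRITTEN, MODEL_IOEND_SHARED and both helper functions are hypothetical stand-ins, and it assumes the only way to reach process context is the XFS bi_end_io redirection quoted earlier.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the ioend flags named in the XFS snippet. */
enum {
	MODEL_IOEND_UNWRITTEN = 1 << 0,
	MODEL_IOEND_SHARED    = 1 << 1,
};

/* Hypothetical, minimal description of one writeback ioend. */
struct model_ioend {
	bool is_append;         /* models xfs_ioend_is_append() being true */
	unsigned int io_flags;  /* MODEL_IOEND_* bits */
};

/*
 * Assumption: completion runs in process context only when XFS
 * redirects bi_end_io to xfs_end_bio(), i.e. for appends and for
 * unwritten or shared (CoW) extents. Plain overwrites are assumed
 * to complete in softirq context.
 */
static bool model_completes_in_task(const struct model_ioend *ioend)
{
	return ioend->is_append ||
	       (ioend->io_flags &
		(MODEL_IOEND_UNWRITTEN | MODEL_IOEND_SHARED)) != 0;
}

/*
 * Models the gate in folio_end_dropbehind_write(): the folio is only
 * invalidated when in_task() holds, the trylock succeeds, and the
 * folio still has a mapping.
 */
static bool model_folio_dropped(const struct model_ioend *ioend,
				bool trylock_ok, bool has_mapping)
{
	return model_completes_in_task(ioend) && trylock_ok && has_mapping;
}
```

Under these assumptions, an ioend with is_append set or with either flag set gives "dropped", while a plain overwrite (is_append false, io_flags 0) never does, which is the asymmetry I am asking about.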
The other thing I wanted to check: we can then have folios still in the
page cache that were written using RWF_DONTCACHE. Do we drop these
folios later from somewhere else? Or does any other mm subsystem use the
fact that such folios remain in the page cache to treat them as the
first candidates for removal?
I am just trying to understand whether folios marked with PG_dropbehind
that did not get freed after the I/O completed are treated specially, or
just as regular page cache folios. I understand the other reason these
won't get freed is when someone else might be using them.
<Below experiment to show page cache pages cached after doing an
overwrite using RWF_DONTCACHE>
I was trying to add a -U flag to xfs_io for uncached preadv2/pwritev2
calls (I will post those patches soon).
Using the -U flag in xfs_io, we can see the behavior described above,
e.g.
# mount /dev/loop6 /mnt1/test
// Do uncached writes using (-U) to a new file
# ./io/xfs_io -fc "pwrite -U -V 1 0 16K" /mnt1/test/f1;
wrote 16384/16384 bytes at offset 0
16 KiB, 4 ops; 0.0036 sec (4.277 MiB/sec and 1094.9904 ops/sec)
//no page cache pages found as expected
# ./io/xfs_io -c "mmap 0 16K" -c "mincore" -c "munmap" /mnt1/test/f1
// overwrite to the same area using uncached writes (-U)
# ./io/xfs_io -c "pwrite -U -V 1 0 16K" /mnt1/test/f1;
wrote 16384/16384 bytes at offset 0
16 KiB, 4 ops; 0.0016 sec (9.340 MiB/sec and 2390.9145 ops/sec)
// Overwrite causes the page cache pages to be found even with -U
# ./io/xfs_io -c "mmap 0 16K" -c "mincore" -c "munmap" /mnt1/test/f1
0x7ffff7fb5000 - 0x7ffff7fb9000 4 pages (0 : 16384)
// Try the same after a mount cycle
# umount /dev/loop6
# mount /dev/loop6 /mnt1/test
// Overwrite causes the page cache pages to be found even with -U
# ./io/xfs_io -c "pwrite -U -V 1 0 16K" /mnt1/test/f1;
wrote 16384/16384 bytes at offset 0
16 KiB, 4 ops; 0.0009 sec (16.361 MiB/sec and 4188.4817 ops/sec)
# ./io/xfs_io -c "mmap 0 16K" -c "mincore" -c "munmap" /mnt1/test/f1
0x7ffff7fb5000 - 0x7ffff7fb9000 4 pages (0 : 16384)
#
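For anyone who wants to reproduce this without the (not yet posted) xfs_io -U patches, the same check can be sketched in C with pwritev2() and mincore(). This is a minimal sketch, not a definitive test: the helper names write_dontcache() and resident_pages() are made up for illustration, and the fallback value for RWF_DONTCACHE is my assumption of the uapi definition for headers that do not have it yet. On kernels or filesystems without support, the write is expected to fail rather than silently fall back.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080 /* assumed uapi value, for older headers */
#endif

/*
 * Hypothetical helper: write len bytes at offset off using
 * RWF_DONTCACHE. Returns bytes written, or -1 with errno set
 * (e.g. when the kernel or filesystem does not support the flag).
 */
static ssize_t write_dontcache(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	return pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);
}

/*
 * Hypothetical helper: count how many pages of [0, len) of the file
 * are resident in the page cache, using mmap() + mincore() like the
 * xfs_io "mmap"/"mincore" commands. Returns -1 on error.
 */
static long resident_pages(int fd, size_t len)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t npages = (len + psz - 1) / psz;
	unsigned char *vec = malloc(npages);
	long count = 0;
	void *map;

	if (!vec)
		return -1;
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		free(vec);
		return -1;
	}
	if (mincore(map, len, vec) == 0) {
		for (size_t i = 0; i < npages; i++)
			count += vec[i] & 1; /* bit 0 set: page is resident */
	} else {
		count = -1;
	}
	munmap(map, len);
	free(vec);
	return count;
}
```

The idea would be to call write_dontcache() on a new file and then on the same range again, calling resident_pages() after each write (after giving writeback time to complete) and comparing the counts for the new-file and overwrite cases.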
Should we add a few tests to xfstests for the above behavior (for
preadv2() and pwritev2())? I was planning to add a few, but since we are
already discussing related things here, it would be good to get opinions
from others on this too.
-ritesh
> /**
> * folio_end_writeback - End writeback against a folio.
> * @folio: The folio.
> @@ -1609,6 +1630,8 @@ EXPORT_SYMBOL(folio_wait_private_2_killable);
> */
> void folio_end_writeback(struct folio *folio)
> {
> +	bool folio_dropbehind = false;
> +
> 	VM_BUG_ON_FOLIO(!folio_test_writeback(folio), folio);
>
> 	/*
> @@ -1630,9 +1653,14 @@ void folio_end_writeback(struct folio *folio)
> 	 * reused before the folio_wake_bit().
> 	 */
> 	folio_get(folio);
> +	if (!folio_test_dirty(folio))
> +		folio_dropbehind = folio_test_clear_dropbehind(folio);
> 	if (__folio_end_writeback(folio))
> 		folio_wake_bit(folio, PG_writeback);
> 	acct_reclaim_writeback(folio);
> +
> +	if (folio_dropbehind)
> +		folio_end_dropbehind_write(folio);
> 	folio_put(folio);
> }
> EXPORT_SYMBOL(folio_end_writeback);
> --
> 2.45.2