Message-ID: <yru7qf5gvyzccq5ohhpylvxug5lr5tf54omspbjh4sm6pcdb2r@fpjgj2pxw7va>
Date: Tue, 15 Jul 2025 00:34:12 +0200
From: Klara Modin <klarasmodin@...il.com>
To: Youling Tang <youling.tang@...ux.dev>
Cc: Matthew Wilcox <willy@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>, Jan Kara <jack@...e.cz>,
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, chizhiling@....com,
	Youling Tang <tangyouling@...inos.cn>, Chi Zhiling <chizhiling@...inos.cn>
Subject: Re: [PATCH] mm/filemap: Align last_index to folio size

Hi,

On 2025-07-11 13:55:09 +0800, Youling Tang wrote:
> From: Youling Tang <tangyouling@...inos.cn>
>
> On XFS systems with pagesize=4K, blocksize=16K, and CONFIG_TRANSPARENT_HUGEPAGE
> enabled, we observed the following readahead behaviors:
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=test of=/dev/null bs=64k count=1
> # ./tools/mm/page-types -r -L -f /mnt/xfs/test
> foffset offset flags
> 0 136d4c __RU_l_________H______t_________________F_1
> 1 136d4d __RU_l__________T_____t_________________F_1
> 2 136d4e __RU_l__________T_____t_________________F_1
> 3 136d4f __RU_l__________T_____t_________________F_1
> ...
> c 136bb8 __RU_l_________H______t_________________F_1
> d 136bb9 __RU_l__________T_____t_________________F_1
> e 136bba __RU_l__________T_____t_________________F_1
> f 136bbb __RU_l__________T_____t_________________F_1 <-- first read
> 10 13c2cc ___U_l_________H______t______________I__F_1 <-- readahead flag
> 11 13c2cd ___U_l__________T_____t______________I__F_1
> 12 13c2ce ___U_l__________T_____t______________I__F_1
> 13 13c2cf ___U_l__________T_____t______________I__F_1
> ...
> 1c 1405d4 ___U_l_________H______t_________________F_1
> 1d 1405d5 ___U_l__________T_____t_________________F_1
> 1e 1405d6 ___U_l__________T_____t_________________F_1
> 1f 1405d7 ___U_l__________T_____t_________________F_1
> [ra_size = 32, req_count = 16, async_size = 16]
>
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=test of=/dev/null bs=60k count=1
> # ./page-types -r -L -f /mnt/xfs/test
> foffset offset flags
> 0 136048 __RU_l_________H______t_________________F_1
> ...
> c 110a40 __RU_l_________H______t_________________F_1
> d 110a41 __RU_l__________T_____t_________________F_1
> e 110a42 __RU_l__________T_____t_________________F_1 <-- first read
> f 110a43 __RU_l__________T_____t_________________F_1 <-- first readahead flag
> 10 13e7a8 ___U_l_________H______t_________________F_1
> ...
> 20 137a00 ___U_l_________H______t_______P______I__F_1 <-- second readahead flag (20 - 2f)
> 21 137a01 ___U_l__________T_____t_______P______I__F_1
> ...
> 3f 10d4af ___U_l__________T_____t_______P_________F_1
> [first readahead: ra_size = 32, req_count = 15, async_size = 17]
>
> When reading 64k of data (the same holds for the 61k-63k range, where
> last_index is page-aligned in filemap_get_pages()), 128k of readahead is
> triggered via page_cache_sync_ra() and the PG_readahead flag is set on the
> next folio (the one containing page 0x10).
>
> When reading 60k of data, 128k of readahead is likewise triggered via
> page_cache_sync_ra(), but in this case the readahead flag is set on page
> 0xf. Although the requested read size (req_count) is 60k, the actual read
> is rounded up to a folio boundary (64k in total), so it touches page 0xf,
> where the readahead flag is set, and initiates asynchronous readahead via
> page_cache_async_ra(). This results in two readahead operations totaling
> 256k for a single read.
>
> The root cause is that when the requested size is smaller than the actual
> read size (due to folio alignment), the read triggers asynchronous
> readahead. By changing the last_index alignment from page size to folio
> size, the requested size matches the actual read size, preventing a single
> read operation from triggering two readahead operations.
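
(If I read the numbers right, with 4k pages and 16k minimum folios the
before/after arithmetic for a 60k read at offset 0 works out as below, a
simplified sketch with division-based stand-ins for the kernel macros:

	#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
	#define round_up(x, y)     ((((x) + (y) - 1) / (y)) * (y))

	unsigned long old_li = DIV_ROUND_UP(0 + 61440UL, 4096);    /* 15: flag on page 0xf  */
	unsigned long new_li = round_up(0 + 61440UL, 16384) >> 12; /* 16: flag on page 0x10 */

which matches the flag positions in the two traces.)
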
>
> After applying the patch:
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=test of=/dev/null bs=60k count=1
> # ./page-types -r -L -f /mnt/xfs/test
> foffset offset flags
> 0 136d4c __RU_l_________H______t_________________F_1
> 1 136d4d __RU_l__________T_____t_________________F_1
> 2 136d4e __RU_l__________T_____t_________________F_1
> 3 136d4f __RU_l__________T_____t_________________F_1
> ...
> c 136bb8 __RU_l_________H______t_________________F_1
> d 136bb9 __RU_l__________T_____t_________________F_1
> e 136bba __RU_l__________T_____t_________________F_1 <-- first read
> f 136bbb __RU_l__________T_____t_________________F_1
> 10 13c2cc ___U_l_________H______t______________I__F_1 <-- readahead flag
> 11 13c2cd ___U_l__________T_____t______________I__F_1
> 12 13c2ce ___U_l__________T_____t______________I__F_1
> 13 13c2cf ___U_l__________T_____t______________I__F_1
> ...
> 1c 1405d4 ___U_l_________H______t_________________F_1
> 1d 1405d5 ___U_l__________T_____t_________________F_1
> 1e 1405d6 ___U_l__________T_____t_________________F_1
> 1f 1405d7 ___U_l__________T_____t_________________F_1
> [ra_size = 32, req_count = 16, async_size = 16]
>
> The same phenomenon occurs for any read size from 49k to 64k; with the
> patch, the readahead flag is set on the next folio instead.
>
> Because the minimum folio size in the address_space matches the block size
> (at least in XFS and bcachefs, which already support bs > ps), aligning
> req_count to the block size will not cause overread.
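
(For the bs=16k, ps=4k case above, that presumably works out to, as a
sketch using the helper this patch adds:

	mapping_min_folio_order(mapping);   /* 2                         */
	mapping_min_folio_nrpages(mapping); /* 1UL << 2        ==  4 pages */
	mapping_min_folio_nrbytes(mapping); /* 4 << PAGE_SHIFT == 16k      */

so rounding req_count up to 16k never reads past the minimum-order folio
the request already touches.)
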
>
> Co-developed-by: Chi Zhiling <chizhiling@...inos.cn>
> Signed-off-by: Chi Zhiling <chizhiling@...inos.cn>
> Signed-off-by: Youling Tang <tangyouling@...inos.cn>

I bisected boot failures on two of my 32-bit systems to this patch.

> ---
> include/linux/pagemap.h | 6 ++++++
> mm/filemap.c | 5 +++--
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index e63fbfbd5b0f..447bb264fd94 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -480,6 +480,12 @@ mapping_min_folio_nrpages(struct address_space *mapping)
>  	return 1UL << mapping_min_folio_order(mapping);
>  }
>  
> +static inline unsigned long
> +mapping_min_folio_nrbytes(struct address_space *mapping)
> +{
> +	return mapping_min_folio_nrpages(mapping) << PAGE_SHIFT;
> +}
> +
>  /**
>   * mapping_align_index() - Align index for this mapping.
>   * @mapping: The address_space.
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 765dc5ef6d5a..56a8656b6f86 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2584,8 +2584,9 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
>  	unsigned int flags;
>  	int err = 0;
>  
> -	/* "last_index" is the index of the page beyond the end of the read */
> -	last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);
> +	/* "last_index" is the index of the folio beyond the end of the read */
> +	last_index = round_up(iocb->ki_pos + count, mapping_min_folio_nrbytes(mapping));

iocb->ki_pos is loff_t (long long) while last_index is pgoff_t (unsigned
long), so on 32-bit the 64-bit byte offset is truncated by the assignment
before the shift down to a page index. This overflow seems to happen in
practice, resulting in last_index being before index.
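
A minimal userspace sketch of the truncation as I understand it (uint32_t
stands in for the 32-bit unsigned long; the 5 GiB offset, 60k read size and
64k folio size are made-up values):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		int64_t ki_pos = 5LL << 30;  /* loff_t: read at a 5 GiB offset */
		uint64_t bytes = ((uint64_t)ki_pos + 61440 + 65535) & ~(uint64_t)65535;

		uint32_t truncated = bytes;  /* the assignment drops bits 32+ */
		truncated >>= 12;            /* PAGE_SHIFT: the shift comes too late */

		uint32_t correct = bytes >> 12;  /* shift while still 64-bit, then narrow */

		printf("truncated %#x vs correct %#x\n", truncated, correct);
		return 0;
	}

This prints "truncated 0x10 vs correct 0x140010", i.e. last_index ends up
far before index.
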
The following diff resolves the issue for me:
diff --git a/mm/filemap.c b/mm/filemap.c
index 3c071307f40e..d2902be0b845 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2585,8 +2585,8 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
 	int err = 0;
 
 	/* "last_index" is the index of the folio beyond the end of the read */
-	last_index = round_up(iocb->ki_pos + count, mapping_min_folio_nrbytes(mapping));
-	last_index >>= PAGE_SHIFT;
+	last_index = round_up(iocb->ki_pos + count,
+			      mapping_min_folio_nrbytes(mapping)) >> PAGE_SHIFT;
 retry:
 	if (fatal_signal_pending(current))
 		return -EINTR;

Regards,
Klara Modin

> +	last_index >>= PAGE_SHIFT;
>  retry:
>  	if (fatal_signal_pending(current))
>  		return -EINTR;
> --
> 2.34.1
>