Message-ID: <e80c3fdd-782e-4857-810e-5b7384448154@redhat.com>
Date: Fri, 11 Jul 2025 18:08:53 +0200
From: David Hildenbrand <david@...hat.com>
To: Youling Tang <youling.tang@...ux.dev>,
 Matthew Wilcox <willy@...radead.org>,
 Andrew Morton <akpm@...ux-foundation.org>
Cc: Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org, chizhiling@....com,
 Youling Tang <tangyouling@...inos.cn>, Chi Zhiling <chizhiling@...inos.cn>,
 Ryan Roberts <ryan.roberts@....com>
Subject: Re: [PATCH] mm/filemap: Align last_index to folio size

CCing Ryan, who recently fiddled with readahead.


On 11.07.25 07:55, Youling Tang wrote:
> From: Youling Tang <tangyouling@...inos.cn>
> 
> On XFS systems with pagesize=4K, blocksize=16K, and CONFIG_TRANSPARENT_HUGEPAGE
> enabled, we observed the following readahead behavior:
>   # echo 3 > /proc/sys/vm/drop_caches
>   # dd if=test of=/dev/null bs=64k count=1
>   # ./tools/mm/page-types -r -L -f  /mnt/xfs/test
>   foffset	offset	flags
>   0	136d4c	__RU_l_________H______t_________________F_1
>   1	136d4d	__RU_l__________T_____t_________________F_1
>   2	136d4e	__RU_l__________T_____t_________________F_1
>   3	136d4f	__RU_l__________T_____t_________________F_1
>   ...
>   c	136bb8	__RU_l_________H______t_________________F_1
>   d	136bb9	__RU_l__________T_____t_________________F_1
>   e	136bba	__RU_l__________T_____t_________________F_1
>   f	136bbb	__RU_l__________T_____t_________________F_1   <-- first read
>   10	13c2cc	___U_l_________H______t______________I__F_1   <-- readahead flag
>   11	13c2cd	___U_l__________T_____t______________I__F_1
>   12	13c2ce	___U_l__________T_____t______________I__F_1
>   13	13c2cf	___U_l__________T_____t______________I__F_1
>   ...
>   1c	1405d4	___U_l_________H______t_________________F_1
>   1d	1405d5	___U_l__________T_____t_________________F_1
>   1e	1405d6	___U_l__________T_____t_________________F_1
>   1f	1405d7	___U_l__________T_____t_________________F_1
>   [ra_size = 32, req_count = 16, async_size = 16]
> 
>   # echo 3 > /proc/sys/vm/drop_caches
>   # dd if=test of=/dev/null bs=60k count=1
>   # ./page-types -r -L -f  /mnt/xfs/test
>   foffset	offset	flags
>   0	136048	__RU_l_________H______t_________________F_1
>   ...
>   c	110a40	__RU_l_________H______t_________________F_1
>   d	110a41	__RU_l__________T_____t_________________F_1
>   e	110a42	__RU_l__________T_____t_________________F_1   <-- first read
>   f	110a43	__RU_l__________T_____t_________________F_1   <-- first readahead flag
>   10	13e7a8	___U_l_________H______t_________________F_1
>   ...
>   20	137a00	___U_l_________H______t_______P______I__F_1   <-- second readahead flag (20 - 2f)
>   21	137a01	___U_l__________T_____t_______P______I__F_1
>   ...
>   3f	10d4af	___U_l__________T_____t_______P_________F_1
>   [first readahead: ra_size = 32, req_count = 15, async_size = 17]
> 
> When reading 64k of data (the same holds for the 61k-63k range, where
> last_index is page-aligned in filemap_get_pages()), 128k of readahead is
> triggered via page_cache_sync_ra() and the PG_readahead flag is set on the
> next folio (the one containing the 0x10 page).
> 
> When reading 60k of data, 128k of readahead is likewise triggered via
> page_cache_sync_ra(). However, in this case the readahead flag is set on the
> 0xf page. Although the requested read size (req_count) is 60k, the actual
> read is aligned up to the folio size (64k), which hits the readahead flag and
> initiates asynchronous readahead via page_cache_async_ra(). This results in
> two readahead operations totaling 256k for a single read.
> 
> The root cause is that when the requested size is smaller than the actual
> read size (due to folio alignment), the read triggers asynchronous readahead.
> By changing the last_index alignment from page size to folio size, we ensure
> the requested size matches the actual read size, preventing a single read
> operation from triggering two readahead operations.
> 
> After applying the patch:
>   # echo 3 > /proc/sys/vm/drop_caches
>   # dd if=test of=/dev/null bs=60k count=1
>   # ./page-types -r -L -f  /mnt/xfs/test
>   foffset	offset	flags
>   0	136d4c	__RU_l_________H______t_________________F_1
>   1	136d4d	__RU_l__________T_____t_________________F_1
>   2	136d4e	__RU_l__________T_____t_________________F_1
>   3	136d4f	__RU_l__________T_____t_________________F_1
>   ...
>   c	136bb8	__RU_l_________H______t_________________F_1
>   d	136bb9	__RU_l__________T_____t_________________F_1
>   e	136bba	__RU_l__________T_____t_________________F_1   <-- first read
>   f	136bbb	__RU_l__________T_____t_________________F_1
>   10	13c2cc	___U_l_________H______t______________I__F_1   <-- readahead flag
>   11	13c2cd	___U_l__________T_____t______________I__F_1
>   12	13c2ce	___U_l__________T_____t______________I__F_1
>   13	13c2cf	___U_l__________T_____t______________I__F_1
>   ...
>   1c	1405d4	___U_l_________H______t_________________F_1
>   1d	1405d5	___U_l__________T_____t_________________F_1
>   1e	1405d6	___U_l__________T_____t_________________F_1
>   1f	1405d7	___U_l__________T_____t_________________F_1
>   [ra_size = 32, req_count = 16, async_size = 16]
> 
> The same phenomenon occurs for any read size from 49k to 64k. With this
> change, the readahead flag is set on the next folio instead.
> 
> Because the minimum folio order of the address_space corresponds to the
> block size (at least on XFS and bcachefs, which already support bs > ps),
> aligning req_count to the block size will not cause overread.
> 
> Co-developed-by: Chi Zhiling <chizhiling@...inos.cn>
> Signed-off-by: Chi Zhiling <chizhiling@...inos.cn>
> Signed-off-by: Youling Tang <tangyouling@...inos.cn>
> ---
>   include/linux/pagemap.h | 6 ++++++
>   mm/filemap.c            | 5 +++--
>   2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index e63fbfbd5b0f..447bb264fd94 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -480,6 +480,12 @@ mapping_min_folio_nrpages(struct address_space *mapping)
>   	return 1UL << mapping_min_folio_order(mapping);
>   }
>   
> +static inline unsigned long
> +mapping_min_folio_nrbytes(struct address_space *mapping)
> +{
> +	return mapping_min_folio_nrpages(mapping) << PAGE_SHIFT;
> +}
> +
>   /**
>    * mapping_align_index() - Align index for this mapping.
>    * @mapping: The address_space.
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 765dc5ef6d5a..56a8656b6f86 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2584,8 +2584,9 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
>   	unsigned int flags;
>   	int err = 0;
>   
> -	/* "last_index" is the index of the page beyond the end of the read */
> -	last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);
> +	/* "last_index" is the index of the folio beyond the end of the read */
> +	last_index = round_up(iocb->ki_pos + count, mapping_min_folio_nrbytes(mapping));
> +	last_index >>= PAGE_SHIFT;
>   retry:
>   	if (fatal_signal_pending(current))
>   		return -EINTR;


-- 
Cheers,

David / dhildenb

