Message-ID: <d7243ce2-2e32-4bc9-8a00-9e69d839d240@lucifer.local>
Date: Wed, 15 Oct 2025 15:25:59 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Zi Yan <ziy@...dia.com>
Cc: linmiaohe@...wei.com, david@...hat.com, jane.chu@...cle.com,
        kernel@...kajraghav.com,
        syzbot+e6367ea2fdab6ed46056@...kaller.appspotmail.com,
        syzkaller-bugs@...glegroups.com, akpm@...ux-foundation.org,
        mcgrof@...nel.org, nao.horiguchi@...il.com,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
        Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
        Lance Yang <lance.yang@...ux.dev>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*()
 target order silently.

On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
> Page cache folios from a file system that supports large block size (LBS)
> can have a minimum folio order greater than 0, so a high-order folio might
> not be splittable down to order-0. Commit e220917fa507 ("mm: split a
> folio in minimum folio order chunks") bumps the target order of
> split_huge_page*() to the minimum allowed order when splitting an LBS folio.
> This confuses some split_huge_page*() callers, such as the memory
> failure handling code, which expect all after-split folios to have
> order-0 when the split succeeds, but in reality get folios of
> min_order_for_split() order.
>
> Fix it by failing a split if the folio cannot be split to the target order.
>
> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
> [The test poisons LBS folios, which cannot be split to order-0 folios, and
> also tries to poison all memory. The non-split LBS folios take more memory
> than the test anticipated, leading to OOM. The patch fixes the kernel
> warning, and the test needs some changes to avoid OOM.]
> Reported-by: syzbot+e6367ea2fdab6ed46056@...kaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
> Signed-off-by: Zi Yan <ziy@...dia.com>

Generally OK with the patch, but a bunch of comments below!

> ---
>  include/linux/huge_mm.h | 28 +++++-----------------------
>  mm/huge_memory.c        |  9 +--------
>  mm/truncate.c           |  6 ++++--
>  3 files changed, 10 insertions(+), 33 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 8eec7a2a977b..9950cda1526a 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
>   * Return: 0: split is successful, otherwise split failed.
>   */

You need to update the kdoc too.

Also, can you mention there that this is the function to use if you want
to specify an order?

Maybe we should rename this function to try_folio_split_to_order() to make
that completely explicit now that we're making other splitting logic always
split to order-0?

>  static inline int try_folio_split(struct folio *folio, struct page *page,
> -		struct list_head *list)
> +		struct list_head *list, unsigned int order)

Is this target order? I see non_uniform_split_supported() calls this
new_order so maybe let's use the same naming so as not to confuse it with
the current folio order?

Also - nitty one, but should we put the order as 3rd arg rather than 4th?

As it seems it's normal to pass NULL list, and it's a bit weird to see a
NULL in the middle of the args.

>  {
> -	int ret = min_order_for_split(folio);
> -
> -	if (ret < 0)
> -		return ret;

OK so the point of removing this is that we assume in truncate (the only
user) that we already have this information (i.e. from
mapping_min_folio_order()) right?

> -
> -	if (!non_uniform_split_supported(folio, 0, false))
> +	if (!non_uniform_split_supported(folio, order, false))

While we're here, can we make the mystery-meat last param commented, like:

	if (!non_uniform_split_supported(folio, order, /* warns= */false))

>  		return split_huge_page_to_list_to_order(&folio->page, list,
> -				ret);
> -	return folio_split(folio, ret, page, list);
> +				order);
> +	return folio_split(folio, order, page, list);
>  }
>  static inline int split_huge_page(struct page *page)
>  {
> -	struct folio *folio = page_folio(page);
> -	int ret = min_order_for_split(folio);
> -
> -	if (ret < 0)
> -		return ret;
> -
> -	/*
> -	 * split_huge_page() locks the page before splitting and
> -	 * expects the same page that has been split to be locked when
> -	 * returned. split_folio(page_folio(page)) cannot be used here
> -	 * because it converts the page to folio and passes the head
> -	 * page to be split.
> -	 */
> -	return split_huge_page_to_list_to_order(page, NULL, ret);
> +	return split_huge_page_to_list_to_order(page, NULL, 0);

OK so the idea here is that callers would expect to split to 0, and the
specific instance where we actually want this behaviour of splitting
to a minimum order is now limited to try_folio_split() (or
try_folio_split_to_order() if you rename)?

>  }
>  void deferred_split_folio(struct folio *folio, bool partially_mapped);
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 0fb4af604657..af06ee6d2206 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3829,8 +3829,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>
>  		min_order = mapping_min_folio_order(folio->mapping);
>  		if (new_order < min_order) {
> -			VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
> -				     min_order);

Why are we dropping this?

>  			ret = -EINVAL;
>  			goto out;
>  		}

> @@ -4173,12 +4171,7 @@ int min_order_for_split(struct folio *folio)
>
>  int split_folio_to_list(struct folio *folio, struct list_head *list)
>  {
> -	int ret = min_order_for_split(folio);
> -
> -	if (ret < 0)
> -		return ret;
> -
> -	return split_huge_page_to_list_to_order(&folio->page, list, ret);
> +	return split_huge_page_to_list_to_order(&folio->page, list, 0);
>  }
>
>  /*
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 91eb92a5ce4f..1c15149ae8e9 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -194,6 +194,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>  	size_t size = folio_size(folio);
>  	unsigned int offset, length;
>  	struct page *split_at, *split_at2;
> +	unsigned int min_order;
>
>  	if (pos < start)
>  		offset = start - pos;
> @@ -223,8 +224,9 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>  	if (!folio_test_large(folio))
>  		return true;
>
> +	min_order = mapping_min_folio_order(folio->mapping);
>  	split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
> -	if (!try_folio_split(folio, split_at, NULL)) {
> +	if (!try_folio_split(folio, split_at, NULL, min_order)) {
>  		/*
>  		 * try to split at offset + length to make sure folios within
>  		 * the range can be dropped, especially to avoid memory waste
> @@ -254,7 +256,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>  		 */
>  		if (folio_test_large(folio2) &&
>  		    folio2->mapping == folio->mapping)
> -			try_folio_split(folio2, split_at2, NULL);
> +			try_folio_split(folio2, split_at2, NULL, min_order);
>
>  		folio_unlock(folio2);
>  out:
> --
> 2.51.0
>
