linux-kernel - Re: [PATCH 1/2] mm/huge_memory: do not change split_huge

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9F4FC13E-E353-4A4F-BEB7-767CF4164854@nvidia.com>
Date: Wed, 15 Oct 2025 18:57:37 -0400
From: Zi Yan <ziy@...dia.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: linmiaohe@...wei.com, david@...hat.com, jane.chu@...cle.com,
 kernel@...kajraghav.com,
 syzbot+e6367ea2fdab6ed46056@...kaller.appspotmail.com,
 syzkaller-bugs@...glegroups.com, akpm@...ux-foundation.org,
 mcgrof@...nel.org, nao.horiguchi@...il.com,
 Baolin Wang <baolin.wang@...ux.alibaba.com>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>, Nico Pache <npache@...hat.com>,
 Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>,
 Barry Song <baohua@...nel.org>, Lance Yang <lance.yang@...ux.dev>,
 "Matthew Wilcox (Oracle)" <willy@...radead.org>,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org
Subject: Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*()
 target order silently.

On 15 Oct 2025, at 10:25, Lorenzo Stoakes wrote:

> On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
>> Page cache folios from a file system that support large block size (LBS)
>> can have minimal folio order greater than 0, thus a high order folio might
>> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
>> folio in minimum folio order chunks") bumps the target order of
>> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
>> This causes confusion for some split_huge_page*() callers like memory
>> failure handling code, since they expect after-split folios all have
>> order-0 when split succeeds but in really get min_order_for_split() order
>> folios.
>>
>> Fix it by failing a split if the folio cannot be split to the target order.
>>
>> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
>> [The test poisons LBS folios, which cannot be split to order-0 folios, and
>> also tries to poison all memory. The non split LBS folios take more memory
>> than the test anticipated, leading to OOM. The patch fixed the kernel
>> warning and the test needs some change to avoid OOM.]
>> Reported-by: syzbot+e6367ea2fdab6ed46056@...kaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
>> Signed-off-by: Zi Yan <ziy@...dia.com>
>
> Generally ok with the patch in general but a bunch of comments below!
>
>> ---
>>  include/linux/huge_mm.h | 28 +++++-----------------------
>>  mm/huge_memory.c        |  9 +--------
>>  mm/truncate.c           |  6 ++++--
>>  3 files changed, 10 insertions(+), 33 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 8eec7a2a977b..9950cda1526a 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
>>   * Return: 0: split is successful, otherwise split failed.
>>   */
>
> You need to update the kdoc too.

Done it locally.

>
> Also can you mention there this is the function you should use if you want
> to specify an order?

You mean min_order_for_split()? Sure.

>
> Maybe we should rename this function to try_folio_split_to_order() to make
> that completely explicit now that we're making other splitting logic always
> split to order-0?

Sure.
>
>>  static inline int try_folio_split(struct folio *folio, struct page *page,
>> -		struct list_head *list)
>> +		struct list_head *list, unsigned int order)
>
> Is this target order? I see non_uniform_split_supported() calls this
> new_order so maybe let's use the same naming so as not to confuse it with
> the current folio order?

Sure, will rename it to new_order.

>
> Also - nitty one, but should we put the order as 3rd arg rather than 4th?
>
> As it seems it's normal to pass NULL list, and it's a bit weird to see a
> NULL in the middle of the args.

OK, will reorder the args.

>
>>  {
>> -	int ret = min_order_for_split(folio);
>> -
>> -	if (ret < 0)
>> -		return ret;
>
> OK so the point of removing this is that we assume in truncate (the only
> user) that we already have this information (i.e. from
> mapping_min_folio_order()) right?

Right.

>
>> -
>> -	if (!non_uniform_split_supported(folio, 0, false))
>> +	if (!non_uniform_split_supported(folio, order, false))
>
> While we're here can we make the mystery meat last param commented like:
>
> 	if (!non_uniform_split_supported(folio, order, /* warns= */false))

Sure.

>
>>  		return split_huge_page_to_list_to_order(&folio->page, list,
>> -				ret);
>> -	return folio_split(folio, ret, page, list);
>> +				order);
>> +	return folio_split(folio, order, page, list);
>>  }
>>  static inline int split_huge_page(struct page *page)
>>  {
>> -	struct folio *folio = page_folio(page);
>> -	int ret = min_order_for_split(folio);
>> -
>> -	if (ret < 0)
>> -		return ret;
>> -
>> -	/*
>> -	 * split_huge_page() locks the page before splitting and
>> -	 * expects the same page that has been split to be locked when
>> -	 * returned. split_folio(page_folio(page)) cannot be used here
>> -	 * because it converts the page to folio and passes the head
>> -	 * page to be split.
>> -	 */
>> -	return split_huge_page_to_list_to_order(page, NULL, ret);
>> +	return split_huge_page_to_list_to_order(page, NULL, 0);
>
> OK so the idea here is that callers would expect to split to 0 and the
> specific instance where we would actually want this behaviour of splittnig
> to a minimum order is now limited only to try_folio_split() (or
> try_folio_split_to_order() if you rename)?
>

Before commit e220917fa507 (the one to be fixed), split_huge_page() always
splits @page to order 0. It is just restoring the original behavior.
If caller wants to split a different order, they should use
split_huge_page_to_list_to_order() (current no such user except debugfs test
code).

>>  }
>>  void deferred_split_folio(struct folio *folio, bool partially_mapped);
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 0fb4af604657..af06ee6d2206 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3829,8 +3829,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>
>>  		min_order = mapping_min_folio_order(folio->mapping);
>>  		if (new_order < min_order) {
>> -			VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
>> -				     min_order);
>
> Why are we dropping this?

This is used to catch “misuse” of split_huge_page_to_list_to_order(), when
caller wants to split a LBS folio to an order smaller than
mapping_min_folio_order(). It is based on the assumption that split code
should never fail on a LBS folio. But that assumption is causing problems
like the reported memory failure one. So it is removed to allow split code
to fail without a warning if a LBS folio cannot be split to the new_order.

>
>>  			ret = -EINVAL;
>>  			goto out;
>>  		}
>
>> @@ -4173,12 +4171,7 @@ int min_order_for_split(struct folio *folio)
>>
>>  int split_folio_to_list(struct folio *folio, struct list_head *list)
>>  {
>> -	int ret = min_order_for_split(folio);
>> -
>> -	if (ret < 0)
>> -		return ret;
>> -
>> -	return split_huge_page_to_list_to_order(&folio->page, list, ret);
>> +	return split_huge_page_to_list_to_order(&folio->page, list, 0);
>>  }
>>
>>  /*
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 91eb92a5ce4f..1c15149ae8e9 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -194,6 +194,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>>  	size_t size = folio_size(folio);
>>  	unsigned int offset, length;
>>  	struct page *split_at, *split_at2;
>> +	unsigned int min_order;
>>
>>  	if (pos < start)
>>  		offset = start - pos;
>> @@ -223,8 +224,9 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>>  	if (!folio_test_large(folio))
>>  		return true;
>>
>> +	min_order = mapping_min_folio_order(folio->mapping);
>>  	split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
>> -	if (!try_folio_split(folio, split_at, NULL)) {
>> +	if (!try_folio_split(folio, split_at, NULL, min_order)) {
>>  		/*
>>  		 * try to split at offset + length to make sure folios within
>>  		 * the range can be dropped, especially to avoid memory waste
>> @@ -254,7 +256,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>>  		 */
>>  		if (folio_test_large(folio2) &&
>>  		    folio2->mapping == folio->mapping)
>> -			try_folio_split(folio2, split_at2, NULL);
>> +			try_folio_split(folio2, split_at2, NULL, min_order);
>>
>>  		folio_unlock(folio2);
>>  out:
>> --
>> 2.51.0
>>


Best Regards,
Yan, Zi