[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <9F4FC13E-E353-4A4F-BEB7-767CF4164854@nvidia.com>
Date: Wed, 15 Oct 2025 18:57:37 -0400
From: Zi Yan <ziy@...dia.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: linmiaohe@...wei.com, david@...hat.com, jane.chu@...cle.com,
kernel@...kajraghav.com,
syzbot+e6367ea2fdab6ed46056@...kaller.appspotmail.com,
syzkaller-bugs@...glegroups.com, akpm@...ux-foundation.org,
mcgrof@...nel.org, nao.horiguchi@...il.com,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Nico Pache <npache@...hat.com>,
Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>,
Barry Song <baohua@...nel.org>, Lance Yang <lance.yang@...ux.dev>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*()
target order silently.
On 15 Oct 2025, at 10:25, Lorenzo Stoakes wrote:
> On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
>> Page cache folios from a file system that support large block size (LBS)
>> can have minimal folio order greater than 0, thus a high order folio might
>> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
>> folio in minimum folio order chunks") bumps the target order of
>> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
>> This causes confusion for some split_huge_page*() callers like memory
>> failure handling code, since they expect after-split folios all have
>> order-0 when split succeeds but in really get min_order_for_split() order
>> folios.
>>
>> Fix it by failing a split if the folio cannot be split to the target order.
>>
>> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
>> [The test poisons LBS folios, which cannot be split to order-0 folios, and
>> also tries to poison all memory. The non split LBS folios take more memory
>> than the test anticipated, leading to OOM. The patch fixed the kernel
>> warning and the test needs some change to avoid OOM.]
>> Reported-by: syzbot+e6367ea2fdab6ed46056@...kaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/
>> Signed-off-by: Zi Yan <ziy@...dia.com>
>
> Generally ok with the patch in general but a bunch of comments below!
>
>> ---
>> include/linux/huge_mm.h | 28 +++++-----------------------
>> mm/huge_memory.c | 9 +--------
>> mm/truncate.c | 6 ++++--
>> 3 files changed, 10 insertions(+), 33 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 8eec7a2a977b..9950cda1526a 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
>> * Return: 0: split is successful, otherwise split failed.
>> */
>
> You need to update the kdoc too.
Done it locally.
>
> Also can you mention there this is the function you should use if you want
> to specify an order?
You mean min_order_for_split()? Sure.
>
> Maybe we should rename this function to try_folio_split_to_order() to make
> that completely explicit now that we're making other splitting logic always
> split to order-0?
Sure.
>
>> static inline int try_folio_split(struct folio *folio, struct page *page,
>> - struct list_head *list)
>> + struct list_head *list, unsigned int order)
>
> Is this target order? I see non_uniform_split_supported() calls this
> new_order so maybe let's use the same naming so as not to confuse it with
> the current folio order?
Sure, will rename it to new_order.
>
> Also - nitty one, but should we put the order as 3rd arg rather than 4th?
>
> As it seems it's normal to pass NULL list, and it's a bit weird to see a
> NULL in the middle of the args.
OK, will reorder the args.
>
>> {
>> - int ret = min_order_for_split(folio);
>> -
>> - if (ret < 0)
>> - return ret;
>
> OK so the point of removing this is that we assume in truncate (the only
> user) that we already have this information (i.e. from
> mapping_min_folio_order()) right?
Right.
>
>> -
>> - if (!non_uniform_split_supported(folio, 0, false))
>> + if (!non_uniform_split_supported(folio, order, false))
>
> While we're here can we make the mystery meat last param commented like:
>
> if (!non_uniform_split_supported(folio, order, /* warns= */false))
Sure.
>
>> return split_huge_page_to_list_to_order(&folio->page, list,
>> - ret);
>> - return folio_split(folio, ret, page, list);
>> + order);
>> + return folio_split(folio, order, page, list);
>> }
>> static inline int split_huge_page(struct page *page)
>> {
>> - struct folio *folio = page_folio(page);
>> - int ret = min_order_for_split(folio);
>> -
>> - if (ret < 0)
>> - return ret;
>> -
>> - /*
>> - * split_huge_page() locks the page before splitting and
>> - * expects the same page that has been split to be locked when
>> - * returned. split_folio(page_folio(page)) cannot be used here
>> - * because it converts the page to folio and passes the head
>> - * page to be split.
>> - */
>> - return split_huge_page_to_list_to_order(page, NULL, ret);
>> + return split_huge_page_to_list_to_order(page, NULL, 0);
>
> OK so the idea here is that callers would expect to split to 0 and the
> specific instance where we would actually want this behaviour of splittnig
> to a minimum order is now limited only to try_folio_split() (or
> try_folio_split_to_order() if you rename)?
>
Before commit e220917fa507 (the one to be fixed), split_huge_page() always
splits @page to order 0. It is just restoring the original behavior.
If caller wants to split a different order, they should use
split_huge_page_to_list_to_order() (current no such user except debugfs test
code).
>> }
>> void deferred_split_folio(struct folio *folio, bool partially_mapped);
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 0fb4af604657..af06ee6d2206 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3829,8 +3829,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>
>> min_order = mapping_min_folio_order(folio->mapping);
>> if (new_order < min_order) {
>> - VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
>> - min_order);
>
> Why are we dropping this?
This is used to catch “misuse” of split_huge_page_to_list_to_order(), when
caller wants to split a LBS folio to an order smaller than
mapping_min_folio_order(). It is based on the assumption that split code
should never fail on a LBS folio. But that assumption is causing problems
like the reported memory failure one. So it is removed to allow split code
to fail without a warning if a LBS folio cannot be split to the new_order.
>
>> ret = -EINVAL;
>> goto out;
>> }
>
>> @@ -4173,12 +4171,7 @@ int min_order_for_split(struct folio *folio)
>>
>> int split_folio_to_list(struct folio *folio, struct list_head *list)
>> {
>> - int ret = min_order_for_split(folio);
>> -
>> - if (ret < 0)
>> - return ret;
>> -
>> - return split_huge_page_to_list_to_order(&folio->page, list, ret);
>> + return split_huge_page_to_list_to_order(&folio->page, list, 0);
>> }
>>
>> /*
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 91eb92a5ce4f..1c15149ae8e9 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -194,6 +194,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>> size_t size = folio_size(folio);
>> unsigned int offset, length;
>> struct page *split_at, *split_at2;
>> + unsigned int min_order;
>>
>> if (pos < start)
>> offset = start - pos;
>> @@ -223,8 +224,9 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>> if (!folio_test_large(folio))
>> return true;
>>
>> + min_order = mapping_min_folio_order(folio->mapping);
>> split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
>> - if (!try_folio_split(folio, split_at, NULL)) {
>> + if (!try_folio_split(folio, split_at, NULL, min_order)) {
>> /*
>> * try to split at offset + length to make sure folios within
>> * the range can be dropped, especially to avoid memory waste
>> @@ -254,7 +256,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>> */
>> if (folio_test_large(folio2) &&
>> folio2->mapping == folio->mapping)
>> - try_folio_split(folio2, split_at2, NULL);
>> + try_folio_split(folio2, split_at2, NULL, min_order);
>>
>> folio_unlock(folio2);
>> out:
>> --
>> 2.51.0
>>
Best Regards,
Yan, Zi
Powered by blists - more mailing lists