Message-ID: <86CBD3DE-2245-4C79-BDA3-4977548898E3@nvidia.com>
Date: Thu, 20 Nov 2025 10:00:55 -0500
From: Zi Yan <ziy@...dia.com>
To: Balbir Singh <balbirs@...dia.com>
Cc: David Hildenbrand <david@...nel.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Nico Pache <npache@...hat.com>,
Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>,
Barry Song <baohua@...nel.org>, Lance Yang <lance.yang@...ux.dev>,
Miaohe Lin <linmiaohe@...wei.com>, Naoya Horiguchi <nao.horiguchi@...il.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 3/3] mm/memory-failure: handle min_order_for_split()
error code properly
On 19 Nov 2025, at 23:45, Balbir Singh wrote:
> On 11/20/25 14:59, Zi Yan wrote:
>> min_order_for_split() returns -EBUSY when the folio is truncated and cannot
>> be split. In commit 77008e1b2ef7 ("mm/huge_memory: do not change
>> split_huge_page*() target order silently"), memory_failure() does not
>> handle it and passes -EBUSY to try_to_split_thp_page() directly.
>> try_to_split_thp_page() returns -EINVAL since -EBUSY becomes 0xfffffff0 as
>> new_order is unsigned int in __folio_split() and this large new_order is
>> rejected as an invalid input. The code does not cause a bug.
>> soft_offline_in_use_page() also uses min_order_for_split() but it always
>> passes 0 as new_order for split.
>>
>> Handle it properly by checking min_order_for_split() return value and not
>> calling try_to_split_thp_page() if the value is negative. Add a comment
>> in soft_offline_in_use_page() to clarify the possible negative new_order
>> value.
>>
>> Signed-off-by: Zi Yan <ziy@...dia.com>
>> ---
>> mm/memory-failure.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 7f908ad795ad..86582f030159 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2437,8 +2437,11 @@ int memory_failure(unsigned long pfn, int flags)
>> * or unhandlable page. The refcount is bumped iff the
>> * page is a valid handlable page.
>> */
>> - folio_set_has_hwpoisoned(folio);
>> - err = try_to_split_thp_page(p, new_order, /* release= */ false);
>> + if (new_order >= 0) {
>> + folio_set_has_hwpoisoned(folio);
>
> If new_order < 0, do we skip setting the hwpoisoned bit on the folio?
The bit should be set. Anyway, I am going to take David’s approach to
change min_order_for_split().
Thanks.
>
>> + err = try_to_split_thp_page(p, new_order, /* release= */ false);
>> + } else
>> + err = new_order;
>> /*
>> * If splitting a folio to order-0 fails, kill the process.
>> * Split the folio regardless to minimize unusable pages.
>> @@ -2779,6 +2782,7 @@ static int soft_offline_in_use_page(struct page *page)
>> /*
>> * If new_order (target split order) is not 0, do not split the
>> * folio at all to retain the still accessible large folio.
>> + * new_order can be -EBUSY, meaning the folio cannot be split.
>> * NOTE: if minimizing the number of soft offline pages is
>> * preferred, split it to non-zero new_order like it is done in
>> * memory_failure().
>
> Balbir
Best Regards,
Yan, Zi