[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1aa5818f-eb75-4aee-a866-9d2f81111056@redhat.com>
Date: Fri, 5 Sep 2025 16:46:21 +0200
From: David Hildenbrand <david@...hat.com>
To: Usama Arif <usamaarif642@...il.com>, linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Nico Pache <npache@...hat.com>,
Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>,
Barry Song <baohua@...nel.org>
Subject: Re: [PATCH v1] mm/huge_memory: fix shrinking of all-zero THPs with
max_ptes_none default
[...]
>
> The reason I did this is for the case if you change max_ptes_none after the THP is added
> to deferred split list but *before* memory pressure, i.e. before the shrinker runs,
> so that its considered for splitting.
Yeah, I was assuming that was the reason why the shrinker is enabled as
default.
But in any sane system, the admin would enable the shrinker early. If
not, we can look into handling it differently.
>
>> Easy to reproduce:
>>
>> 1) Allocate some THPs filled with 0s
>>
>> <prog.c>
>> #include <string.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <sys/mman.h>
>>
>> const size_t size = 1024*1024*1024;
>>
>> int main(void)
>> {
>> size_t offs;
>> char *area;
>>
>> area = mmap(0, size, PROT_READ | PROT_WRITE,
>> MAP_ANON | MAP_PRIVATE, -1, 0);
>> if (area == MAP_FAILED) {
>> printf("mmap failed\n");
>> exit(-1);
>> }
>> madvise(area, size, MADV_HUGEPAGE);
>>
>> for (offs = 0; offs < size; offs += getpagesize())
>> area[offs] = 0;
>> pause();
>> }
>> <\prog.c>
>>
>> 2) Trigger the shrinker
>>
>> E.g., memory pressure through memhog
>>
>> 3) Observe that THPs are not getting reclaimed
>>
>> $ cat /proc/`pgrep prog`/smaps_rollup
>>
>> Would list ~1GiB of AnonHugePages. With this fix, they would get
>> reclaimed as expected.
>>
>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>> Cc: Zi Yan <ziy@...dia.com>
>> Cc: Baolin Wang <baolin.wang@...ux.alibaba.com>
>> Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>
>> Cc: Nico Pache <npache@...hat.com>
>> Cc: Ryan Roberts <ryan.roberts@....com>
>> Cc: Dev Jain <dev.jain@....com>
>> Cc: Barry Song <baohua@...nel.org>
>> Cc: Usama Arif <usamaarif642@...il.com>
>> Signed-off-by: David Hildenbrand <david@...hat.com>
>> ---
>> mm/huge_memory.c | 3 ---
>> 1 file changed, 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 26cedfcd74189..aa3ed7a86435b 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4110,9 +4110,6 @@ static bool thp_underused(struct folio *folio)
>> void *kaddr;
>> int i;
>>
>> - if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
>> - return false;
>> -
>
> I do agree with your usecase, but I am really worried about the amount of
> work and cpu time the THP shrinker will consume when max_ptes_none is 511
> (I dont have any numbers to back up my worry :)), and its less likely that
> we will have these completely zeroed out THPs (again no numbers to back up
> this statement).
Then then shrinker shall be deactivated as default if that becomes a
problem.
Fortunately you documented the desired semantics:
"All THPs at fault and collapse time will be added to _deferred_list,
and will therefore be split under memory pressure if they are considered
"underused". A THP is underused if the number of zero-filled pages in
the THP is above max_ptes_none (see below)."
> We have the huge_zero_folio as well which is installed on read.
Yes, only if the huge zero folio is not available. Which will then also
get properly reclaimed.
--
Cheers
David / dhildenb
Powered by blists - more mailing lists