lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1aa5818f-eb75-4aee-a866-9d2f81111056@redhat.com>
Date: Fri, 5 Sep 2025 16:46:21 +0200
From: David Hildenbrand <david@...hat.com>
To: Usama Arif <usamaarif642@...il.com>, linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>,
 Baolin Wang <baolin.wang@...ux.alibaba.com>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>, Nico Pache <npache@...hat.com>,
 Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>,
 Barry Song <baohua@...nel.org>
Subject: Re: [PATCH v1] mm/huge_memory: fix shrinking of all-zero THPs with
 max_ptes_none default

[...]

> 
> The reason I did this is for the case if you change max_ptes_none after the THP is added
> to deferred split list but *before* memory pressure, i.e. before the shrinker runs,
> so that its considered for splitting.

Yeah, I was assuming that was the reason why the shrinker is enabled as 
default.

But in any sane system, the admin would enable the shrinker early. If 
not, we can look into handling it differently.

> 
>> Easy to reproduce:
>>
>> 1) Allocate some THPs filled with 0s
>>
>> <prog.c>
>>   #include <string.h>
>>   #include <stdio.h>
>>   #include <stdlib.h>
>>   #include <unistd.h>
>>   #include <sys/mman.h>
>>
>>   const size_t size = 1024*1024*1024;
>>
>>   int main(void)
>>   {
>>           size_t offs;
>>           char *area;
>>
>>           area = mmap(0, size, PROT_READ | PROT_WRITE,
>>                       MAP_ANON | MAP_PRIVATE, -1, 0);
>>           if (area == MAP_FAILED) {
>>                   printf("mmap failed\n");
>>                   exit(-1);
>>           }
>>           madvise(area, size, MADV_HUGEPAGE);
>>
>>           for (offs = 0; offs < size; offs += getpagesize())
>>                   area[offs] = 0;
>>           pause();
>>   }
>> <\prog.c>
>>
>> 2) Trigger the shrinker
>>
>> E.g., memory pressure through memhog
>>
>> 3) Observe that THPs are not getting reclaimed
>>
>> $ cat /proc/`pgrep prog`/smaps_rollup
>>
>> Would list ~1GiB of AnonHugePages. With this fix, they would get
>> reclaimed as expected.
>>
>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>> Cc: Zi Yan <ziy@...dia.com>
>> Cc: Baolin Wang <baolin.wang@...ux.alibaba.com>
>> Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>
>> Cc: Nico Pache <npache@...hat.com>
>> Cc: Ryan Roberts <ryan.roberts@....com>
>> Cc: Dev Jain <dev.jain@....com>
>> Cc: Barry Song <baohua@...nel.org>
>> Cc: Usama Arif <usamaarif642@...il.com>
>> Signed-off-by: David Hildenbrand <david@...hat.com>
>> ---
>>   mm/huge_memory.c | 3 ---
>>   1 file changed, 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 26cedfcd74189..aa3ed7a86435b 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4110,9 +4110,6 @@ static bool thp_underused(struct folio *folio)
>>   	void *kaddr;
>>   	int i;
>>   
>> -	if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
>> -		return false;
>> -
> 
> I do agree with your usecase, but I am really worried about the amount of
> work and cpu time the THP shrinker will consume when max_ptes_none is 511
> (I dont have any numbers to back up my worry :)), and its less likely that
> we will have these completely zeroed out THPs (again no numbers to back up
> this statement).

Then then shrinker shall be deactivated as default if that becomes a 
problem.

Fortunately you documented the desired semantics:

"All THPs at fault and collapse time will be added to _deferred_list,
and will therefore be split under memory pressure if they are considered
"underused". A THP is underused if the number of zero-filled pages in
the THP is above max_ptes_none (see below)."

> We have the huge_zero_folio as well which is installed on read.

Yes, only if the huge zero folio is not available. Which will then also 
get properly reclaimed.

-- 
Cheers

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ