linux-kernel - Re: [PATCH v2] mm: prohibit the last subpage from reusing the entire large folio

Message-ID: <cfe7d2b5-eb41-4df0-bf6b-4ed4044e20ea@arm.com>
Date: Fri, 8 Mar 2024 12:50:08 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: David Hildenbrand <david@...hat.com>, Barry Song <21cnbao@...il.com>,
 akpm@...ux-foundation.org, linux-mm@...ck.org
Cc: minchan@...nel.org, fengwei.yin@...el.com, linux-kernel@...r.kernel.org,
 mhocko@...e.com, peterx@...hat.com, shy828301@...il.com,
 songmuchun@...edance.com, wangkefeng.wang@...wei.com, xiehuan09@...il.com,
 zokeefe@...gle.com, chrisl@...nel.org, yuzhao@...gle.com,
 Barry Song <v-songbaohua@...o.com>, Lance Yang <ioworker0@...il.com>
Subject: Re: [PATCH v2] mm: prohibit the last subpage from reusing the entire
 large folio

On 08/03/2024 09:34, David Hildenbrand wrote:
> On 08.03.24 10:27, Barry Song wrote:
>> From: Barry Song <v-songbaohua@...o.com>
>>
>> In a Copy-on-Write (CoW) scenario, the last subpage will reuse the entire
>> large folio, resulting in the waste of (nr_pages - 1) pages. This wasted
>> memory remains allocated until it is either unmapped or memory
>> reclamation occurs.
>>
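>> The following small program can serve as evidence of this behavior:
>>
>>   #include <stdio.h>
>>   #include <stdlib.h>
>>   #include <string.h>
>>   #include <unistd.h>
>>
>>   #define SIZE (1024 * 1024 * 1024UL)
>>
>>   int main(void)
>>   {
>>           void *p = malloc(SIZE);
>>           if (!p)
>>                   return 1;
>>           memset(p, 0x11, SIZE);  /* fault in the whole 1GiB region */
>>           if (fork() == 0)
>>                   _exit(0);       /* child exits immediately */
>>           memset(p, 0x12, SIZE);  /* parent writes again after fork() */
>>           printf("done\n");
>>           while (1);
>>   }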
>>
>> For example, using a 1024KiB mTHP by:
>>   echo always > /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled
>>
>> (1) w/o the patch, it takes 2GiB,
>>
>> Before running the test program,
>>   / # free -m
>>                  total        used        free      shared  buff/cache   available
>>   Mem:            5754          84        5692           0          17        5669
>>   Swap:              0           0           0
>>
>>   / # /a.out &
>>   / # done
>>
>> After running the test program,
>>   / # free -m
>>                  total        used        free      shared  buff/cache   available
>>   Mem:            5754        2149        3627           0          19        3605
>>   Swap:              0           0           0
>>
>> (2) w/ the patch, it takes 1GiB only,
>>
>> Before running the test program,
>>   / # free -m
>>                  total        used        free      shared  buff/cache   available
>>   Mem:            5754          89        5687           0          17        5664
>>   Swap:              0           0           0
>>
>>   / # /a.out &
>>   / # done
>>
>> After running the test program,
>>   / # free -m
>>                  total        used        free      shared  buff/cache   available
>>   Mem:            5754        1122        4655           0          17        4632
>>   Swap:              0           0           0
>>
>> This patch migrates the last subpage to a small folio and immediately
>> returns the large folio to the system. It benefits both memory availability
>> and anti-fragmentation.
> 
> It might be a controversial optimization, and as Ryan said, there are likely
> other cases where we'd want to migrate off of a THP earlier, if possible.

Personally, I think there might also be cases where you want to copy/reuse the
entire large folio. If your application is using 16K THPs, perhaps it's a
bigger win to just treat it like a base page? I expect the cost/benefit will
change as the THP size increases.

I know we have previously talked about using a khugepaged-like mechanism to
re-collapse after CoW, but for the smaller sizes maybe that's just a lot more
effort?

> 
> But I like that it just handles large folios now in a consistent way for the
> time being.

Yes, agreed.

> 
> Acked-by: David Hildenbrand <david@...hat.com>
>