linux-kernel - Re: [PATCH 1/2] mm: Enable khugepaged to operate on non-writable VMAs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250903131121.eq32tcuswgiossgi@master>
Date: Wed, 3 Sep 2025 13:11:21 +0000
From: Wei Yang <richard.weiyang@...il.com>
To: Dev Jain <dev.jain@....com>
Cc: Wei Yang <richard.weiyang@...il.com>, akpm@...ux-foundation.org,
	david@...hat.com, kas@...nel.org, willy@...radead.org,
	hughd@...gle.com, ziy@...dia.com, baolin.wang@...ux.alibaba.com,
	lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com,
	npache@...hat.com, ryan.roberts@....com, baohua@...nel.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mm: Enable khugepaged to operate on non-writable VMAs

On Wed, Sep 03, 2025 at 02:45:28PM +0530, Dev Jain wrote:
>
>On 03/09/25 1:38 pm, Wei Yang wrote:
>> On Wed, Sep 03, 2025 at 11:16:34AM +0530, Dev Jain wrote:
>> > Currently khugepaged does not collapse a region which does not have a
>> > single writable page. This is wasteful since non-writable VMAs mapped by
>> > the application won't benefit from THP collapse. Therefore, remove this
>> > restriction and allow khugepaged to collapse a VMA with arbitrary
>> > protections.
>> > 
>> > Along with this, currently MADV_COLLAPSE does not perform a collapse on a
>> > non-writable VMA, and this restriction is nowhere to be found on the
>> > manpage - the restriction itself sounds wrong to me since the user knows
>> > the protection of the memory it has mapped, so collapsing read-only
>> > memory via madvise() should be a choice of the user which shouldn't
>> > be overriden by the kernel.
>> > 
>> > On an arm64 machine, an average of 5% improvement is seen on some mmtests
>> > benchmarks, particularly hackbench, with a maximum improvement of 12%.
>> > 
>> > Signed-off-by: Dev Jain <dev.jain@....com>
>> > ---
>> [...]
>> > mm/khugepaged.c | 9 ++-------
>> > 1 file changed, 2 insertions(+), 7 deletions(-)
>> > 
>> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> > index 4ec324a4c1fe..a0f1df2a7ae6 100644
>> > --- a/mm/khugepaged.c
>> > +++ b/mm/khugepaged.c
>> > @@ -676,9 +676,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>> > 			writable = true;
>> > 	}
>> > 
>> > -	if (unlikely(!writable)) {
>> > -		result = SCAN_PAGE_RO;
>> > -	} else if (unlikely(cc->is_khugepaged && !referenced)) {
>> Would this cause more memory usage in system?
>> 
>> For example, one application would fork itself many times. It executable area
>> is read only, so all of them share one copy in memory.
>> 
>> Now we may collapse the range and create one copy for each process.
>> 
>> Ok, we have max_ptes_shared, while if some ptes are none, could it still do
>> collapse?
>> 
>> Maybe this is not realistic, just curious.
>
>Misunderstood your concern - you mean to say that a parent forks and the children
>VMAs are read-only pointing to the pages which were mapped by parent. Hmm.
>

This is one of the case in my mind, while what I described above is file
backed VMA.

Since pages are mapped both in parent and child, we would count shared ptes
during scan. So max_ptes_shared would decide whether to collapse or not.

To play with max_ptes_shared, this is a magic to me... Probably, there is no
optimal value for all scenario. And if it do gain much performance after
collapse, maybe it is the application author's responsibility to use hugetlb?

-- 
Wei Yang
Help you, Help me