Message-ID: <aJ9QJ2V0WQ5XneJx@vernon-pc>
Date: Fri, 15 Aug 2025 23:20:55 +0800
From: Vernon Yang <vernon2gm@...il.com>
To: David Hildenbrand <david@...hat.com>
Cc: akpm@...ux-foundation.org, lorenzo.stoakes@...cle.com, ziy@...dia.com,
baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com,
npache@...hat.com, ryan.roberts@....com, dev.jain@....com,
baohua@...nel.org, glider@...gle.com, elver@...gle.com,
dvyukov@...gle.com, vbabka@...e.cz, rppt@...nel.org,
surenb@...gle.com, mhocko@...e.com, muchun.song@...ux.dev,
osalvador@...e.de, shuah@...nel.org, richardcochran@...il.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 6/7] mm: memory: add mTHP support for wp
On Thu, Aug 14, 2025 at 01:58:34PM +0200, David Hildenbrand wrote:
> On 14.08.25 13:38, Vernon Yang wrote:
> > Currently, anonymous page faults support mTHP, and hardware features
> > (such as arm64 contpte) can store multiple PTEs in one TLB entry,
> > reducing the probability of TLB misses. However, once the process
> > forks and COW faults are triggered, this optimization is lost and
> > each fault copies only a single 4KB page.
> >
> > Therefore, make the write-protect (COW) fault path support mTHP to
> > preserve the TLB optimization and improve COW fault efficiency.
> >
> > The vm-scalability usemem benchmark shows a clear improvement,
> > tested with: usemem -n 32 --prealloc --prefault 249062617
> > (results are in KB/s; higher is better)
> >
> > | size | w/o patch | w/ patch | delta |
> > |-------------|-----------|-----------|---------|
> > | baseline 4K | 723041.63 | 717643.21 | -0.75% |
> > | mthp 16K | 732871.14 | 799513.18 | +9.09% |
> > | mthp 32K | 746060.91 | 836261.83 | +12.09% |
> > | mthp 64K | 747333.18 | 855570.43 | +14.48% |
>
> You're missing two of the most important metrics: COW latency and memory
> waste.
OK, I will add these two tests later.
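
For reference, below is a rough userspace sketch of how I plan to measure
per-write COW latency after fork (my own test idea, not part of this series;
the page count and reporting only avg/max are arbitrary choices):

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define NPAGES 4096

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *buf = mmap(NULL, NPAGES * psz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 1, NPAGES * psz);		/* prefault in the parent */

	if (fork() == 0) {
		long long dt, total = 0, max = 0;
		int i;

		for (i = 0; i < NPAGES; i++) {
			long long t0 = now_ns();

			buf[i * psz] = 2;	/* first write: COW fault */
			dt = now_ns() - t0;
			total += dt;
			if (dt > max)
				max = dt;
		}
		printf("avg %lld ns, max %lld ns per COW write\n",
		       total / NPAGES, max);
		_exit(0);
	}
	wait(NULL);
	return 0;
}

With this series applied, the first write into each mTHP-sized block would
copy the whole block, so the max value above is where any added tail latency
should show up; memory waste can be estimated afterwards from
/proc/<pid>/smaps, comparing Rss against what was actually dirtied.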
>
> Just imagine what happens if you have PMD-sized THP.
>
> I would suggest you explore why Redis used to recommend disabling THPs
> (hint: tail latency due to COW of way-too-large chunks before we do what
> we do today).
Thanks for the suggestion; indeed I'm not very familiar with Redis. Currently
this series only supports small granularities, such as 16KB, and I will also
run redis-benchmark later to see how severe the tail latency is.
>
> So staring at usemem micro-benchmark results is a bit misleading.
>
> As discussed in the past, I would actually suggest to
>
> a) Let khugepaged deal with fixing this up later, keeping the CoW path
> simpler and faster.
> b) If we really have to do this during fault time, limit it to some
> order (which might even have to be configurable).
A good approach would be to add a knob similar to shmem_enabled later if
needed.
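
For example, the per-size mTHP policy knobs that already exist could serve
as the model; here is a small sketch that just lists them and their current
values (the hugepages-*kB/enabled paths exist on kernels with mTHP support,
while a COW-specific order knob would be new and is only hypothetical):

#include <glob.h>
#include <stdio.h>

int main(void)
{
	glob_t g;
	char val[64];
	size_t i;

	/* Per-size policy knobs; a COW order limit could plausibly be
	 * exposed alongside these (such a knob does not exist today). */
	if (glob("/sys/kernel/mm/transparent_hugepage/hugepages-*kB/enabled",
		 0, NULL, &g))
		return 1;

	for (i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "r");

		if (f && fgets(val, sizeof(val), f))
			printf("%s: %s", g.gl_pathv[i], val);
		if (f)
			fclose(f);
	}
	globfree(&g);
	return 0;
}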
>
> I really think we should keep CoW latency low and instead let khugepaged fix
> that up later. (Nico is working on mTHP collapse support)
>
> [are you handling a mixture of PageAnonExclusive subpages within a folio
> properly? Only staring at R/O PTEs is usually insufficient to determine
> whether you can COW or whether you must reuse].
There is no extra handling of PageAnonExclusive here; the decision is based
only on R/O PTEs. Thank you for pointing this out, I will look into how to
handle this situation properly later.
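
As a starting point, I imagine scanning each subpage before deciding, along
the lines of the kernel-style sketch below (range_can_cow_whole is a
hypothetical helper of mine, while vm_normal_page(), ptep_get() and
PageAnonExclusive() are existing kernel helpers; this ignores locking and
the partial-reuse case):

static bool range_can_cow_whole(struct vm_area_struct *vma,
				unsigned long addr, pte_t *pte, int nr)
{
	int i;

	for (i = 0; i < nr; i++, addr += PAGE_SIZE) {
		struct page *page = vm_normal_page(vma, addr,
						   ptep_get(pte + i));

		/* An exclusive subpage must be reused, not copied, so a
		 * mixed folio cannot be handled as one COW unit. */
		if (page && PageAnonExclusive(page))
			return false;
	}
	return true;
}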
>
> --
> Cheers
>
> David / dhildenb
>