Message-ID: <dfbaa342-632d-4911-a0c5-f1ffe32f9e57@redhat.com>
Date: Sat, 16 Aug 2025 08:40:37 +0200
From: David Hildenbrand <david@...hat.com>
To: Vernon Yang <vernon2gm@...il.com>
Cc: akpm@...ux-foundation.org, lorenzo.stoakes@...cle.com, ziy@...dia.com,
baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com, npache@...hat.com,
ryan.roberts@....com, dev.jain@....com, baohua@...nel.org,
glider@...gle.com, elver@...gle.com, dvyukov@...gle.com, vbabka@...e.cz,
rppt@...nel.org, surenb@...gle.com, mhocko@...e.com, muchun.song@...ux.dev,
osalvador@...e.de, shuah@...nel.org, richardcochran@...il.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 6/7] mm: memory: add mTHP support for wp
On 15.08.25 17:20, Vernon Yang wrote:
> On Thu, Aug 14, 2025 at 01:58:34PM +0200, David Hildenbrand wrote:
>> On 14.08.25 13:38, Vernon Yang wrote:
>>> Currently, page faults on anonymous pages support mTHP, and hardware
>>> features (such as arm64 contpte) can store multiple PTEs in a single
>>> TLB entry, reducing the probability of TLB misses. However, once the
>>> process forks and copy-on-write (COW) is triggered, this optimization
>>> is lost: each subsequent fault allocates only a single 4KB page.
>>>
>>> Therefore, make the write-protect copy path in the page fault handler
>>> support mTHP, preserving the TLB optimization and improving the
>>> efficiency of COW page faults.
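>>>
>>> As a rough illustration of the idea only (this is not the code from
>>> this patch: cow_suitable_order() is a hypothetical policy helper and
>>> the vma_alloc_folio() signature is simplified):
>>>
>>>	static struct folio *cow_alloc_folio(struct vm_fault *vmf)
>>>	{
>>>		struct vm_area_struct *vma = vmf->vma;
>>>		int order = cow_suitable_order(vmf);	/* hypothetical */
>>>
>>>		/* Try progressively smaller orders before giving up. */
>>>		for (; order > 0; order--) {
>>>			unsigned long addr = ALIGN_DOWN(vmf->address,
>>>							PAGE_SIZE << order);
>>>			struct folio *folio;
>>>
>>>			/* The whole range must lie inside the VMA. */
>>>			if (addr < vma->vm_start ||
>>>			    addr + (PAGE_SIZE << order) > vma->vm_end)
>>>				continue;
>>>
>>>			folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE,
>>>						order, vma, addr);
>>>			if (folio)
>>>				return folio;
>>>		}
>>>
>>>		/* Fall back to a single 4K page, i.e. today's behaviour. */
>>>		return vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma,
>>>				       vmf->address);
>>>	}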
>>>
>>> vm-scalability usemem shows a clear improvement.
>>> Test command: usemem -n 32 --prealloc --prefault 249062617
>>> (results in KB/s; higher is better)
>>>
>>> | size | w/o patch | w/ patch | delta |
>>> |-------------|-----------|-----------|---------|
>>> | baseline 4K | 723041.63 | 717643.21 | -0.75% |
>>> | mthp 16K | 732871.14 | 799513.18 | +9.09% |
>>> | mthp 32K | 746060.91 | 836261.83 | +12.09% |
>>> | mthp 64K | 747333.18 | 855570.43 | +14.48% |
>>
>> You're missing two of the most important metrics: COW latency and memory
>> waste.
>
> OK, I will add those two measurements later.
>
>>
>> Just imagine what happens if you have PMD-sized THP.
>>
>> I would suggest you explore why Redis used to recommend disabling THPs
>> (hint: tail latency due to COW of way-too-large chunks before we do
>> what we do today).
>
> Thanks for the suggestion; I'm indeed not very familiar with Redis.
> Currently this series only supports small granularities, such as 16KB,
> but I will also run redis-benchmark later to see how severe the tail
> latency is.
>
>>
>> So staring at usemem micro-benchmark results is a bit misleading.
>>
>> As discussed in the past, I would actually suggest to
>>
>> a) Let khugepaged deal with fixing this up later, keeping CoW path
>> simpler and faster.
>> b) If we really really have to do this at fault time, limit it to
>> some order (which might even have to be configurable); a rough
>> sketch follows below.
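>>
>> Purely illustrative (the knob name and wiring are made up, not an
>> existing interface), b) could be as simple as clamping the order:
>>
>>	/* 0 would keep today's single-page COW behaviour. */
>>	static unsigned int cow_max_order;	/* hypothetical tunable */
>>
>>	order = min_t(unsigned int, order, cow_max_order);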
>
> Adding a knob similar to shmem_enabled later, if needed, sounds like a
> good approach.
>
>>
>> I really think we should keep CoW latency low and instead let khugepaged fix
>> that up later. (Nico is working on mTHP collapse support)
>>
>> [are you properly handling a mixture of PageAnonExclusive states
>> within a folio? Only staring at R/O PTEs is usually insufficient to
>> determine whether you can COW or whether you must reuse].
>
> There is no extra handling of PageAnonExclusive here; the decision is
> based only on R/O PTEs. Thank you for pointing it out; I will look into
> how to handle this situation properly.
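>
> A minimal sketch of the kind of per-subpage check this seems to
> require (the helper name is mine, not something that exists in mm/
> today):
>
>	static bool folio_range_anon_exclusive(struct folio *folio,
>					       unsigned long first_idx,
>					       unsigned int nr)
>	{
>		unsigned int i;
>
>		/*
>		 * Reusing the whole range on a write fault is only safe
>		 * if every subpage is exclusive to us; any shared
>		 * subpage forces a copy instead.
>		 */
>		for (i = 0; i < nr; i++)
>			if (!PageAnonExclusive(folio_page(folio,
>							  first_idx + i)))
>				return false;
>		return true;
>	}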
Yes, but as I said: I much prefer to let khugepaged handle that. I am
not convinced the complexity here is warranted.
Nico's patches should soon be in shape to collapse mTHP (see the list).
--
Cheers
David / dhildenb