Message-ID: <db3517d0-54b1-4d3a-b798-1c13572d07be@linux.alibaba.com>
Date: Fri, 31 May 2024 18:13:03 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: David Hildenbrand <david@...hat.com>, akpm@...ux-foundation.org,
hughd@...gle.com
Cc: willy@...radead.org, wangkefeng.wang@...wei.com, ying.huang@...el.com,
21cnbao@...il.com, ryan.roberts@....com, shy828301@...il.com,
ziy@...dia.com, ioworker0@...il.com, da.gomez@...sung.com,
p.raghav@...sung.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem
On 2024/5/31 17:35, David Hildenbrand wrote:
> On 30.05.24 04:04, Baolin Wang wrote:
>> Anonymous pages already support multi-size THP (mTHP) allocation
>> through commit 19eaf44954df, which allows THP to be configured through
>> the sysfs interface located at
>> '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>>
>> However, anonymous shmem ignores the anonymous mTHP rule configured
>> through the sysfs interface and can only use PMD-mapped THP, which is
>> not reasonable. Many applications implement anonymous page sharing
>> through mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage
>> scenarios. Users therefore expect to apply a unified mTHP strategy to
>> anonymous pages, including anonymous shared pages, in order to enjoy
>> the benefits of mTHP: for example, lower latency than PMD-mapped THP,
>> smaller memory bloat than PMD-mapped THP, and contiguous PTEs on the
>> ARM architecture to reduce TLB misses.
>>
>> The primary strategy is similar to supporting anonymous mTHP. Introduce
>> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
>> which can have all the same values as the top-level
>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', plus an additional
>> "inherit" option. By default all sizes will be set to "never" except
>> PMD size, which is set to "inherit". This ensures backward
>> compatibility with the top-level anonymous shmem setting, while also
>> allowing independent control of anonymous shmem for each mTHP size.
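The "inherit" rule described above can be illustrated with a small sketch. It is not taken from the patches: the function names, the size list, and the PMD size passed in are all hypothetical, and only the default values ("never" everywhere except "inherit" at PMD size) come from the cover letter.

```python
# Illustrative model of the per-size "shmem_enabled" defaults and of how
# "inherit" resolves against the top-level setting. All names here are
# hypothetical; only the default policy is taken from the cover letter.

def default_per_size(sizes_kb, pmd_size_kb):
    """Every size defaults to "never", except PMD size which is "inherit"."""
    return {
        size: ("inherit" if size == pmd_size_kb else "never")
        for size in sizes_kb
    }


def effective_setting(per_size_value, top_level_value):
    """A per-size "inherit" follows the top-level value; anything else wins."""
    if per_size_value == "inherit":
        return top_level_value
    return per_size_value


if __name__ == "__main__":
    # Assumption: 2048 kB is the PMD size (true for 4K base pages on arm64/x86-64).
    per_size = default_per_size([64, 128, 2048], pmd_size_kb=2048)
    for size, value in per_size.items():
        print(size, value, "->", effective_setting(value, "always"))
```

With these defaults, changing only the top-level setting affects PMD-sized allocations and nothing else, which is the backward-compatible behavior the paragraph describes.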
>>
>> Use the page fault latency tool to measure the performance of 1G
>> anonymous shmem with 32 threads on my machine (ARM64 architecture,
>> 32 cores, 125G memory):
>> base: mm-unstable
>> user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
>> 0.04s        3.10s       83516.416                 2669684.890
>>
>> mm-unstable + patchset, anon shmem mTHP disabled
>> user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
>> 0.02s        3.14s       82936.359                 2630746.027
>>
>> mm-unstable + patchset, anon shmem 64K mTHP enabled
>> user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
>> 0.08s        0.31s       678630.231                17082522.495
>>
>> From the data above, the patchset has minimal impact when mTHP is not
>> enabled (some fluctuation was observed during testing). When 64K mTHP
>> is enabled, page fault latency improves significantly.
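For scale, the ratios implied by the quoted figures can be computed directly. This is only arithmetic on the numbers above, with hypothetical labels for the three runs; it adds no new measurements.

```python
# Figures copied verbatim from the three benchmark runs quoted above.
# The dictionary keys are hypothetical labels chosen for this sketch.
RUNS = {
    "base": {"sys_time_s": 3.10, "faults_per_sec": 2669684.890},
    "patchset, mTHP disabled": {"sys_time_s": 3.14, "faults_per_sec": 2630746.027},
    "patchset, 64K mTHP": {"sys_time_s": 0.31, "faults_per_sec": 17082522.495},
}


def ratio_to_base(run, metric):
    """Return the run's metric divided by the same metric on the base kernel."""
    return RUNS[run][metric] / RUNS["base"][metric]


if __name__ == "__main__":
    for run in RUNS:
        if run == "base":
            continue
        print(
            run,
            "faults_per_sec ratio: %.2f" % ratio_to_base(run, "faults_per_sec"),
            "sys_time ratio: %.2f" % ratio_to_base(run, "sys_time_s"),
        )
```

With these inputs the 64K run comes out at roughly 6.4 times the base fault rate with about a tenth of the system time, while the run with mTHP disabled stays within about 1.5% of base.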
>
> Let me summarize the takeaway from the bi-weekly MM meeting as I
> understood it, that includes Hugh's feedback on per-block tracking vs.
> mTHP:
Thanks David for the summarization.
>
> (1) Per-block tracking
>
> Per-block tracking is currently considered unwarranted complexity in
> shmem.c. We should try to get it done without that. For any test cases
> that fail, we should consider if they are actually valid for shmem.
>
> To optimize FALLOC_FL_PUNCH_HOLE for the cases where splitting+freeing
> is not possible at fallocate() time, detecting zeropages later and
> retrying to split+free might be an option, without per-block tracking.
>
> (2) mTHP controls
>
> As a default, we should not be using large folios / mTHP for any shmem,
> just like we did with THP via shmem_enabled. This is what this series
> currently does, and is part of the whole mTHP user-space interface design.
>
> Further, the mTHP controls should control all of shmem, not only
> "anonymous shmem".
Yes, that's what I thought, and it is on my TODO list.
>
> Also, we should properly fallback within the configured sizes, and not
> jump "over" configured sizes. Unless there is a good reason.
>
> (3) khugepaged
>
> khugepaged needs to handle larger folios properly as well. Until fixed,
> using smaller THP sizes as fallback might prohibit collapsing a
> PMD-sized THP later. But really, khugepaged needs to be fixed to handle
> that.
>
> (4) force/disable
>
> These settings are rather testing artifacts from the old ages. We should
> not add them to the per-size toggles. We might "inherit" it from the
> global one, though.
Sorry, I missed this. So I should remove the 'force' and 'deny' options
for each mTHP size, right?
>
> "within_size" might have value, and especially for consistency, we
> should have them per size.
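Point (2) above asks that fallback stay within the configured sizes rather than jumping over them. A minimal sketch of that rule follows, assuming the enabled sizes are represented as a bitmask of folio orders. The helper names, the bitmask representation, and the `can_alloc` callback are hypothetical and are not taken from the series.

```python
# Illustrative only: walk the enabled orders from largest to smallest and
# fall back to a base page (order 0) when none of them can be allocated.

def fallback_orders(enabled_mask, max_order):
    """Return the enabled orders not exceeding max_order, largest first."""
    return [
        order
        for order in range(max_order, 0, -1)
        if enabled_mask & (1 << order)
    ]


def pick_order(enabled_mask, max_order, can_alloc):
    """Try each enabled order in turn; never try an order that is not enabled."""
    for order in fallback_orders(enabled_mask, max_order):
        if can_alloc(order):
            return order
    return 0  # base page


if __name__ == "__main__":
    # Assumption: 4K base pages, so order 4 is 64K and order 9 is 2M.
    mask = (1 << 4) | (1 << 9)
    print(fallback_orders(mask, 9))
    print(pick_order(mask, 9, lambda order: order <= 4))
```

In this model a failed order-9 attempt goes straight to order 4 and then to a base page; orders 5 through 8 are never tried because they are not enabled.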
>
>
>
> So, this series only tackles anonymous shmem, which is a good starting
> point. Ideally, we'd get support for other shmem (especially during
> fault time) soon afterwards, because we won't be adding separate toggles
> for that from the interface POV, and having inconsistent behavior
> between kernel versions would be a bit unfortunate.
>
>
> @Baolin, this series likely does not consider (4) yet. And I suggest we
> have to take a lot of the "anonymous thp" terminology out of this
> series, especially when it comes to documentation.
Sure. I will remove the "anonymous thp" terminology from the
documentation, but I would still like to keep it in the commit message,
because I want to start from anonymous shmem.
>
> @Daniel, Pankaj, what are your plans regarding that? It would be great
> if we could get an understanding on the next steps on !anon shmem.
>