Date: Fri, 31 May 2024 11:35:30 +0200
From: David Hildenbrand <david@...hat.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org,
 hughd@...gle.com
Cc: willy@...radead.org, wangkefeng.wang@...wei.com, ying.huang@...el.com,
 21cnbao@...il.com, ryan.roberts@....com, shy828301@...il.com,
 ziy@...dia.com, ioworker0@...il.com, da.gomez@...sung.com,
 p.raghav@...sung.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

On 30.05.24 04:04, Baolin Wang wrote:
> Anonymous pages have already been supported for multi-size (mTHP) allocation
> through commit 19eaf44954df, that can allow THP to be configured through the
> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
> 
> However, the anonymous shmem will ignore the anonymous mTHP rule configured
> through the sysfs interface, and can only use the PMD-mapped THP, that is not
> reasonable. Many implement anonymous page sharing through mmap(MAP_SHARED |
> MAP_ANONYMOUS), especially in database usage scenarios, therefore, users expect
> to apply an unified mTHP strategy for anonymous pages, also including the
> anonymous shared pages, in order to enjoy the benefits of mTHP. For example,
> lower latency than PMD-mapped THP, smaller memory bloat than PMD-mapped THP,
> contiguous PTEs on ARM architecture to reduce TLB miss etc.
> 
> The primary strategy is similar to supporting anonymous mTHP. Introduce
> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> which can have all the same values as the top-level
> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
> additional "inherit" option. By default all sizes will be set to "never"
> except PMD size, which is set to "inherit". This ensures backward compatibility
> with the anonymous shmem enabled of the top level, meanwhile also allows
> independent control of anonymous shmem enabled for each mTHP.
> 
> Use the page fault latency tool to measure the performance of 1G anonymous shmem
> with 32 threads on my machine environment with: ARM64 Architecture, 32 cores,
> 125G memory:
> base: mm-unstable
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.04s        3.10s         83516.416                  2669684.890
> 
> mm-unstable + patchset, anon shmem mTHP disabled
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.02s        3.14s         82936.359                  2630746.027
> 
> mm-unstable + patchset, anon shmem 64K mTHP enabled
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.08s        0.31s         678630.231                 17082522.495
> 
> From the data above, it is observed that the patchset has a minimal impact when
> mTHP is not enabled (some fluctuations observed during testing). When enabling 64K
> mTHP, there is a significant improvement of the page fault latency.
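For reference, the relative changes implied by the quoted numbers can be
worked out as below. This is only arithmetic on the figures from the cover
letter; the variable names are made up for illustration, and nothing here
was re-measured.

```python
# Figures copied from the quoted cover letter (faults_per_sec and sys_time).
base_faults = 2669684.890        # mm-unstable
disabled_faults = 2630746.027    # patchset, anon shmem mTHP disabled
mthp64k_faults = 17082522.495    # patchset, anon shmem 64K mTHP enabled

base_sys = 3.10                  # seconds of system time, mm-unstable
mthp64k_sys = 0.31               # seconds of system time, 64K mTHP enabled

# Throughput change with the patchset applied but mTHP left disabled.
disabled_change_pct = (disabled_faults / base_faults - 1.0) * 100.0

# Throughput gain and system-time reduction with 64K mTHP enabled.
mthp64k_speedup = mthp64k_faults / base_faults
sys_time_ratio = base_sys / mthp64k_sys

print(f"mTHP disabled: {disabled_change_pct:+.2f}% faults/sec vs. base")
print(f"64K mTHP:      {mthp64k_speedup:.2f}x faults/sec vs. base")
print(f"64K mTHP:      {sys_time_ratio:.1f}x less system time")
```

That is roughly a 1.5% drop with mTHP disabled (within the fluctuation
mentioned above), about 6.4x more faults per second with 64K mTHP, and
about 10x less system time.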

Let me summarize the takeaway from the bi-weekly MM meeting as I 
understood it, that includes Hugh's feedback on per-block tracking vs. mTHP:

(1) Per-block tracking

Per-block tracking is currently considered unwarranted complexity in 
shmem.c. We should try to get it done without that. For any test cases 
that fail, we should consider if they are actually valid for shmem.

To optimize FALLOC_FL_PUNCH_HOLE for the cases where splitting+freeing
is not possible at fallocate() time, detecting zeropages later and
retrying to split+free might be an option, without per-block tracking.

(2) mTHP controls

As a default, we should not be using large folios / mTHP for any shmem, 
just like we did with THP via shmem_enabled. This is what this series 
currently does, and is part of the whole mTHP user-space interface design.

Further, the mTHP controls should control all of shmem, not only 
"anonymous shmem".

Also, we should properly fall back within the configured sizes, and not 
jump "over" configured sizes, unless there is a good reason.
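To illustrate the fallback rule, here is a minimal user-space sketch,
assuming the enabled sizes are expressed as folio orders and that order 0
(a base page) is always allowed as the last resort. The helper names
fallback_orders() and allocate_with_fallback() and the try_alloc callback
are hypothetical; this is not the shmem.c code.

```python
def fallback_orders(enabled_orders, max_order):
    """Return the orders to try, largest first, ending with order 0.

    Only orders that are enabled and do not exceed max_order are
    considered, so no enabled size is skipped ("jumped over").
    """
    candidates = sorted(
        (o for o in set(enabled_orders) if 0 < o <= max_order),
        reverse=True,
    )
    return candidates + [0]


def allocate_with_fallback(enabled_orders, max_order, try_alloc):
    """Try each candidate order in turn; return the first that succeeds."""
    for order in fallback_orders(enabled_orders, max_order):
        if try_alloc(order):  # try_alloc is a hypothetical allocator hook
            return order
    return None


# With 4K base pages: order 2 = 16K, order 4 = 64K, order 9 = 2M (PMD).
enabled = {2, 4, 9}
print(fallback_orders(enabled, 9))
# Pretend that only allocations of order <= 2 succeed right now.
print(allocate_with_fallback(enabled, 9, lambda order: order <= 2))
```

With 16K, 64K and 2M enabled, the sketch tries 2M, then 64K, then 16K,
and only then a base page.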

(3) khugepaged

khugepaged needs to handle larger folios properly as well. Until fixed, 
using smaller THP sizes as fallback might prohibit collapsing a 
PMD-sized THP later. But really, khugepaged needs to be fixed to handle 
that.

(4) force/disable

These settings are rather testing artifacts from the old ages. We should 
not add them to the per-size toggles. We might "inherit" it from the 
global one, though.

"within_size" might have value, and especially for consistency, we 
should have it per size.
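A rough sketch of how that could resolve, assuming the per-size toggles
accept everything the global shmem_enabled accepts except "deny" and
"force", plus "inherit". The effective_setting() helper is hypothetical
and only models the suggestion above, not what the series currently
implements.

```python
# Values accepted by the top-level shmem_enabled file.
GLOBAL_VALUES = {"always", "within_size", "advise", "never", "deny", "force"}

# Assumed per-size values: no "deny"/"force", but "inherit" is added.
PER_SIZE_VALUES = {"always", "within_size", "advise", "never", "inherit"}


def effective_setting(per_size, global_setting):
    """Resolve the setting that applies to one mTHP size."""
    if global_setting not in GLOBAL_VALUES:
        raise ValueError(f"invalid global setting: {global_setting!r}")
    if per_size not in PER_SIZE_VALUES:
        raise ValueError(f"invalid per-size setting: {per_size!r}")
    # "inherit" defers to the global toggle, including "deny"/"force".
    return global_setting if per_size == "inherit" else per_size


print(effective_setting("inherit", "force"))
print(effective_setting("within_size", "never"))
```

So "force"/"deny" would only ever reach a given size through "inherit",
while "within_size" stays selectable per size.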



So, this series only tackles anonymous shmem, which is a good starting 
point. Ideally, we'd get support for other shmem (especially during 
fault time) soon afterwards, because we won't be adding separate toggles 
for that from the interface POV, and having inconsistent behavior 
between kernel versions would be a bit unfortunate.


@Baolin, this series likely does not consider (4) yet. And I suggest we 
take a lot of the "anonymous thp" terminology out of this series, 
especially when it comes to documentation.

@Daniel, Pankaj, what are your plans regarding that? It would be great 
if we could get an understanding on the next steps on !anon shmem.

-- 
Cheers,

David / dhildenb

