[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e29bbb0d-8c61-49b8-9bf0-df7ddf2728e7@redhat.com>
Date: Wed, 29 Nov 2023 09:05:31 +0100
From: David Hildenbrand <david@...hat.com>
To: John Hubbard <jhubbard@...dia.com>,
Ryan Roberts <ryan.roberts@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Matthew Wilcox <willy@...radead.org>,
Yin Fengwei <fengwei.yin@...el.com>,
Yu Zhao <yuzhao@...gle.com>,
Catalin Marinas <catalin.marinas@....com>,
Anshuman Khandual <anshuman.khandual@....com>,
Yang Shi <shy828301@...il.com>,
"Huang, Ying" <ying.huang@...el.com>, Zi Yan <ziy@...dia.com>,
Luis Chamberlain <mcgrof@...nel.org>,
Itaru Kitayama <itaru.kitayama@...il.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
David Rientjes <rientjes@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Hugh Dickins <hughd@...gle.com>,
Kefeng Wang <wangkefeng.wang@...wei.com>
Cc: linux-mm@...ck.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [RESEND PATCH v7 03/10] mm: thp: Introduce per-size thp sysfs
interface
On 29.11.23 04:42, John Hubbard wrote:
> On 11/22/23 08:29, Ryan Roberts wrote:
>> In preparation for adding support for anonymous small-sized THP,
>> introduce new sysfs structure that will be used to control the new
>> behaviours. A new directory is added under transparent_hugepage for each
>> supported THP size, and contains an `enabled` file, which can be set to
>> "global" (to inherrit the global setting), "always", "madvise" or
>> "never". For now, the kernel still only supports PMD-sized anonymous
>> THP, so only 1 directory is populated.
>>
>> The first half of the change converts transhuge_vma_suitable() and
>> hugepage_vma_check() so that they take a bitfield of orders for which
>> the user wants to determine support, and the functions filter out all
>> the orders that can't be supported, given the current sysfs
>> configuration and the VMA dimensions. If there is only 1 order set in
>> the input then the output can continue to be treated like a boolean;
>> this is the case for most call sites.
>>
>> The second half of the change implements the new sysfs interface. It has
>> been done so that each supported THP size has a `struct thpsize`, which
>> describes the relevant metadata and is itself a kobject. This is pretty
>> minimal for now, but should make it easy to add new per-thpsize files to
>> the interface if needed in future (e.g. per-size defrag). Rather than
>> keep the `enabled` state directly in the struct thpsize, I've elected to
>> directly encode it into huge_anon_orders_[always|madvise|global]
>> bitfields since this reduces the amount of work required in
>> transhuge_vma_suitable() which is called for every page fault.
>>
>> The remainder is copied from Documentation/admin-guide/mm/transhuge.rst,
>> as modified by this commit. See that file for further details.
>>
>> Transparent Hugepage Support for anonymous memory can be entirely
>> disabled (mostly for debugging purposes) or only enabled inside
>> MADV_HUGEPAGE regions (to avoid the risk of consuming more memory
>> resources) or enabled system wide. This can be achieved
>> per-supported-THP-size with one of::
>>
>> echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>>
>> where <size> is the hugepage size being addressed, the available sizes
>> for which vary by system. Alternatively it is possible to specify that
>> a given hugepage size will inherrit the global enabled setting::
>>
>> echo global >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>>
>> The global (legacy) enabled setting can be set as follows::
>>
>> echo always >/sys/kernel/mm/transparent_hugepage/enabled
>> echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>> echo never >/sys/kernel/mm/transparent_hugepage/enabled
>>
>> By default, PMD-sized hugepages have enabled="global" and all other
>> hugepage sizes have enabled="never". If enabling multiple hugepage
>> sizes, the kernel will select the most appropriate enabled size for a
>> given allocation.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
>> ---
>> Documentation/admin-guide/mm/transhuge.rst | 74 ++++--
>> Documentation/filesystems/proc.rst | 6 +-
>> fs/proc/task_mmu.c | 3 +-
>> include/linux/huge_mm.h | 100 +++++---
>> mm/huge_memory.c | 263 +++++++++++++++++++--
>> mm/khugepaged.c | 16 +-
>> mm/memory.c | 6 +-
>> mm/page_vma_mapped.c | 3 +-
>> 8 files changed, 387 insertions(+), 84 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>> index b0cc8243e093..52565e0bd074 100644
>> --- a/Documentation/admin-guide/mm/transhuge.rst
>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>> @@ -45,10 +45,23 @@ components:
>> the two is using hugepages just because of the fact the TLB miss is
>> going to run faster.
>>
>> +As well as PMD-sized THP described above, it is also possible to
>> +configure the system to allocate "small-sized THP" to back anonymous
>
> Here's one of the places to change to the new name, which lately is
> "multi-size THP", or mTHP or m_thp for short. (I've typed "multi-size"
> instead of "multi-sized", because the 'd' doesn't add significantly to
> the meaning, and if in doubt, shorter is better.
>
>
>> +memory (for example 16K, 32K, 64K, etc). These THPs continue to be
>> +PTE-mapped, but in many cases can still provide similar benefits to
>> +those outlined above: Page faults are significantly reduced (by a
>> +factor of e.g. 4, 8, 16, etc), but latency spikes are much less
>> +prominent because the size of each page isn't as huge as the PMD-sized
>> +variant and there is less memory to clear in each page fault. Some
>> +architectures also employ TLB compression mechanisms to squeeze more
>> +entries in when a set of PTEs are virtually and physically contiguous
>> +and approporiately aligned. In this case, TLB misses will occur less
>> +often.
>> +
>
> OK, all of the above still seems like it can remain the same.
>
>> THP can be enabled system wide or restricted to certain tasks or even
>> memory ranges inside task's address space. Unless THP is completely
>> disabled, there is ``khugepaged`` daemon that scans memory and
>> -collapses sequences of basic pages into huge pages.
>> +collapses sequences of basic pages into PMD-sized huge pages.
>>
>> The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
>> interface and using madvise(2) and prctl(2) system calls.
>> @@ -95,12 +108,29 @@ Global THP controls
>> Transparent Hugepage Support for anonymous memory can be entirely disabled
>> (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
>> regions (to avoid the risk of consuming more memory resources) or enabled
>> -system wide. This can be achieved with one of::
>> +system wide. This can be achieved per-supported-THP-size with one of::
>> +
>> + echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> + echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> + echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +
>> +where <size> is the hugepage size being addressed, the available sizes
>> +for which vary by system. Alternatively it is possible to specify that
>> +a given hugepage size will inherrit the global enabled setting::
>
> typo: inherrit
>
>> +
>> + echo global >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +
>> +The global (legacy) enabled setting can be set as follows::
>>
>> echo always >/sys/kernel/mm/transparent_hugepage/enabled
>> echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>> echo never >/sys/kernel/mm/transparent_hugepage/enabled
>>
>> +By default, PMD-sized hugepages have enabled="global" and all other
>> +hugepage sizes have enabled="never". If enabling multiple hugepage
>> +sizes, the kernel will select the most appropriate enabled size for a
>> +given allocation.
>> +
>
> This is slightly murky. I wonder if "inherited" is a little more directly
> informative than global; it certainly felt that way my first time running
> this and poking at it.
>
> And a few trivial examples would be a nice touch.
>
> And so overall with a few other minor tweaks, I'd suggest this:
>
> ...
> where <size> is the hugepage size being addressed, the available sizes
> for which vary by system.
>
> For example:
> echo always >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>
> Alternatively it is possible to specify that a given hugepage size will inherit
> the top-level "enabled" value:
>
> echo inherited >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>
> For example:
> echo inherited >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>
> The top-level setting (for use with "inherited") can be by issuing one of the
> following commands::
>
> echo always >/sys/kernel/mm/transparent_hugepage/enabled
> echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
> echo never >/sys/kernel/mm/transparent_hugepage/enabled
>
> By default, PMD-sized hugepages have enabled="inherited" and all other
> hugepage sizes have enabled="never".
"inherited" works for me as well.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists