Message-ID: <C7A07691-E8D8-436F-AEED-8825608880CE@nvidia.com>
Date: Wed, 30 Apr 2025 16:15:12 -0400
From: Zi Yan <ziy@...dia.com>
To: Nico Pache <npache@...hat.com>
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
akpm@...ux-foundation.org, corbet@....net, rostedt@...dmis.org,
mhiramat@...nel.org, mathieu.desnoyers@...icios.com, david@...hat.com,
baohua@...nel.org, baolin.wang@...ux.alibaba.com, ryan.roberts@....com,
willy@...radead.org, peterx@...hat.com, shuah@...nel.org,
wangkefeng.wang@...wei.com, usamaarif642@...il.com, sunnanyong@...wei.com,
vishal.moola@...il.com, thomas.hellstrom@...ux.intel.com,
yang@...amperecomputing.com, kirill.shutemov@...ux.intel.com,
aarcange@...hat.com, raquini@...hat.com, dev.jain@....com,
anshuman.khandual@....com, catalin.marinas@....com, tiwai@...e.de,
will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz, cl@...two.org,
jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com,
Liam.Howlett@...cle.com, lorenzo.stoakes@...cle.com, hannes@...xchg.org,
rientjes@...gle.com, mhocko@...e.com, rdunlap@...radead.org,
Bagas Sanjaya <bagasdotme@...il.com>
Subject: Re: [PATCH v5 2/4] mm: document (m)THP defer usage
On 28 Apr 2025, at 14:29, Nico Pache wrote:
> The new defer option for (m)THPs allows for a more conservative
> approach to (m)THPs. Document its usage in the transhuge admin-guide.
>
> Reviewed-by: Bagas Sanjaya <bagasdotme@...il.com>
> Signed-off-by: Nico Pache <npache@...hat.com>
> ---
> Documentation/admin-guide/mm/transhuge.rst | 31 ++++++++++++++++------
> 1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 5c63fe51b3ad..c50253357793 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -88,8 +88,9 @@ In certain cases when hugepages are enabled system wide, application
> may end up allocating more memory resources. An application may mmap a
> large region but only touch 1 byte of it, in that case a 2M page might
> be allocated instead of a 4k page for no good. This is why it's
> -possible to disable hugepages system-wide and to only have them inside
> -MADV_HUGEPAGE madvise regions.
> +possible to disable hugepages system-wide, only have them inside
> +MADV_HUGEPAGE madvise regions, or defer them away from the page fault
> +handler to khugepaged.
>
> Embedded systems should enable hugepages only inside madvise regions
> to eliminate any risk of wasting any precious byte of memory and to
> @@ -99,6 +100,15 @@ Applications that gets a lot of benefit from hugepages and that don't
> risk to lose memory by using hugepages, should use
> madvise(MADV_HUGEPAGE) on their critical mmapped regions.
>
> +Applications that would like to benefit from THPs but would still like a
> +more memory conservative approach can choose 'defer'. This avoids
> +inserting THPs at the page fault handler unless they are MADV_HUGEPAGE.
> +Khugepaged will then scan the mappings for potential collapses into (m)THP
How about the text below? It states the khugepaged behavior explicitly:

Khugepaged will then scan all mappings, even those not explicitly marked
with MADV_HUGEPAGE, for potential collapses into (m)THPs.
> +pages. Admins using this the 'defer' setting should consider
> +tweaking khugepaged/max_ptes_none. The current default of 511 may
> +aggressively collapse your PTEs into PMDs. Lower this value to conserve
> +more memory (i.e., max_ptes_none=64).
> +
> .. _thp_sysfs:
>
> sysfs
> @@ -109,11 +119,14 @@ Global THP controls
>
> Transparent Hugepage Support for anonymous memory can be entirely disabled
> (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
> -regions (to avoid the risk of consuming more memory resources) or enabled
> -system wide. This can be achieved per-supported-THP-size with one of::
> +regions (to avoid the risk of consuming more memory resources), deferred to
> +khugepaged, or enabled system wide.
> +
> +This can be achieved per-supported-THP-size with one of::
>
> echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> + echo defer >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>
> where <size> is the hugepage size being addressed, the available sizes
> @@ -136,6 +149,7 @@ The top-level setting (for use with "inherit") can be set by issuing
> one of the following commands::
>
> echo always >/sys/kernel/mm/transparent_hugepage/enabled
> + echo defer >/sys/kernel/mm/transparent_hugepage/enabled
> echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
> echo never >/sys/kernel/mm/transparent_hugepage/enabled
>
> @@ -286,7 +300,8 @@ of small pages into one large page::
> A higher value leads to use additional memory for programs.
> A lower value leads to gain less thp performance. Value of
> max_ptes_none can waste cpu time very little, you can
> -ignore it.
> +ignore it. Consider lowering this value when using
> +``transparent_hugepage=defer``
>
> ``max_ptes_swap`` specifies how many pages can be brought in from
> swap when collapsing a group of pages into a transparent huge page::
> @@ -311,14 +326,14 @@ Boot parameters
>
> You can change the sysfs boot time default for the top-level "enabled"
> control by passing the parameter ``transparent_hugepage=always`` or
> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> -kernel command line.
> +``transparent_hugepage=madvise`` or ``transparent_hugepage=defer`` or
> +``transparent_hugepage=never`` to the kernel command line.
>
> Alternatively, each supported anonymous THP size can be controlled by
> passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,
> where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
> supported anonymous THP) and ``<state>`` is one of ``always``, ``madvise``,
> -``never`` or ``inherit``.
> +``defer``, ``never`` or ``inherit``.
>
> For example, the following will set 16K, 32K, 64K THP to ``always``,
> set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
Otherwise, LGTM. Thanks.

Reviewed-by: Zi Yan <ziy@...dia.com>
--
Best Regards,
Yan, Zi