Message-ID: <CAA1CXcBQ6G70Pg93XphsXAwwHtJPbFuJb=OmfwK2s3q3aevGuA@mail.gmail.com>
Date: Mon, 28 Apr 2025 08:54:47 -0600
From: Nico Pache <npache@...hat.com>
To: Usama Arif <usamaarif642@...il.com>
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
akpm@...ux-foundation.org, corbet@....net, rostedt@...dmis.org,
mhiramat@...nel.org, mathieu.desnoyers@...icios.com, david@...hat.com,
baohua@...nel.org, baolin.wang@...ux.alibaba.com, ryan.roberts@....com,
willy@...radead.org, peterx@...hat.com, ziy@...dia.com,
wangkefeng.wang@...wei.com, sunnanyong@...wei.com, vishal.moola@...il.com,
thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
kirill.shutemov@...ux.intel.com, aarcange@...hat.com, raquini@...hat.com,
dev.jain@....com, anshuman.khandual@....com, catalin.marinas@....com,
tiwai@...e.de, will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz,
cl@...two.org, jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com,
hannes@...xchg.org, rientjes@...gle.com, mhocko@...e.com,
rdunlap@...radead.org
Subject: Re: [PATCH v4 12/12] Documentation: mm: update the admin guide for
mTHP collapse
On Thu, Apr 24, 2025 at 9:04 AM Usama Arif <usamaarif642@...il.com> wrote:
>
>
>
> On 17/04/2025 01:02, Nico Pache wrote:
> > Now that we can collapse to mTHPs, let's update the admin guide to
> > reflect these changes and provide proper guidance on how to utilize it.
> >
> > Signed-off-by: Nico Pache <npache@...hat.com>
> > ---
> > Documentation/admin-guide/mm/transhuge.rst | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index dff8d5985f0f..06814e05e1d5 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -63,7 +63,7 @@ often.
> > THP can be enabled system wide or restricted to certain tasks or even
> > memory ranges inside task's address space. Unless THP is completely
> > disabled, there is ``khugepaged`` daemon that scans memory and
> > -collapses sequences of basic pages into PMD-sized huge pages.
> > +collapses sequences of basic pages into huge pages.
> >
> > The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
> > interface and using madvise(2) and prctl(2) system calls.
> > @@ -144,6 +144,14 @@ hugepage sizes have enabled="never". If enabling multiple hugepage
> > sizes, the kernel will select the most appropriate enabled size for a
> > given allocation.
> >
> > +khugepaged uses max_ptes_none scaled to the order of the enabled mTHP size to
> > +determine collapses. When using mTHPs it's recommended to set max_ptes_none
> > +low-- ideally less than HPAGE_PMD_NR / 2 (255 on 4k page size). This will
> > +prevent undesired "creep" behavior that leads to continuously collapsing to a
> > +larger mTHP size. max_ptes_shared and max_ptes_swap have no effect when
> > +collapsing to a mTHP, and mTHP collapse will fail on shared or swapped out
> > +pages.
> > +
>
> Hi Nico,
>
> Could you add a bit more explanation of the creep behaviour here in the documentation?
> I remember you explained in one of the earlier versions that if more than half of the
> collapsed mTHP is zero-filled, it for some reason becomes eligible for collapsing to
> a larger order, but if less than half is zero-filled it's not eligible? I can't exactly
> remember what the reason was :) Would be good to have it documented more if possible.
Hi Usama,
You can think of the creep as a byproduct of khugepaged introducing N
new non-zero pages to an N-sized mTHP, which doubles its size. Once
collapsed, the mTHP is fully populated, so on a second pass the
next-larger region (now half non-zero) meets the same condition and is
eligible too, leading to constant promotion to the next size. In other
words, if we allow khugepaged to double the size of an mTHP by filling
in the empty half, it will keep doubling.
I'll see how I can incorporate this description into the admin guide.
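
To make the doubling concrete, here is a rough model in Python. It is
only a sketch, not kernel code: the helper names are made up for
illustration, and it assumes max_ptes_none is scaled to a target order
by a right shift of (HPAGE_PMD_ORDER - order) and that the half of the
larger region outside the collapsed mTHP is entirely empty.

```python
HPAGE_PMD_ORDER = 9  # assumes 4K base pages and a 2M PMD (512 PTEs)


def scaled_max_ptes_none(max_ptes_none, order):
    """Assumed scaling of the PMD-level limit down to a smaller order."""
    return max_ptes_none >> (HPAGE_PMD_ORDER - order)


def creep(max_ptes_none, start_order):
    """Return the orders reached by repeated (hypothetical) scan passes.

    Model: a collapsed order-k mTHP is fully populated, so the enclosing
    order-(k+1) region has exactly 2**k empty PTEs (its other half).
    """
    order = start_order
    reached = [order]
    while order < HPAGE_PMD_ORDER:
        empty = 1 << order  # the untouched half of the next-larger region
        if empty > scaled_max_ptes_none(max_ptes_none, order + 1):
            break  # too many empty PTEs, no promotion
        order += 1
        reached.append(order)
    return reached


if __name__ == "__main__":
    for limit in (255, 256, 511):
        print(limit, creep(limit, 2))
```

Under these assumptions, a limit of 255 leaves an order-2 mTHP where it
is, while 256 or 511 promotes it one order per pass all the way up to
PMD size, which is why the patch recommends staying below
HPAGE_PMD_NR / 2.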
-- Nico
>
> Thanks
>
> > It's also possible to limit defrag efforts in the VM to generate
> > anonymous hugepages in case they're not immediately free to madvise
> > regions or to never try to defrag memory and simply fallback to regular
>