Message-ID: <CAA1CXcDQeiMjVhxVjnCvBuTQLSBQh0ea7FJXg52ebNFDHfXm1g@mail.gmail.com>
Date: Fri, 18 Jul 2025 15:00:25 -0600
From: Nico Pache <npache@...hat.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, 
	david@...hat.com, ziy@...dia.com, lorenzo.stoakes@...cle.com, 
	Liam.Howlett@...cle.com, ryan.roberts@....com, dev.jain@....com, 
	corbet@....net, rostedt@...dmis.org, mhiramat@...nel.org, 
	mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org, baohua@...nel.org, 
	willy@...radead.org, peterx@...hat.com, wangkefeng.wang@...wei.com, 
	usamaarif642@...il.com, sunnanyong@...wei.com, vishal.moola@...il.com, 
	thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com, 
	kirill.shutemov@...ux.intel.com, aarcange@...hat.com, raquini@...hat.com, 
	anshuman.khandual@....com, catalin.marinas@....com, tiwai@...e.de, 
	will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz, cl@...two.org, 
	jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com, hannes@...xchg.org, 
	rientjes@...gle.com, mhocko@...e.com, rdunlap@...radead.org, hughd@...gle.com
Subject: Re: [PATCH v9 13/14] khugepaged: add per-order mTHP khugepaged stats

On Thu, Jul 17, 2025 at 11:05 PM Baolin Wang
<baolin.wang@...ux.alibaba.com> wrote:
>
>
>
> On 2025/7/14 08:32, Nico Pache wrote:
> > With mTHP support in place, let's add the per-order mTHP stats for
> > exceeding NONE, SWAP, and SHARED.
> >
> > Signed-off-by: Nico Pache <npache@...hat.com>
> > ---
> >   Documentation/admin-guide/mm/transhuge.rst | 17 +++++++++++++++++
> >   include/linux/huge_mm.h                    |  3 +++
> >   mm/huge_memory.c                           |  7 +++++++
> >   mm/khugepaged.c                            | 15 ++++++++++++---
> >   4 files changed, 39 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 2c523dce6bc7..28c8af61efba 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -658,6 +658,23 @@ nr_anon_partially_mapped
> >          an anonymous THP as "partially mapped" and count it here, even though it
> >          is not actually partially mapped anymore.
> >
> > +collapse_exceed_swap_pte
> > +       The number of anonymous THP which contain at least one swap PTE.
> > +       Currently khugepaged does not support collapsing mTHP regions that
> > +       contain a swap PTE.
> > +
> > +collapse_exceed_none_pte
> > +       The number of anonymous THP which have exceeded the none PTE threshold.
> > +       With mTHP collapse, a bitmap is used to gather the state of a PMD region
> > +       and is then recursively checked from largest to smallest order against
> > +       the scaled max_ptes_none count. This counter indicates that the next
> > +       enabled order will be checked.
> > +
> > +collapse_exceed_shared_pte
> > +       The number of anonymous THP which contain at least one shared PTE.
> > +       Currently khugepaged does not support collapsing mTHP regions that
> > +       contain a shared PTE.
> > +
> >   As the system ages, allocating huge pages may be expensive as the
> >   system uses memory compaction to copy data around memory to free a
> >   huge page for use. There are some counters in ``/proc/vmstat`` to help
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 4042078e8cc9..e0a27f80f390 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -141,6 +141,9 @@ enum mthp_stat_item {
> >       MTHP_STAT_SPLIT_DEFERRED,
> >       MTHP_STAT_NR_ANON,
> >       MTHP_STAT_NR_ANON_PARTIALLY_MAPPED,
> > +     MTHP_STAT_COLLAPSE_EXCEED_SWAP,
> > +     MTHP_STAT_COLLAPSE_EXCEED_NONE,
> > +     MTHP_STAT_COLLAPSE_EXCEED_SHARED,
> >       __MTHP_STAT_COUNT
> >   };
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index e2ed9493df77..57e5699cf638 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -632,6 +632,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
> >   DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
> >   DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON);
> >   DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED);
> > +DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
> > +DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_NONE);
> > +DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
> > +
> >
> >   static struct attribute *anon_stats_attrs[] = {
> >       &anon_fault_alloc_attr.attr,
> > @@ -648,6 +652,9 @@ static struct attribute *anon_stats_attrs[] = {
> >       &split_deferred_attr.attr,
> >       &nr_anon_attr.attr,
> >       &nr_anon_partially_mapped_attr.attr,
> > +     &collapse_exceed_swap_pte_attr.attr,
> > +     &collapse_exceed_none_pte_attr.attr,
> > +     &collapse_exceed_shared_pte_attr.attr,
> >       NULL,
> >   };
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index d0c99b86b304..8a5873d0a23a 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -594,7 +594,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                               continue;
> >                       } else {
> >                               result = SCAN_EXCEED_NONE_PTE;
> > -                             count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > +                             if (order == HPAGE_PMD_ORDER)
> > +                                     count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > +                             else
> > +                                     count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE);
>
> Please follow the same logic as other mTHP statistics, meaning there is
> no need to filter out PMD-sized orders, because mTHP also supports
> PMD-sized orders. So logic should be:
>
> if (order == HPAGE_PMD_ORDER)
>         count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>
> count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE);
Good point-- I will fix that!
>
> >                               goto out;
> >                       }
> >               }
> > @@ -623,8 +626,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >               /* See khugepaged_scan_pmd(). */
> >               if (folio_maybe_mapped_shared(folio)) {
> >                       ++shared;
> > -                     if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
> > -                         shared > khugepaged_max_ptes_shared)) {
> > +                     if (order != HPAGE_PMD_ORDER) {
> > +                             result = SCAN_EXCEED_SHARED_PTE;
> > +                             count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
> > +                             goto out;
> > +                     }
>
> Ditto.
Thanks!

There is also the SWAP counter, which is slightly different: it is
incremented during the scan phase, and in the mTHP case in the swapin
faulting code. I'm not sure whether we should also increment the
per-order counter for the PMD order during the scan phase, or just
leave it as a general vm_event counter there, since it isn't attributed
to an order during the scan. I believe the latter is the correct
approach: only attribute an order to it in __collapse_huge_page_swapin
when it's an mTHP collapse.
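
Roughly what I have in mind for __collapse_huge_page_swapin() (just a
rough sketch, assuming the order is available at that point in this
series; not tested):

	/*
	 * Sketch: when we hit a swap PTE for a collapse candidate, only
	 * attribute the event to an order on the mTHP path; the
	 * PMD-sized path keeps the existing global
	 * THP_SCAN_EXCEED_SWAP_PTE vm_event from the scan phase.
	 */
	if (order != HPAGE_PMD_ORDER)
		count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP);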
>
> > +
> > +                     if (cc->is_khugepaged &&
> > +                             shared > khugepaged_max_ptes_shared) {
> >                               result = SCAN_EXCEED_SHARED_PTE;
> >                               count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> >                               goto out;
>

