linux-kernel - Re: [PATCH 1/2] mm: add per-order mTHP split counters

Message-ID: <CAK1f24=59+MRvnKLFww1seu==tEWg8FmJrEY5-Uaaf_kwUWWtg@mail.gmail.com>
Date: Mon, 1 Jul 2024 19:06:17 +0800
From: Lance Yang <ioworker0@...il.com>
To: David Hildenbrand <david@...hat.com>
Cc: Barry Song <baohua@...nel.org>, Ryan Roberts <ryan.roberts@....com>, akpm@...ux-foundation.org, 
	baolin.wang@...ux.alibaba.com, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org
Subject: Re: [PATCH 1/2] mm: add per-order mTHP split counters

Hi David,

On Mon, Jul 1, 2024 at 4:56 PM David Hildenbrand <david@...hat.com> wrote:
>
> On 30.06.24 11:48, Barry Song wrote:
> > On Thu, Apr 25, 2024 at 3:41 AM Ryan Roberts <ryan.roberts@....com> wrote:
> >>
> >> + Barry
> >>
> >> On 24/04/2024 14:51, Lance Yang wrote:
> >>> At present, the split counters in THP statistics no longer include
> >>> PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP split
> >>> counters to monitor the frequency of mTHP splits. This will assist
> >>> developers in better analyzing and optimizing system performance.
> >>>
> >>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> >>>          split_page
> >>>          split_page_failed
> >>>          deferred_split_page
> >>>
> >>> Signed-off-by: Lance Yang <ioworker0@...il.com>
> >>> ---
> >>>   include/linux/huge_mm.h |  3 +++
> >>>   mm/huge_memory.c        | 14 ++++++++++++--
> >>>   2 files changed, 15 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >>> index 56c7ea73090b..7b9c6590e1f7 100644
> >>> --- a/include/linux/huge_mm.h
> >>> +++ b/include/linux/huge_mm.h
> >>> @@ -272,6 +272,9 @@ enum mthp_stat_item {
> >>>        MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> >>>        MTHP_STAT_ANON_SWPOUT,
> >>>        MTHP_STAT_ANON_SWPOUT_FALLBACK,
> >>> +     MTHP_STAT_SPLIT_PAGE,
> >>> +     MTHP_STAT_SPLIT_PAGE_FAILED,
> >>> +     MTHP_STAT_DEFERRED_SPLIT_PAGE,
> >>>        __MTHP_STAT_COUNT
> >>>   };
> >>>
> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>> index 055df5aac7c3..52db888e47a6 100644
> >>> --- a/mm/huge_memory.c
> >>> +++ b/mm/huge_memory.c
> >>> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> >>>   DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> >>>   DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> >>>   DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> >>> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
> >>> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
> >>> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >>>
> >>>   static struct attribute *stats_attrs[] = {
> >>>        &anon_fault_alloc_attr.attr,
> >>> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
> >>>        &anon_fault_fallback_charge_attr.attr,
> >>>        &anon_swpout_attr.attr,
> >>>        &anon_swpout_fallback_attr.attr,
> >>> +     &split_page_attr.attr,
> >>> +     &split_page_failed_attr.attr,
> >>> +     &deferred_split_page_attr.attr,
> >>>        NULL,
> >>>   };
> >>>
> >>> @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>>        XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
> >>>        struct anon_vma *anon_vma = NULL;
> >>>        struct address_space *mapping = NULL;
> >>> -     bool is_thp = folio_test_pmd_mappable(folio);
> >>> +     int order = folio_order(folio);
> >>>        int extra_pins, ret;
> >>>        pgoff_t end;
> >>>        bool is_hzp;
> >>> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>>                i_mmap_unlock_read(mapping);
> >>>   out:
> >>>        xas_destroy(&xas);
> >>> -     if (is_thp)
> >>> +     if (order >= HPAGE_PMD_ORDER)
> >>>                count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
> >>> +     count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
> >>> +                                   MTHP_STAT_SPLIT_PAGE_FAILED);
> >>>        return ret;
> >>>   }
> >>>
> >>> @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
> >>>        if (list_empty(&folio->_deferred_list)) {
> >>>                if (folio_test_pmd_mappable(folio))
> >>>                        count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> >>> +             count_mthp_stat(folio_order(folio),
> >>> +                             MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >>
> >> There is a very long conversation with Barry about adding a 'global "mTHP became
> >> partially mapped in 1 or more processes" counter (inc only)', which terminates at
> >> [1]. There is a lot of discussion about the required semantics around the need
> >> for partial map to cover alignment and contiguity as well as whether all pages
> >> are mapped, and to trigger once it becomes partial in at least 1 process.
> >>
> >> MTHP_STAT_DEFERRED_SPLIT_PAGE is giving much simpler semantics, but less
> >> information as a result. Barry, what's your view here? I'm guessing this doesn't
> >> quite solve what you are looking for?
> >
> > This doesn't quite solve what I am looking for but I still think the
> > patch has its value.
> >
> > I'm looking for a solution that can:
> >
> >    *  Count the amount of memory in the system for each mTHP size.
> >    *  Determine how much memory for each mTHP size is partially unmapped.
> >
> > For example, in a system with 16GB of memory, we might find that we have 3GB
> > of 64KB mTHP, and within that, 512MB is partially unmapped, potentially wasting
> > memory at this moment.  I'm uncertain whether Lance is interested in
> > this job :-)
> >
> > Counting deferred_split remains valuable as it can signal whether the system is
> > experiencing significant partial unmapping.
>
> I'll note that, especially without subpage mapcounts, in the future we
> won't have that information (how much is currently mapped) readily
> available in all cases. To obtain that information on demand, we'd have
> to scan page tables or walk the rmap.

Thanks for pointing that out!

>
> Something to keep in mind: we don't want to introduce counters that will
> be expensive to maintain long term.

I'll keep that in mind as we move forward with any new implementations.

Thanks,
Lance

>
> --
> Cheers,
>
> David / dhildenb
>
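
For readers who want to sample the per-order counters discussed above from
userspace, here is a minimal sketch in C. It is not part of the patch. It
assumes the sysfs layout given in the patch description, that <size> is
rendered as e.g. "64kB", and that the counter files use the names proposed
here (split_page, split_page_failed, deferred_split_page); check what your
kernel actually exposes. The helper names mthp_stat_path() and
mthp_read_stat() are hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed sysfs root, taken from the patch description. */
#define MTHP_SYSFS_ROOT "/sys/kernel/mm/transparent_hugepage"

/*
 * Build "<root>/hugepages-<size_kb>kB/stats/<counter>" into buf.
 * The "kB" suffix on the directory name is an assumption.
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
static int mthp_stat_path(char *buf, size_t len, const char *root,
                          unsigned long size_kb, const char *counter)
{
    int n = snprintf(buf, len, "%s/hugepages-%lukB/stats/%s",
                     root, size_kb, counter);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read a single counter for one mTHP size.
 * Returns 0 and stores the value in *val, or -1 if the file is
 * missing (e.g. size not supported) or does not hold a number.
 */
static int mthp_read_stat(const char *root, unsigned long size_kb,
                          const char *counter, unsigned long long *val)
{
    char path[256];
    FILE *f;
    int ok;

    if (mthp_stat_path(path, sizeof(path), root, size_kb, counter))
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;

    ok = (fscanf(f, "%llu", val) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would do something like
mthp_read_stat(MTHP_SYSFS_ROOT, 64, "deferred_split_page", &v) and sample it
periodically. Since the counters only ever increase, the difference between
two samples gives the number of events in that interval.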