Message-ID: <23d9f708-b1fd-4b10-b755-b7ef6aa683e8@redhat.com>
Date: Mon, 1 Jul 2024 10:56:23 +0200
From: David Hildenbrand <david@...hat.com>
To: Barry Song <baohua@...nel.org>, Ryan Roberts <ryan.roberts@....com>
Cc: Lance Yang <ioworker0@...il.com>, akpm@...ux-foundation.org,
 baolin.wang@...ux.alibaba.com, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org
Subject: Re: [PATCH 1/2] mm: add per-order mTHP split counters

On 30.06.24 11:48, Barry Song wrote:
> On Thu, Apr 25, 2024 at 3:41 AM Ryan Roberts <ryan.roberts@....com> wrote:
>>
>> + Barry
>>
>> On 24/04/2024 14:51, Lance Yang wrote:
>>> At present, the split counters in THP statistics no longer include
>>> PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP split
>>> counters to monitor the frequency of mTHP splits. This will assist
>>> developers in better analyzing and optimizing system performance.
>>>
>>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
>>>          split_page
>>>          split_page_failed
>>>          deferred_split_page
>>>
>>> Signed-off-by: Lance Yang <ioworker0@...il.com>
>>> ---
>>>   include/linux/huge_mm.h |  3 +++
>>>   mm/huge_memory.c        | 14 ++++++++++++--
>>>   2 files changed, 15 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 56c7ea73090b..7b9c6590e1f7 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -272,6 +272,9 @@ enum mthp_stat_item {
>>>        MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
>>>        MTHP_STAT_ANON_SWPOUT,
>>>        MTHP_STAT_ANON_SWPOUT_FALLBACK,
>>> +     MTHP_STAT_SPLIT_PAGE,
>>> +     MTHP_STAT_SPLIT_PAGE_FAILED,
>>> +     MTHP_STAT_DEFERRED_SPLIT_PAGE,
>>>        __MTHP_STAT_COUNT
>>>   };
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 055df5aac7c3..52db888e47a6 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
>>>   DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>>>   DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
>>>   DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
>>> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
>>> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
>>> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
>>>
>>>   static struct attribute *stats_attrs[] = {
>>>        &anon_fault_alloc_attr.attr,
>>> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
>>>        &anon_fault_fallback_charge_attr.attr,
>>>        &anon_swpout_attr.attr,
>>>        &anon_swpout_fallback_attr.attr,
>>> +     &split_page_attr.attr,
>>> +     &split_page_failed_attr.attr,
>>> +     &deferred_split_page_attr.attr,
>>>        NULL,
>>>   };
>>>
>>> @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>>        XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
>>>        struct anon_vma *anon_vma = NULL;
>>>        struct address_space *mapping = NULL;
>>> -     bool is_thp = folio_test_pmd_mappable(folio);
>>> +     int order = folio_order(folio);
>>>        int extra_pins, ret;
>>>        pgoff_t end;
>>>        bool is_hzp;
>>> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>>                i_mmap_unlock_read(mapping);
>>>   out:
>>>        xas_destroy(&xas);
>>> -     if (is_thp)
>>> +     if (order >= HPAGE_PMD_ORDER)
>>>                count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
>>> +     count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
>>> +                                   MTHP_STAT_SPLIT_PAGE_FAILED);
>>>        return ret;
>>>   }
>>>
>>> @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
>>>        if (list_empty(&folio->_deferred_list)) {
>>>                if (folio_test_pmd_mappable(folio))
>>>                        count_vm_event(THP_DEFERRED_SPLIT_PAGE);
>>> +             count_mthp_stat(folio_order(folio),
>>> +                             MTHP_STAT_DEFERRED_SPLIT_PAGE);
>>
>> There is a very long conversation with Barry about adding a 'global "mTHP became
>> partially mapped in 1 or more processes" counter (inc only)', which terminates at
>> [1]. There is a lot of discussion about the required semantics around the need
>> for partial map to cover alignment and contiguity as well as whether all pages
>> are mapped, and to trigger once it becomes partial in at least 1 process.
>>
>> MTHP_STAT_DEFERRED_SPLIT_PAGE is giving much simpler semantics, but less
>> information as a result. Barry, what's your view here? I'm guessing this doesn't
>> quite solve what you are looking for?
> 
> This doesn't quite solve what I am looking for but I still think the
> patch has its value.
> 
> I'm looking for a solution that can:
> 
>    *  Count the amount of memory in the system for each mTHP size.
>    *  Determine how much memory for each mTHP size is partially unmapped.
> 
> For example, in a system with 16GB of memory, we might find that we have 3GB
> of 64KB mTHP, and within that, 512MB is partially unmapped, potentially wasting
> memory at this moment.  I'm uncertain whether Lance is interested in
> this job :-)
> 
> Counting deferred_split remains valuable as it can signal whether the system is
> experiencing significant partial unmapping.
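
For reference, a minimal sketch of how the counters proposed in this patch could
be read from user space to watch for that signal. The helper names
(mthp_stat_path, read_mthp_counter) are hypothetical, the "hugepages-<size>kB"
directory naming is assumed from how the per-size directories appear on current
kernels, and the file name "deferred_split_page" is the one proposed here, which
may differ in whatever finally gets merged.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build the path of one per-size mTHP stats file, e.g.
 * .../hugepages-64kB/stats/deferred_split_page.
 * Hypothetical helper: returns 0 on success, -1 if buf is too small.
 */
static int mthp_stat_path(char *buf, size_t len, unsigned int size_kb,
			  const char *name)
{
	int n = snprintf(buf, len,
			 "/sys/kernel/mm/transparent_hugepage/hugepages-%ukB/stats/%s",
			 size_kb, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read a single unsigned counter value from a stats file.
 * Hypothetical helper: returns 0 on success, -1 if the file cannot be
 * opened or does not start with a number.
 */
static int read_mthp_counter(const char *path, unsigned long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Sampling the value twice and taking the difference gives the number of
deferred-split events in that interval, which is the kind of rate signal
described above. It does not say how much memory is partially unmapped right now.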

I'll note that in the future, especially without subpage mapcounts, we 
won't have that information (how much of a folio is currently mapped) 
readily available in all cases. To obtain that information on demand, 
we'd have to scan page tables or walk the rmap.

Something to keep in mind: we don't want to introduce counters that will 
be expensive to maintain in the long term.
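
To make the cost difference concrete, here is a minimal user-space sketch, not
kernel code. All model_* names are hypothetical, and MODEL_MAX_ORDER = 9 assumes
4K base pages with a 2M PMD. The first function models an increment-only event
counter in the style of the count_mthp_stat() calls in the patch (the kernel
version is, as far as I recall, per-CPU, which this model ignores). The second
models answering "how much is mapped right now?" when that state is not tracked,
using a plain per-page flag array as a stand-in for scanning page tables or
walking the rmap.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: PMD order 9 (4K base pages, 2M PMD-sized THP). */
#define MODEL_MAX_ORDER 9

/* Mirrors the three items added by the patch; names are illustrative. */
enum model_stat_item {
	MODEL_SPLIT_PAGE,
	MODEL_SPLIT_PAGE_FAILED,
	MODEL_DEFERRED_SPLIT_PAGE,
	MODEL_STAT_COUNT
};

static unsigned long model_stats[MODEL_MAX_ORDER + 1][MODEL_STAT_COUNT];

/*
 * Event counter: one array increment per event, O(1), nothing to keep
 * in sync afterwards. Orders outside (0, MODEL_MAX_ORDER] are ignored.
 */
static void model_count_stat(int order, enum model_stat_item item)
{
	assert(item < MODEL_STAT_COUNT);
	if (order <= 0 || order > MODEL_MAX_ORDER)
		return;
	model_stats[order][item]++;
}

/*
 * State query: count how many pages of one folio are currently mapped.
 * Without maintained per-page state this is a scan over all 2^order
 * pages (here: over a flag array standing in for the real lookup).
 */
static size_t model_count_mapped(const unsigned char *mapped, int order)
{
	size_t nr = (size_t)1 << order;
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		n += (mapped[i] != 0);
	return n;
}
```

The counter costs the same regardless of folio size or how many folios exist.
The scan grows with both, and would have to be repeated for every folio to
produce a system-wide "partially mapped" figure.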

-- 
Cheers,

David / dhildenb

