Message-ID: <a7d537df-7899-42fc-b9ef-66733105abbe@redhat.com>
Date: Thu, 22 Aug 2024 12:01:52 +0200
From: David Hildenbrand <david@...hat.com>
To: Barry Song <21cnbao@...il.com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
baolin.wang@...ux.alibaba.com, chrisl@...nel.org, hanchuanhua@...o.com,
ioworker0@...il.com, kaleshsingh@...gle.com, kasong@...cent.com,
linux-kernel@...r.kernel.org, ryan.roberts@....com, v-songbaohua@...o.com,
ziy@...dia.com, yuanshuai@...o.com
Subject: Re: [PATCH v2 1/2] mm: collect the number of anon large folios

On 22.08.24 11:21, Barry Song wrote:
> On Thu, Aug 22, 2024 at 8:59 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> On 22.08.24 10:44, Barry Song wrote:
>>> On Thu, Aug 22, 2024 at 12:52 PM Barry Song <21cnbao@...il.com> wrote:
>>>>
>>>> On Thu, Aug 22, 2024 at 5:34 AM David Hildenbrand <david@...hat.com> wrote:
>>>>>
>>>>> On 12.08.24 00:49, Barry Song wrote:
>>>>>> From: Barry Song <v-songbaohua@...o.com>
>>>>>>
>>>>>> Anon large folios come from three places:
>>>>>> 1. newly allocated large folios in page faults, which call
>>>>>> folio_add_new_anon_rmap() for rmap;
>>>>>> 2. a large folio being split into multiple lower-order large folios;
>>>>>> 3. a large folio being migrated to a new large folio.
>>>>>>
>>>>>> In all three cases above, we increase nr_anon by 1.
>>>>>>
>>>>>> Anon large folios can go away by being split or by being freed; in
>>>>>> these cases, we decrease the count by 1.
>>>>>>
>>>>>> Folios that have been added to the swap cache but have not yet received
>>>>>> an anon mapping won't be counted. This is consistent with the AnonPages
>>>>>> statistics in /proc/meminfo.
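>>>>>>
>>>>>> Roughly, the accounting looks like this (simplified sketch; the
>>>>>> helper and counter names here are placeholders for what the diff
>>>>>> below actually adds):
>>>>>>
>>>>>>   /* fault, split, migration: a new anon large folio appears */
>>>>>>   if (folio_test_large(folio))
>>>>>>           mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
>>>>>>
>>>>>>   /* free (or split) path: the anon large folio goes away */
>>>>>>   if (folio_test_anon(folio) && folio_test_large(folio))
>>>>>>           mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, -1);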
>>>>>
>>>>> Thinking out loud, I wonder if we want to have something like that for
>>>>> any anon folios (including small ones).
>>>>>
>>>>> Assume we longterm-pinned an anon folio and unmapped/zapped it. It would
>>>>> be quite interesting to see that these are actually anon pages still
>>>>> consuming memory. Same with memory leaks, when an anon folio doesn't get
>>>>> freed for some reason.
>>>>>
>>>>> The whole "AnonPages" counter thingy is just confusing, it only counts
>>>>> what's currently mapped ... so we'd want something different.
>>>>>
>>>>> But it's okay to start with large folios only, there we have a new
>>>>> interface without that legacy stuff :)
>>>>
>>>> We have two options to do this:
>>>> 1. add a new, separate nr_anon_unmapped interface which
>>>> counts unmapped anon memory only;
>>>> 2. let nr_anon count both mapped and unmapped anon
>>>> folios.
>>>>
>>>> I would assume 1 is clearer, as AnonPages has been there for
>>>> years; and if we counted all mapped and unmapped folios together,
>>>> we would still lack a way to find the anon memory leak problem
>>>> you mentioned.
>>>>
>>>> Right now we are comparing nr_anon (including mapped folios only)
>>>> with AnonPages to get the distribution of different folio sizes in
>>>> performance profiling.
>>>>
>>>> nr_anon_unmapped should normally be quite small; otherwise,
>>>> something must be wrong.
>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Barry Song <v-songbaohua@...o.com>
>>>>>> ---
>>>>>> Documentation/admin-guide/mm/transhuge.rst | 5 +++++
>>>>>> include/linux/huge_mm.h | 15 +++++++++++++--
>>>>>> mm/huge_memory.c | 13 ++++++++++---
>>>>>> mm/migrate.c | 4 ++++
>>>>>> mm/page_alloc.c | 5 ++++-
>>>>>> mm/rmap.c | 1 +
>>>>>> 6 files changed, 37 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>>>>>> index 058485daf186..9fdfb46e4560 100644
>>>>>> --- a/Documentation/admin-guide/mm/transhuge.rst
>>>>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>>>>>> @@ -527,6 +527,11 @@ split_deferred
>>>>>> it would free up some memory. Pages on split queue are going to
>>>>>> be split under memory pressure, if splitting is possible.
>>>>>>
>>>>>> +nr_anon
>>>>>> + the number of anon huge pages we have in the whole system.
>>>>>
>>>>> "transparent ..." otherwise people might confuse it with anon hugetlb
>>>>> "huge pages" ... :)
>>>>>
>>>>> I briefly tried coming up with a better name than "nr_anon" but failed.
>>>>>
>>>>>
>>>>
>>>> If we might have an unmapped-anon counter later, maybe rename this
>>>> one to nr_anon_mapped? And the new interface we add in the future
>>>> could be nr_anon_unmapped?
>>
>> We really shouldn't be using the mapped/unmapped terminology here ... we
>> allocated pages and turned them into anonymous folios. At some point we
>> free them. That's the lifecycle.
>>
>>>
>>> On second thought, this might be incorrect as well. Concepts like 'anon',
>>> 'shmem', and 'file' refer to states after mapping. If an 'anon' has been
>>> unmapped but is still pinned and not yet freed, it isn't technically an
>>> 'anon' anymore?
>>
>> It's just not mapped, and cannot get mapped, anymore. In the memdesc
>> world, we'd be freeing the "struct anon" or "struct folio" once the
>> last refcount goes to 0, not once it is unmapped (e.g., temporarily
>> during a failed migration?).
>>
>> The important part to me would be: this is memory that was allocated for
>> anonymous memory, and it's still around for some reason and not getting
>> freed. Usually, we would expect anon memory to get freed fairly quickly
>> once unmapped. Except when there are long-term pinnings or other types
>> of memory leaks.
>>
>> You could happily continue using these anon pages via vmsplice() or
>> similar, even though the original page table mapping was torn down.
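>>
>> Completely untested sketch of that scenario:
>>
>>   #define _GNU_SOURCE
>>   #include <fcntl.h>
>>   #include <string.h>
>>   #include <sys/mman.h>
>>   #include <sys/uio.h>
>>   #include <unistd.h>
>>
>>   int main(void)
>>   {
>>           char *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
>>                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>           struct iovec iov = { .iov_base = mem, .iov_len = 4096, };
>>           char buf[4096];
>>           int p[2];
>>
>>           memset(mem, 0xaa, 4096);
>>           pipe(p);
>>           /* the pipe now holds references on the anon page ... */
>>           vmsplice(p[1], &iov, 1, 0);
>>           /* ... so the page stays allocated although we unmap it */
>>           munmap(mem, 4096);
>>           read(p[0], buf, sizeof(buf));
>>           return buf[0] == (char)0xaa ? 0 : 1;
>>   }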
>>
>>>
>>> On the other hand, implementing nr_anon_unmapped could be extremely
>>> tricky. I have no idea how to implement it as we are losing those mapping
>>> flags.
>>
>> folio_mapcount() can tell you efficiently whether a folio is mapped or
>> not -- and that information will stay for all eternity as long as we
>> have any mapcounts :) . It cannot tell "how many" pages of a large folio
>> are mapped, but at least "is any page of this large folio mapped".
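>>
>> E.g., spotting such "lurking" anon folios would essentially be
>> (completely untested):
>>
>>   /* anon folio still allocated, but no part of it mapped anymore */
>>   static bool anon_folio_lurking(struct folio *folio)
>>   {
>>           return folio_test_anon(folio) && !folio_mapcount(folio);
>>   }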
>
> Exactly. AnonPages decreases by 1 when a folio is removed from the
> rmap, whereas nr_anon decreases by 1 only when an anon folio is
> freed. So I would assume nr_anon includes those pinned and unmapped
> anon folios but AnonPages doesn't.

Right, note how internally it is called "NR_ANON_MAPPED", but we ended
up calling it "AnonPages". That's rather a legacy interface we cannot
change (fix) that easily, at least not without a config option.

At some point it might indeed be interesting to have "nr_anon_mapped"
here, but that would correspond to "is any part of this large folio
mapped", which could be valuable for debugging in the future.
"nr_anon": anon allocations (until freed -> +1)
"nr_anon_mapped": anon allocations that are mapped (any part mapped -> +1)
"nr_anon_partially_mapped": anon allocations that was detected to be
partially mapped at some point -> +1
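
Just to illustrate the "nr_anon_mapped" semantics (sketch, invented
names, not a real patch):

  /* rmap add path: the folio had no mappings before this one */
  if (!folio_mapcount(folio))     /* checked before adding the mapping */
          mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_MAPPED, 1);

  /* rmap remove path: that was the last mapping of the folio */
  if (!folio_mapcount(folio))     /* checked after removing the mapping */
          mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_MAPPED, -1);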

If a folio is in the swapcache, I would still want to see that it is an
anon allocation lurking around in the system, like we do with pagecache
pages. *There* we do have the difference between "allocated" and
"mapped" already (e.g., /proc/meminfo's "Cached" vs. "Mapped").

So likely, calling it "nr_anon" here, and tracking it on an allocation
level, is good enough for now and future proof.
>
> If there's a significant amount of 'leaked' anon, we should consider
> having a separate counter for them. For instance, if nr_anon is
> 100,000 and pinned/unmapped pages account for 50%, then nr_anon
> alone doesn’t effectively reflect the system's state.
Right, but if you stare at the system you could tell that a significant
amount of memory is still getting consumed through existing/previous
anon mappings. Depends on how valuable that distinction really is.
--
Cheers,
David / dhildenb