[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b71b28b9-1d41-4085-99f8-04d85892967e@redhat.com>
Date: Thu, 2 Nov 2023 17:58:25 +0100
From: David Hildenbrand <david@...hat.com>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: Wei Xu <weixugc@...gle.com>, Sourav Panda <souravpanda@...gle.com>,
corbet@....net, gregkh@...uxfoundation.org, rafael@...nel.org,
akpm@...ux-foundation.org, mike.kravetz@...cle.com,
muchun.song@...ux.dev, rppt@...nel.org, rdunlap@...radead.org,
chenlinxuan@...ontech.com, yang.yang29@....com.cn,
tomas.mudrunka@...il.com, bhelgaas@...gle.com, ivan@...udflare.com,
yosryahmed@...gle.com, hannes@...xchg.org, shakeelb@...gle.com,
kirill.shutemov@...ux.intel.com, wangkefeng.wang@...wei.com,
adobriyan@...il.com, vbabka@...e.cz, Liam.Howlett@...cle.com,
surenb@...gle.com, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-doc@...r.kernel.org,
linux-mm@...ck.org, willy@...radead.org,
Greg Thelen <gthelen@...gle.com>
Subject: Re: [PATCH v5 1/1] mm: report per-page metadata information
On 02.11.23 17:43, Pasha Tatashin wrote:
> On Thu, Nov 2, 2023 at 12:09 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> On 02.11.23 17:02, Pasha Tatashin wrote:
>>> On Thu, Nov 2, 2023 at 11:53 AM David Hildenbrand <david@...hat.com> wrote:
>>>>
>>>> On 02.11.23 16:50, Pasha Tatashin wrote:
>>>>>>> Adding reserved memory to MemTotal is a cleaner approach IMO as well.
>>>>>>> But it changes the semantics of MemTotal, which may have compatibility
>>>>>>> issues.
>>>>>>
>>>>>> I object.
>>>>>
>>>>> Could you please elaborate what you object (and why): you object that
>>>>> it will have compatibility issues, or you object to include memblock
>>>>> reserves into MemTotal?
>>>>
>>>> Sorry, I object to changing the semantics of MemTotal. MemTotal is
>>>> traditionally the memory managed by the buddy, not all memory in the
>>>> system. I know people/scripts that are relying on that [although it's
>>>> been source of confusion a couple of times].
>>>
>>> What if one day we change so that struct pages are allocated from
>>> buddy allocator (i.e. allocate deferred struct pages from buddy) will
>>
>> It does on memory hotplug. But for things like crashkernel size
>> detection doesn't really care about that.
>
> "Crash kernel" is a different case: it is kernel external memory,
> similar to limiting the amount of physical memory via mem=/memmap=, it
> sets memory that cannot be used by this kernel, but only by the crash
> kernel. Also, the crash kernel reserve is exposed in /proc/iomem via
> "Crash kernel" range.
Agreed.
>
> Page metadata memory on the other hand, is used by this kernel, and
> also can be changed by this kernel depending on how the memory is
> used: memdec, hotplug, THP, emulated pmem etc.
And then, there is the "altmap" for dax, where the metadata is placed on
the dax memory itself. I mean, it's system RAM (or NVDIMM or whatever)
used for metadata, but not managed by the buddy.
There is now also the "memmap_on_memory" feature for memory hotplug,
where we do the same for ordinary hotplug memory (but some memory aside
for the memmap and not allocate it from the buddy). We'd have to account
that one as well as metadata, I think. I don't think it would get
accounted under MemTotal (because, not managed by the buddy) as of now.
>
>>> it break those MemTotal scripts? What if the size of struct pages
>>> changes significantly, but the overhead will come from other metadata
>>> (i.e. memdesc) will that break those scripts? I feel like struct page
>>
>> Probably; but ideally the metadata overhead will be smaller with
>> memdesc. And we'll talk about that once it gets real ;)
>
> The size and allocation of struct pages change MemTotal today, during
> runtime, even without memdesc, I just brought it up, to emphasize that
> this is something that we should resolve now before it gets worse.
I don't quite see the immediate need for action, but I get what you are
saying. It's a historical mess, but if we want to tackle it, we should
tackle it completely and not only sort out the metadata accounting.
>
>>> memory should really be included into MemTotal, otherwise we will have
>>> this struggle in the future when we try to optimize struct page
>>> memory.
>> How far do we want to go, do we want to include crashkernel reserved
>> memory in MemTotal because it is system memory? Only metadata? what else
>> allocated using memblock?
>>
>> Again, right now it's simple: MemTotal is memory managed by the buddy.
>>
>> The spirit of this patch set is good, modifying existing counters needs
>> good justification.
>
> Wei, noticed that all other fields in /proc/meminfo are part of
> MemTotal, but this new field may be not (depending where struct pages
I could have sworn that I pointed that out in a previous version and
requested to document that special case in the patch description. :)
> are allocated), so what would be the best way to export page metadata
> without redefining MemTotal? Keep the new field in /proc/meminfo but
> be ok that it is not part of MemTotal or do two counters? If we do two
> counters, we will still need to keep one that is a buddy allocator in
> /proc/meminfo and the other one somewhere outside?
IMHO, we should just leave MemTotal alone ("memory managed by the buddy
that could actually mostly get freed up and reused -- although that's
not completely true") and have a new counter that includes any system
memory (MemSystem? but as we learned, as separate files), including most
memblock allocations/reservations as well (metadata, early pagetables,
initrd, kernel, ...).
The you would actually know how much memory the system is using
(exclusing things like crashmem, mem=, ...).
That part is tricky, though -- I recall there are memblock reservations
that are similar to the crashkernel -- which is why the current state is
to account memory when it's handed to the buddy under MemTotal -- which
is straight forward and simply.
I'm happy to discuss this further, if that direction is worth exploring.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists