Message-ID: <CA+CK2bAM4Xe7BT3TFZT-+3qQTFGgkYBiYY=oVkdqMN8gyJg_0g@mail.gmail.com>
Date: Wed, 13 Mar 2024 18:40:03 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Sourav Panda <souravpanda@...gle.com>
Cc: corbet@....net, gregkh@...uxfoundation.org, rafael@...nel.org,
akpm@...ux-foundation.org, mike.kravetz@...cle.com, muchun.song@...ux.dev,
rppt@...nel.org, david@...hat.com, rdunlap@...radead.org,
chenlinxuan@...ontech.com, yang.yang29@....com.cn, tomas.mudrunka@...il.com,
bhelgaas@...gle.com, ivan@...udflare.com, yosryahmed@...gle.com,
hannes@...xchg.org, shakeelb@...gle.com, kirill.shutemov@...ux.intel.com,
wangkefeng.wang@...wei.com, adobriyan@...il.com, vbabka@...e.cz,
Liam.Howlett@...cle.com, surenb@...gle.com, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-doc@...r.kernel.org, linux-mm@...ck.org,
willy@...radead.org, weixugc@...gle.com
Subject: Re: [PATCH v9 1/1] mm: report per-page metadata information
On Tue, Feb 20, 2024 at 4:46 PM Sourav Panda <souravpanda@...gle.com> wrote:
>
> Adds two new per-node fields, nr_memmap and nr_memmap_boot, to
> /sys/devices/system/node/nodeN/vmstat, and a global Memmap field to
> /proc/meminfo. Users can use this information to see how much memory
> is consumed by per-page metadata, which varies with build
> configuration, machine architecture, and system use.
>
> Per-page metadata is the amount of memory that Linux needs in order to
> manage memory at page granularity. Most of this memory is used by the
> "struct page" and "page_ext" data structures. Unlike most other memory
> consumption statistics, per-page metadata might not be included in
> MemTotal: MemTotal does not include memblock allocations, but it does
> include buddy allocations. With this patch, the exported field
> nr_memmap in /sys/devices/system/node/nodeN/vmstat tracks only buddy
> allocations, while nr_memmap_boot tracks only memblock allocations.
> Likewise, Memmap in /proc/meminfo tracks only buddy allocations, so it
> can be compared directly against MemTotal.
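A minimal Python sketch of how a monitoring tool might read the new counters. It assumes that nr_memmap and nr_memmap_boot are reported in pages, like other node vmstat counters, and that Memmap in /proc/meminfo is reported in kB like its neighbouring fields; the helper names are hypothetical and not part of the patch.

```python
import os


def parse_node_vmstat(text: str) -> dict[str, int]:
    """Parse 'name value' lines from a nodeN/vmstat file into a dict."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            counters[parts[0]] = int(parts[1])
        except ValueError:
            continue  # skip anything that is not a plain integer counter
    return counters


def parse_meminfo_kb(text: str, field: str = "Memmap") -> int:
    """Return the numeric value of a /proc/meminfo field (assumed to be kB)."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


def memmap_bytes(counters: dict[str, int], page_size: int) -> dict[str, int]:
    """Convert the counters (assumed to be page counts) into bytes."""
    return {
        "buddy": counters.get("nr_memmap", 0) * page_size,
        "memblock": counters.get("nr_memmap_boot", 0) * page_size,
    }


def read_node_memmap(node: int) -> dict[str, int]:
    """Read one NUMA node's counters on a kernel that carries this patch."""
    path = f"/sys/devices/system/node/node{node}/vmstat"
    with open(path) as f:
        counters = parse_node_vmstat(f.read())
    return memmap_bytes(counters, os.sysconf("SC_PAGE_SIZE"))
```

Under the unit assumptions above, summing the "buddy" value across all nodes should correspond to Memmap in /proc/meminfo, while the "memblock" value is the part that lives outside MemTotal.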
>
> This memory depends on build configuration, machine architecture, and
> the way the system is used:
>
> - Build configuration may add extra fields to "struct page", and may
>   enable or disable "page_ext".
> - Machine architecture defines the base page size, for example 4K on
>   x86, 8K on SPARC, and (optionally) 64K on ARM64. The per-page
>   metadata overhead is smaller on machines with larger page sizes.
> - System use can change the per-page overhead through vmemmap
>   optimizations for hugetlb pages and emulated pmem devdax pages.
>   Boot parameters can also determine whether page_ext needs to be
>   allocated.
>
> This memory can be part of MemTotal or outside it, depending on
> whether the memory was present at boot, was hot-plugged, or was
> hugetlb memory that was returned to the system.
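To see why larger base pages reduce the overhead, the struct page cost can be expressed as sizeof(struct page) divided by the base page size. The sketch below assumes a 64-byte struct page, a common x86-64 value; the real size depends on the build configuration, and page_ext is ignored entirely.

```python
# Assumption: sizeof(struct page) == 64 bytes (common on x86-64).
# The real size depends on the build configuration; page_ext is ignored.
STRUCT_PAGE_BYTES = 64


def memmap_overhead_pct(page_size: int,
                        struct_page_bytes: int = STRUCT_PAGE_BYTES) -> float:
    """Percentage of memory spent on struct page for a given base page size."""
    return 100.0 * struct_page_bytes / page_size


for arch, page_size in [("x86 (4K)", 4096), ("SPARC (8K)", 8192),
                        ("ARM64 (64K)", 65536)]:
    print(f"{arch}: {memmap_overhead_pct(page_size):.5f}%")
```

With these assumptions, 4K pages cost 1.5625% of memory, 8K pages half of that, and 64K pages one sixteenth of it.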
>
> Utility for userspace:
>
> Application Optimization: Depending on the kernel version and command
> line options, the kernel frees a different number of pages (pages that
> contain struct pages) when a hugetlb page is reserved (e.g., 0, 6, or 7
> for a 2MB hugepage). Userspace applications want to know the exact
> savings achieved through this page metadata deallocation without
> having to deal with the intricacies of the kernel.
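The 0, 6, or 7 figures follow from simple arithmetic. This sketch assumes 4K base pages and a 64-byte struct page: a 2MB hugepage spans 512 base pages, whose struct pages occupy 8 pages of vmemmap, so freeing 6 or 7 of them means the kernel keeps the remaining 2 or 1. The helper names are hypothetical.

```python
BASE_PAGE = 4096        # assumption: 4K base pages (x86-64)
STRUCT_PAGE_BYTES = 64  # assumption: sizeof(struct page) on this build


def vmemmap_pages_per_hugepage(huge_size: int) -> int:
    """Number of base pages that hold the struct pages of one hugepage."""
    nr_struct_pages = huge_size // BASE_PAGE
    return nr_struct_pages * STRUCT_PAGE_BYTES // BASE_PAGE


def saved_bytes(nr_hugepages: int, freed_per_hugepage: int) -> int:
    """Metadata memory returned to the system for nr_hugepages hugepages."""
    return nr_hugepages * freed_per_hugepage * BASE_PAGE


total = vmemmap_pages_per_hugepage(2 * 1024 * 1024)
for freed in (0, 6, 7):
    print(f"freed {freed} of {total} vmemmap pages -> "
          f"{saved_bytes(1, freed)} bytes per 2MB hugepage")
```

The exported counters let an application observe the actual result instead of guessing which of these cases applies to the running kernel.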
>
> Observability: Struct page overhead can only be calculated on paper at
> boot time (e.g., about 1.5% of machine capacity). After boot, once
> hugepages are reserved or memory is hot-plugged, the computation
> becomes complex. Per-page metadata metrics help explain part of the
> system memory overhead, which in turn helps guide memory optimizations
> and memory cgroup sizing.
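Even a simplified model shows how the on-paper estimate drifts after boot. This hypothetical sketch uses the same 4K page and 64-byte struct page assumptions, and illustrative inputs (256 GiB of memory, 1000 reserved 2MB hugepages each freeing 7 vmemmap pages) that are not taken from the patch.

```python
GiB = 1024 ** 3
BASE_PAGE = 4096        # assumption: 4K base pages
STRUCT_PAGE_BYTES = 64  # assumption: sizeof(struct page)


def boot_memmap_bytes(mem_bytes: int) -> int:
    """The on-paper boot-time estimate: one struct page per base page."""
    return mem_bytes // BASE_PAGE * STRUCT_PAGE_BYTES


def memmap_after_hugetlb(mem_bytes: int, nr_hugepages_2m: int,
                         freed_per_hugepage: int) -> int:
    """Boot estimate minus vmemmap pages freed for reserved 2MB hugepages.

    Ignores page_ext, memory hotplug, and the memblock/buddy split.
    """
    freed = nr_hugepages_2m * freed_per_hugepage * BASE_PAGE
    return boot_memmap_bytes(mem_bytes) - freed


mem = 256 * GiB
print(boot_memmap_bytes(mem) / GiB)       # 4.0 GiB, i.e. 1.5625% of 256 GiB
print(memmap_after_hugetlb(mem, 1000, 7))
```

A real system adds page_ext, hot-plugged ranges, and per-node differences on top of this, which is why reading exported counters is more reliable than maintaining such a model.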
>
> Debugging: Tracking the absolute value of, or changes in, struct page
> memory can help detect anomalies, because it can be correlated with
> other machine metrics (e.g., MemTotal, the number of huge pages).
>
> page_ext overheads: Some kernel features that use page_ext, such as
> page_owner and page_table_check, can be optionally enabled via kernel
> parameters. Having the total per-page metadata information helps users
> precisely measure their impact.
>
> Suggested-by: Pasha Tatashin <pasha.tatashin@...een.com>
> Signed-off-by: Sourav Panda <souravpanda@...gle.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@...een.com>