lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+CK2bCwi0yU_jX8qKCBMUnTeqoDYc65z7GKd5uEKcpkPAn4MA@mail.gmail.com>
Date: Tue, 19 Mar 2024 10:25:30 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: akpm@...ux-foundation.org
Cc: Sourav Panda <souravpanda@...gle.com>, corbet@....net, gregkh@...uxfoundation.org, 
	rafael@...nel.org, mike.kravetz@...cle.com, muchun.song@...ux.dev, 
	rppt@...nel.org, david@...hat.com, rdunlap@...radead.org, 
	chenlinxuan@...ontech.com, yang.yang29@....com.cn, tomas.mudrunka@...il.com, 
	bhelgaas@...gle.com, ivan@...udflare.com, yosryahmed@...gle.com, 
	hannes@...xchg.org, shakeelb@...gle.com, kirill.shutemov@...ux.intel.com, 
	wangkefeng.wang@...wei.com, adobriyan@...il.com, vbabka@...e.cz, 
	Liam.Howlett@...cle.com, surenb@...gle.com, linux-kernel@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-doc@...r.kernel.org, linux-mm@...ck.org, 
	willy@...radead.org, weixugc@...gle.com
Subject: Re: [PATCH v9 1/1] mm: report per-page metadata information

On Wed, Mar 13, 2024 at 6:40 PM Pasha Tatashin
<pasha.tatashin@...een.com> wrote:
>
> On Tue, Feb 20, 2024 at 4:46 PM Sourav Panda <souravpanda@...gle.com> wrote:
> >
> > Adds two new per-node fields, namely nr_memmap and nr_memmap_boot,
> > to /sys/devices/system/node/nodeN/vmstat and a global Memmap field
> > to /proc/meminfo. This information can be used by users to see how
> > much memory is being used by per-page metadata, which can vary
> > depending on build configuration, machine architecture, and system
> > use.
> >
> > Per-page metadata is the amount of memory that Linux needs in order to
> > manage memory at the page granularity. The majority of such memory is
> > used by "struct page" and "page_ext" data structures. In contrast to
> > most other memory consumption statistics, per-page metadata might not
> > be included in MemTotal. For example, MemTotal does not include memblock
> > allocations but includes buddy allocations. In this patch, exported
> > field nr_memmap in /sys/devices/system/node/nodeN/vmstat would
> > exclusively track buddy allocations while nr_memmap_boot would
> > exclusively track memblock allocations. Furthermore, Memmap in
> > /proc/meminfo would exclusively track buddy allocations allowing it to
> > be compared against MemTotal.
> >
> > This memory depends on build configurations, machine architectures, and
> > the way system is used:
> >
> > Build configuration may include extra fields into "struct page",
> > and enable / disable "page_ext"
> > Machine architecture defines base page sizes. For example 4K x86,
> > 8K SPARC, 64K ARM64 (optionally), etc. The per-page metadata
> > overhead is smaller on machines with larger page sizes.
> > System use can change per-page overhead by using vmemmap
> > optimizations with hugetlb pages, and emulated pmem devdax pages.
> > Also, boot parameters can determine whether page_ext is needed
> > to be allocated. This memory can be part of MemTotal or be outside
> > MemTotal depending on whether the memory was hot-plugged, booted with,
> > or hugetlb memory was returned back to the system.
> >
> > Utility for userspace:
> >
> > Application Optimization: Depending on the kernel version and command
> > line options, the kernel would relinquish a different number of pages
> > (that contain struct pages) when a hugetlb page is reserved (e.g., 0, 6
> > or 7 for a 2MB hugepage). The userspace application would want to know
> > the exact savings achieved through page metadata deallocation without
> > dealing with the intricacies of the kernel.
> >
> > Observability: Struct page overhead can only be calculated on-paper at
> > boot time (e.g., 1.5% machine capacity). Beyond boot once hugepages are
> > reserved or memory is hotplugged, the computation becomes complex.
> > Per-page metrics will help explain part of the system memory overhead,
> > which shall help guide memory optimizations and memory cgroup sizing.
> >
> > Debugging: Tracking the changes or absolute value in struct pages can
> > help detect anomalies as they can be correlated with other metrics in
> > the machine (e.g., memtotal, number of huge pages, etc).
> >
> > page_ext overheads: Some kernel features such as page_owner
> > page_table_check that use page_ext can be optionally enabled via kernel
> > parameters. Having the total per-page metadata information helps users
> > precisely measure impact.

Hi Andrew,

Can you please give this patch another look, does it require more
reviews before you can take it in?

Thank you,
Pasha

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ