Message-Id: <20250612053605.5911-1-00107082@163.com>
Date: Thu, 12 Jun 2025 13:36:05 +0800
From: David Wang <00107082@....com>
To: cachen@...estorage.com
Cc: akpm@...ux-foundation.org,
cl@...two.org,
corbet@....net,
dennis@...nel.org,
hannes@...xchg.org,
harry.yoo@...cle.com,
jackmanb@...gle.com,
kent.overstreet@...ux.dev,
linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
mhocko@...e.com,
rientjes@...gle.com,
roman.gushchin@...ux.dev,
surenb@...gle.com,
tj@...nel.org,
vbabka@...e.cz,
yzhong@...estorage.com,
ziy@...dia.com
Subject: Re: [PATCH] alloc_tag: add per-NUMA node stats
Hi,
On Tue, 10 Jun 2025 17:30:53 -0600 Casey Chen <cachen@...estorage.com> wrote:
> Add support for tracking per-NUMA node statistics in /proc/allocinfo.
> Previously, each alloc_tag had a single set of counters (bytes and
> calls), aggregated across all CPUs. With this change, each CPU can
> maintain separate counters for each NUMA node, allowing finer-grained
> memory allocation profiling.
>
> This feature is controlled by the new
> CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS option:
>
> * When enabled (=y), the output includes per-node statistics following
> the total bytes/calls:
>
> <size> <calls> <tag info>
> ...
> 315456 9858 mm/dmapool.c:338 func:pool_alloc_page
> nid0 94912 2966
> nid1 220544 6892
> 7680 60 mm/dmapool.c:254 func:dma_pool_create
> nid0 4224 33
> nid1 3456 27
>
> * When disabled (=n), the output remains unchanged:
> <size> <calls> <tag info>
> ...
> 315456 9858 mm/dmapool.c:338 func:pool_alloc_page
> 7680 60 mm/dmapool.c:254 func:dma_pool_create
>
> To minimize memory overhead, per-NUMA stats counters are dynamically
> allocated using the percpu allocator. PERCPU_DYNAMIC_RESERVE has been
> increased to ensure sufficient space for in-kernel alloc_tag counters.
>
> For in-kernel alloc_tag instances, pcpu_alloc_noprof() is used to
> allocate counters. These allocations are excluded from the profiling
> statistics themselves.
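Thanks for the patch. If I read the description correctly, the per-node
counters would look roughly like the sketch below; the struct layout and
helper names here are my own guesses for illustration, not code taken from
the patch:

/*
 * Rough sketch only: my guess at the shape of the per-node counters
 * and how they might be allocated/updated, not the actual patch code.
 */
#include <linux/percpu.h>
#include <linux/nodemask.h>
#include <linux/types.h>

struct alloc_tag_counters {
	u64 bytes;
	u64 calls;
};

/* one counter pair per NUMA node, duplicated for every CPU */
static struct alloc_tag_counters __percpu *alloc_node_counters(void)
{
	return __alloc_percpu(nr_node_ids * sizeof(struct alloc_tag_counters),
			      __alignof__(struct alloc_tag_counters));
}

/* account one allocation of @bytes served from node @nid */
static void count_alloc(struct alloc_tag_counters __percpu *ctrs,
			int nid, size_t bytes)
{
	struct alloc_tag_counters *c;

	c = get_cpu_ptr(ctrs);		/* pin to this CPU's copy */
	c[nid].bytes += bytes;
	c[nid].calls++;
	put_cpu_ptr(ctrs);
}
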
Considering NUMA balance, I have two questions:

1. Do we need calling-site granularity?
We need that granularity to identify a possible memory leak, or a place
where memory usage can be optimized.
But for a NUMA imbalance, the calling site is mostly *innocent*; the clue
normally lies in which CPU made the allocation, which memory
interface/slab cache was used, etc.
The point is: when a NUMA imbalance happens, can it be fixed by adjusting
the calling sites?
Isn't <cpu, memory interface/slab name, numa id> enough as the key for
NUMA stats analysis? (A rough sketch of what I mean is below, after
question 2.)
Do you have an example of why the calling-site information is needed and
how it helps?

2. There are other factors affecting NUMA balance, e.g. cgroup usage.
Could we collect that information at the calling sites as well?
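
To make question 1 concrete, the kind of key I have in mind would be
something like the following; this is purely hypothetical, nothing like it
exists in the kernel today:

/* Purely hypothetical sketch of the alternative key, for illustration. */
#include <linux/types.h>

struct numa_alloc_key {
	unsigned int cpu;	/* CPU that issued the allocation */
	const char *cache;	/* slab cache / allocation interface name */
	int nid;		/* NUMA node the memory came from */
};

struct numa_alloc_stat {
	struct numa_alloc_key key;
	u64 bytes;
	u64 calls;
};

Stats keyed like this would point at where the imbalance happens (which
CPU, which cache, which node) rather than at a particular calling site.
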
Thanks
David