Message-ID: <20260115205407.3050262-1-atomlin@atomlin.com>
Date: Thu, 15 Jan 2026 15:54:06 -0500
From: Aaron Tomlin <atomlin@...mlin.com>
To: oleg@...hat.com,
akpm@...ux-foundation.org,
gregkh@...uxfoundation.org,
david@...nel.org,
brauner@...nel.org,
mingo@...nel.org
Cc: neelx@...e.com,
sean@...e.io,
linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: [v3 PATCH 0/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
Hi Oleg, David, Greg, Andrew,
This patch introduces a mechanism to expose the mm_cpumask of a process via
the /proc/[pid]/status interface.
In high-performance and large-scale NUMA environments, diagnosing latency
spikes attributed to Inter-Processor Interrupts (IPIs) can be particularly
challenging. While cpus_allowed describes where a thread may execute, it
does not describe the "memory footprint" - specifically, the set of CPUs
that may hold stale Translation Lookaside Buffer (TLB) entries for the
process.
It is this footprint (mm_cpumask) that dictates the target destination for
TLB flush IPIs. Discrepancies between a process's scheduling affinity and
its memory footprint are a common source of system noise and performance
degradation. By exposing this mask, we provide userspace with the
visibility required to debug these "invisible" sources of latency.
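As a rough illustration of how userspace might consume this, the sketch below pulls Cpus_allowed_list and Cpus_active_mm_list out of a status file and prints them side by side. The helper names status_field() and report_footprint() are hypothetical and not part of the patch. The new field is assumed to be absent on kernels without this patch or on architectures that do not opt in, so the code treats a missing key as a normal outcome.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the value of "<key>:" from a /proc/[pid]/status-style stream into out.
 * Returns 0 on success, -1 if the key is absent or outlen is zero.
 * (Hypothetical helper for illustration; not part of the patch.)
 */
static int status_field(FILE *fp, const char *key, char *out, size_t outlen)
{
    char line[4096];
    size_t klen = strlen(key);

    if (outlen == 0)
        return -1;

    rewind(fp);
    while (fgets(line, sizeof(line), fp)) {
        /* Require an exact key match followed by ':' so that
         * "Cpus_allowed" does not match "Cpus_allowed_list". */
        if (strncmp(line, key, klen) != 0 || line[klen] != ':')
            continue;

        const char *v = line + klen + 1;
        v += strspn(v, " \t");          /* skip separator whitespace */
        size_t n = strcspn(v, "\n");    /* value runs to end of line */
        if (n >= outlen)
            n = outlen - 1;
        memcpy(out, v, n);
        out[n] = '\0';
        return 0;
    }
    return -1;
}

/* Print scheduling affinity next to the mm_cpumask view, if available. */
static int report_footprint(const char *path)
{
    char allowed[256], active[256];
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    if (status_field(fp, "Cpus_allowed_list", allowed, sizeof(allowed)) == 0)
        printf("may run on:          %s\n", allowed);
    if (status_field(fp, "Cpus_active_mm_list", active, sizeof(active)) == 0)
        printf("may receive flushes: %s\n", active);
    else
        printf("Cpus_active_mm_list not present on this kernel\n");
    fclose(fp);
    return 0;
}
```

Calling report_footprint("/proc/self/status") from a small main() is enough to try it. A CPU that shows up in the second list but not the first is the kind of discrepancy described above.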
The new fields, Cpus_active_mm and Cpus_active_mm_list, are exposed only
on architectures that explicitly opt in via
CONFIG_ARCH_WANT_PROC_CPUS_ACTIVE_MM. This is necessary because
mm_cpumask semantics vary significantly across architectures; some
(e.g., x86) actively maintain the mask for coherency, while others may
never clear bits, rendering the data misleading for this specific use
case. x86 is updated to select this feature by default.
For example, on an architecture other than x86, the helper compiles to an
empty stub:
# make fs/proc/array.i
# grep task_cpus_active_mm -B 1 -A 3 --max-count 1 fs/proc/array.i
# 430 "fs/proc/array.c"
static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) __attribute__((no_instrument_function)) void task_cpus_active_mm(struct seq_file *m, struct mm_struct *mm)
{
}
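The empty body above is what a compile-time opt-in typically produces. The following userspace sketch mimics that pattern so it can be built outside the kernel tree. It is not the patch's code: struct mm_sketch, the FILE-based signature and the plain "%lx" output format are simplifying assumptions, and only the function name and the config symbol come from this series.

```c
#include <stdio.h>

/* Stand-in for the kernel's struct mm_struct (hypothetical, simplified). */
struct mm_sketch {
    unsigned long cpumask;
};

#ifdef CONFIG_ARCH_WANT_PROC_CPUS_ACTIVE_MM
/* Opted-in architectures: emit the field. Output format is an assumption. */
static inline void task_cpus_active_mm(FILE *m, const struct mm_sketch *mm)
{
    fprintf(m, "Cpus_active_mm:\t%lx\n", mm->cpumask);
}
#else
/* Everyone else: an empty stub, so callers need no #ifdef of their own. */
static inline void task_cpus_active_mm(FILE *m, const struct mm_sketch *mm)
{
    (void)m;
    (void)mm;
}
#endif
```

Building with -DCONFIG_ARCH_WANT_PROC_CPUS_ACTIVE_MM selects the first variant; without it the call site remains valid but writes nothing.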
The implementation reads the mask directly without introducing additional
locks or snapshots. While this implies that the hex mask and list format
could theoretically observe slightly different states on a rapidly
changing system, this "best-effort" approach aligns with the standard
design philosophy of /proc and avoids imposing locking overhead on
critical memory management paths.
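To make the trade-off concrete, here is a small userspace sketch of the two output formats produced from two separate reads of one live mask. It assumes a single 64-bit word and a plain hex format, which is simpler than the kernel's cpumask printing. The names mask_to_list() and emit_status() are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Format the set bits of mask as a CPU list such as "0-3,8". */
static void mask_to_list(uint64_t mask, char *buf, size_t len)
{
    size_t off = 0;
    int cpu = 0;

    if (len == 0)
        return;
    buf[0] = '\0';

    while (cpu < 64) {
        if (!((mask >> cpu) & 1)) {
            cpu++;
            continue;
        }

        /* Extend the run while the next bit is also set. */
        int start = cpu;
        while (cpu + 1 < 64 && ((mask >> (cpu + 1)) & 1))
            cpu++;

        int n;
        if (start == cpu)
            n = snprintf(buf + off, len - off, "%s%d",
                         off ? "," : "", start);
        else
            n = snprintf(buf + off, len - off, "%s%d-%d",
                         off ? "," : "", start, cpu);
        if (n < 0 || (size_t)n >= len - off)
            return;     /* truncated; buf is still NUL-terminated */
        off += (size_t)n;
        cpu++;
    }
}

/*
 * Emit both formats without taking a snapshot: the mask is read once per
 * format, so a concurrent update between the two reads can make the hex
 * and list strings disagree. That is the "best-effort" behaviour.
 */
static void emit_status(const volatile uint64_t *live,
                        char *hex, size_t hexlen,
                        char *list, size_t listlen)
{
    snprintf(hex, hexlen, "%llx", (unsigned long long)*live); /* read 1 */
    mask_to_list(*live, list, listlen);                       /* read 2 */
}
```

Copying *live into a local variable first would make the two strings consistent with each other, at the cost of no longer reflecting whichever update arrived in between.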
Changes since v2 [1]:
- Introduce a new configuration option, ARCH_WANT_PROC_CPUS_ACTIVE_MM. The x86
architecture now explicitly selects this feature, ensuring that the
field is only exposed where the mm_cpumask semantics are meaningful for
TLB coherency (David Hildenbrand)
Changes since v1 [2]:
- Document new Cpus_active_mm and Cpus_active_mm_list entries in
/proc/[pid]/status (Oleg Nesterov)
[1]: https://lore.kernel.org/lkml/20251226211407.2252573-1-atomlin@atomlin.com/
[2]: https://lore.kernel.org/lkml/20251217024603.1846651-1-atomlin@atomlin.com/
Aaron Tomlin (1):
fs/proc: Expose mm_cpumask in /proc/[pid]/status
Documentation/filesystems/proc.rst | 7 +++++++
arch/x86/Kconfig | 1 +
fs/proc/Kconfig | 14 ++++++++++++++
fs/proc/array.c | 28 +++++++++++++++++++++++++++-
4 files changed, 49 insertions(+), 1 deletion(-)
--
2.51.0