Message-ID: <zkl42ttlzuyidy2ner5sjfbg5b62l5mcmlcmardd534y2p2u2q@vz2w4nbwvbhf>
Date: Thu, 15 Jan 2026 20:53:48 -0500
From: Aaron Tomlin <atomlin@...mlin.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: "David Hildenbrand (Red Hat)" <david@...nel.org>, oleg@...hat.com,
akpm@...ux-foundation.org, gregkh@...uxfoundation.org, brauner@...nel.org, mingo@...nel.org,
neelx@...e.com, sean@...e.io, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>, Peter Zijlstra <peterz@...radead.org>, riel@...riel.com,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [v3 PATCH 1/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
On Thu, Jan 15, 2026 at 01:39:27PM -0800, Dave Hansen wrote:
> I don't think this is the kind of thing we want to expose as ABI. It's
> too deep of an implementation detail. Any meaning derived from it could
> also change on a whim.
>
> For instance, we've changed the rules about when CPUs are put in or
> taken out of mm_cpumask() over time. I think the rules might have even
> depended on the idle driver that your system was using at one time. I
> think Rik also just changed some rules around it in his INVLPGB patches.
>
> I'm not denying how valuable this kind of information might be. I just
> don't think it's generally useful enough to justify an ABI that we need
> to maintain forever. Tracing seems like a much more appropriate way to
> get the data you are after than new ABI.
>
> Can you get the info that you're after with kprobes? Or new tracepoints?
Hi Dave and Peter,
I fully appreciate your concern regarding the exposure of deep
implementation details as stable ABI. I understand that the semantics of
mm_cpumask are fluid. I certainly do not wish to ossify internal logic.
While the static tracepoint trace_tlb_flush is available, the primary
argument for exposing this via /proc/[pid]/status is one of immediacy and
the lack of external dependencies. Having an instantaneous snapshot
available without requiring tooling such as ftrace or eBPF is invaluable for quick
diagnostic checks in production environments.
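For illustration, here is a minimal user-space sketch of such a quick
check. The field name "Cpus_mm_mask" is purely hypothetical, standing in
for whatever name the final patch would emit:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	char path[64], line[256];
	FILE *f;

	/* Default to the calling process if no PID is given. */
	snprintf(path, sizeof(path), "/proc/%s/status",
		 argc > 1 ? argv[1] : "self");
	f = fopen(path, "r");
	if (!f) {
		perror("fopen");
		return 1;
	}
	/* Print only the (hypothetical) Cpus_mm_mask line. */
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "Cpus_mm_mask", 12))
			fputs(line, stdout);
	fclose(f);
	return 0;
}

No ftrace mount point, no BPF toolchain: the snapshot is one read(2) away.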
Based on my reading of arch/x86/mm/tlb.c, the lifecycle of each bit in
mm_cpumask appears to follow this logic (a toy model follows the list):
1. Schedule on (switch_mm): bit set.
2. Schedule off: bit remains set (CPU enters "lazy" mode).
3. Remote TLB flush (IPI):
   - If running: flush the TLB; bit remains set.
   - If lazy (leave_mm): switch to init_mm; bit clearing is deferred.
   - If stale (mm != loaded_mm): bit is cleared immediately
     (effectively the second IPI for a CPU that was previously lazy).
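To check my own understanding, here is a small user-space toy model of
the lifecycle above. This is purely illustrative: all names are mine,
and none of this is the actual tlb.c code:

#include <stdbool.h>
#include <stdio.h>

#define NR_MODEL_CPUS 4

struct mm_model {
	unsigned long cpumask;		/* models mm_cpumask(mm) */
};

struct cpu_model {
	struct mm_model *loaded_mm;	/* models cpu_tlbstate.loaded_mm */
	bool lazy;			/* models the lazy-TLB state */
};

static struct cpu_model cpus[NR_MODEL_CPUS];

/* 1. Schedule on: switch_mm() sets this CPU's bit. */
static void model_switch_mm(int cpu, struct mm_model *next)
{
	cpus[cpu].loaded_mm = next;
	cpus[cpu].lazy = false;
	next->cpumask |= 1UL << cpu;
}

/* 2. Schedule off: the bit stays set; the CPU merely goes lazy. */
static void model_schedule_off(int cpu)
{
	cpus[cpu].lazy = true;
}

/* 3. A remote TLB flush IPI arriving at @cpu on behalf of @mm. */
static void model_flush_ipi(int cpu, struct mm_model *mm)
{
	if (cpus[cpu].loaded_mm == mm && !cpus[cpu].lazy) {
		/* Running: the TLB flush happens (elided); bit stays set. */
	} else if (cpus[cpu].loaded_mm == mm && cpus[cpu].lazy) {
		/* Lazy: leave_mm()-style switch to init_mm; clearing the
		 * bit is deferred to a later IPI. */
		cpus[cpu].loaded_mm = NULL;	/* stand-in for init_mm */
	} else {
		/* Stale (mm != loaded_mm): clear the bit immediately. */
		mm->cpumask &= ~(1UL << cpu);
	}
}

int main(void)
{
	struct mm_model mm = { 0 };

	model_switch_mm(0, &mm);	/* bit 0 set */
	model_schedule_off(0);		/* bit 0 still set; CPU 0 lazy */
	model_flush_ipi(0, &mm);	/* lazy: clearing is deferred */
	model_flush_ipi(0, &mm);	/* stale: bit 0 cleared */
	printf("mm_cpumask = %#lx\n", mm.cpumask);	/* prints 0 */
	return 0;
}

If that model is wrong in any of the three IPI cases, I would welcome
the correction.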
Would you be amenable to this exposure if it were guarded behind a specific
CONFIG_DEBUG option (e.g., CONFIG_DEBUG_MM_CPUMASK_INFO)? This would
clearly mark it as a diagnostic aid, allowing educated users to opt in
to the visibility without implying a permanent guarantee of semantic
stability for general userspace applications.
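Concretely, the guarded output could be as small as the following
sketch against fs/proc/array.c, assuming the hypothetical option name
above; "Cpus_mm_mask" is again only a placeholder:

#include <linux/seq_file.h>
#include <linux/cpumask.h>
#include <linux/mm_types.h>

#ifdef CONFIG_DEBUG_MM_CPUMASK_INFO
/* Emit the (placeholder) Cpus_mm_mask field into /proc/[pid]/status. */
static void task_mm_cpumask(struct seq_file *m, struct mm_struct *mm)
{
	seq_printf(m, "Cpus_mm_mask:\t%*pb\n",
		   cpumask_pr_args(mm_cpumask(mm)));
}
#else
static inline void task_mm_cpumask(struct seq_file *m,
				   struct mm_struct *mm)
{
}
#endif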
Please let me know your thoughts. Thank you.
Kind regards,
--
Aaron Tomlin