Message-ID: <20250703140109.GW1613376@noisy.programming.kicks-ass.net>
Date: Thu, 3 Jul 2025 16:01:09 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: "Chen, Yu C" <yu.c.chen@...el.com>
Cc: Michal Hocko <mhocko@...e.com>, Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Tim Chen <tim.c.chen@...el.com>, linux-kernel@...r.kernel.org,
Jirka Hladky <jhladky@...hat.com>,
Srikanth Aithal <Srikanth.Aithal@....com>,
Suneeth D <Suneeth.D@....com>, Libo Chen <libo.chen@...cle.com>
Subject: Re: [PATCH] sched/numa: Fix NULL pointer access to mm_struct during task swap

On Thu, Jul 03, 2025 at 09:38:08PM +0800, Chen, Yu C wrote:
> Hi Peter,
>
> On 7/3/2025 8:36 PM, Peter Zijlstra wrote:
> > On Thu, Jul 03, 2025 at 05:20:47AM -0700, Libo Chen wrote:
> >
> > > I agree. The other parts, schedstat and vmstat, are still quite helpful.
> > > Also tracepoints are more expensive than counters once enabled; I think
> > > that's too much for just counting numbers.
> >
> > I'm not generally a fan of eBPF, but supposedly it is really good for
> > stuff like this.
> >
> > Attaching to a tracepoint and distributing into cgroup buckets seems
> > like it should be a trivial script.
>
> Yes, it is feasible to use eBPF. On the other hand, if some
> existing monitoring programs rely on /proc/{pid}/sched to observe
> the NUMA balancing metrics of processes, it might be helpful to
> include the NUMA migration/swap information in /proc/{pid}/sched.
> This approach can minimize the modifications needed for these
> monitoring programs, eliminating the need to add a new BPF script
> to obtain NUMA balancing statistics from different sources IMHO.

Maybe...

The thing is, most of the time the effort spent on collecting all these
numbers is wasted energy since nobody ever looks at them.

Sometimes we're stuck with ABI, like the proc files you mentioned. We
can't readily remove them; stuff would break. But does that mean we
should endlessly add to them just because it's convenient?

Ideally I would strip out all the statistics and accounting crap and
make sure we have tracepoints (not trace-events) covering all the needed
spots, and then maybe, just maybe, have a few kernel modules that hook
into those tracepoints to provide the legacy interfaces.

That way, only the people that care get to pay the overhead of actually
collecting the numbers.

One can dream I suppose... :-)
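
A minimal user-space model of the idea above, offered only as a sketch: the hot path fires a "tracepoint" that is just a list of optional probes, and the statistics live in a probe that someone has to attach. All names here (numa_swap_tp, tp_register, trace_numa_swap, count_per_cgroup, MAX_CGROUPS) are hypothetical, and the per-cgroup counter stands in for the "distribute into cgroup buckets" consumer mentioned earlier in the thread. In the kernel itself, a bare tracepoint declared with DECLARE_TRACE() gets a generated register_trace_<name>() helper that a module could use to attach such a probe, and the disabled case is patched out with static keys; the model below does neither.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space model only: these names are hypothetical, not kernel APIs.
 * A "tracepoint" is a list of optional probes; the hot path just calls
 * whatever probes happen to be attached (possibly none).
 */
#define MAX_PROBES  4
#define MAX_CGROUPS 8

typedef void (*probe_fn)(void *data, int pid, unsigned int cgroup_id);

struct tracepoint {
	probe_fn probes[MAX_PROBES];
	void *data[MAX_PROBES];
	int nr_probes;
};

/* Zero-initialized: no probes attached, so nothing is collected. */
static struct tracepoint numa_swap_tp;

/* Attach a probe; returns 0 on success, -1 if the probe table is full. */
static int tp_register(struct tracepoint *tp, probe_fn fn, void *data)
{
	assert(fn != NULL);
	if (tp->nr_probes == MAX_PROBES)
		return -1;
	tp->probes[tp->nr_probes] = fn;
	tp->data[tp->nr_probes] = data;
	tp->nr_probes++;
	return 0;
}

/*
 * Hot path, e.g. where a NUMA task swap happens. With no probes the
 * loop body never runs; the real kernel goes further and patches the
 * disabled case out with static keys.
 */
static void trace_numa_swap(int pid, unsigned int cgroup_id)
{
	for (int i = 0; i < numa_swap_tp.nr_probes; i++)
		numa_swap_tp.probes[i](numa_swap_tp.data[i], pid, cgroup_id);
}

/* Optional consumer: per-cgroup counters, paid for only when attached. */
struct cgroup_buckets {
	unsigned long count[MAX_CGROUPS];
};

static void count_per_cgroup(void *data, int pid, unsigned int cgroup_id)
{
	struct cgroup_buckets *b = data;

	(void)pid; /* unused by this particular consumer */
	if (cgroup_id < MAX_CGROUPS)
		b->count[cgroup_id]++;
}
```

Until something calls tp_register(&numa_swap_tp, count_per_cgroup, &buckets), calls to trace_numa_swap() count nothing; after registration, each call bumps buckets.count[cgroup_id]. In this model a legacy interface would be just one more such consumer, so the cost of collecting the numbers falls only on setups that attach it.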