Message-ID: <20260121163342.GI166857@noisy.programming.kicks-ass.net>
Date: Wed, 21 Jan 2026 17:33:42 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: "Chen, Yu C" <yu.c.chen@...el.com>
Cc: Swapnil Sapkal <swapnil.sapkal@....com>, ravi.bangoria@....com,
mark.rutland@....com, alexander.shishkin@...ux.intel.com,
jolsa@...nel.org, rostedt@...dmis.org, vincent.guittot@...aro.org,
adrian.hunter@...el.com, kan.liang@...ux.intel.com,
gautham.shenoy@....com, kprateek.nayak@....com,
juri.lelli@...hat.com, yangjihong@...edance.com, void@...ifault.com,
tj@...nel.org, sshegde@...ux.ibm.com, ctshao@...gle.com,
quic_zhonhan@...cinc.com, thomas.falcon@...el.com,
blakejones@...gle.com, ashelat@...hat.com, leo.yan@....com,
dvyukov@...gle.com, ak@...ux.intel.com, yujie.liu@...el.com,
graham.woodward@....com, ben.gainey@....com, vineethr@...ux.ibm.com,
tim.c.chen@...ux.intel.com, linux@...blig.org,
santosh.shukla@....com, sandipan.das@....com,
linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
mingo@...hat.com, namhyung@...nel.org, irogers@...gle.com,
james.clark@....com, acme@...nel.org
Subject: Re: [PATCH v5 00/10] perf sched: Introduce stats tool

On Thu, Jan 22, 2026 at 12:09:25AM +0800, Chen, Yu C wrote:
> On 1/20/2026 1:58 AM, Swapnil Sapkal wrote:
> > MOTIVATION
> > ----------
> >
> > Existing `perf sched` is quite exhaustive and provides a lot of insight
> > into scheduler behavior, but it quickly becomes impractical to use for
> > long-running or scheduler-intensive workloads. For example, `perf sched
> > record` has ~7.77% overhead on hackbench (with 25 groups each running
> > 700K loops on a 2-socket 128 Cores 256 Threads 3rd Generation EPYC
> > Server), and it generates a huge 56G perf.data, which perf takes
> > ~137 mins to prepare and write to disk [1].
> >
> > Unlike `perf sched record`, which hooks onto a set of scheduler
> > tracepoints and generates a sample on each tracepoint hit, `perf sched
> > stats record` takes a snapshot of the /proc/schedstat file before and
> > after the workload, i.e. there is almost zero interference with the
> > workload run. It also takes very little time to parse /proc/schedstat,
> > convert it into perf samples and save those samples into the perf.data
> > file. The resulting perf.data file is much smaller. So, overall, `perf
> > sched stats record` is much more lightweight compared to `perf sched
> > record`.
> >
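
For readers following along, the before/after snapshot idea described
above can be sketched roughly as below. This is not the code from the
series, only a minimal illustration. The function names are hypothetical,
and it assumes that per-CPU lines in /proc/schedstat start with "cpu"
followed by whitespace-separated integer counters. The number and meaning
of those counters depend on the schedstat version, and the per-domain
lines are ignored here.

```python
import subprocess
import sys


def parse_cpu_lines(text):
    """Return {"cpuN": [int, ...]} for each per-CPU line in a schedstat dump."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the version/timestamp header and the per-domain lines.
        if not fields or not fields[0].startswith("cpu"):
            continue
        counters[fields[0]] = [int(value) for value in fields[1:]]
    return counters


def diff_snapshots(before, after):
    """Subtract the 'before' counters from the 'after' counters, per CPU."""
    delta = {}
    for cpu, after_values in after.items():
        before_values = before.get(cpu)
        if before_values is None or len(before_values) != len(after_values):
            continue  # CPU appeared/disappeared or layout changed; skip it
        delta[cpu] = [a - b for a, b in zip(after_values, before_values)]
    return delta


def read_schedstat(path="/proc/schedstat"):
    """Read the whole schedstat file in one go (assumed path)."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 schedstat_delta.py -- <workload>
    workload = sys.argv[2:] if sys.argv[1:2] == ["--"] else sys.argv[1:]
    before = parse_cpu_lines(read_schedstat())
    if workload:
        subprocess.run(workload, check=False)
    after = parse_cpu_lines(read_schedstat())
    for cpu, values in diff_snapshots(before, after).items():
        print(cpu, *values)
```

Because the file is read only twice, once before and once after the
workload, nothing runs on the scheduler's hot paths while the workload
executes. The deltas are still systemwide, not per-workload.
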
> > We, internally at AMD, have been using this (a variant of this, known as
> > "sched-scoreboard"[2]) and found it to be very useful for analysing the
> > impact of scheduler code changes[3][4]. Prateek used v2[5] of this patch
> > series to report the analysis[6][7].
> >
> > Please note that this is not a replacement for perf sched record/report.
> > The intended users of the new tool are scheduler developers, not regular
> > users.
> >
> > USAGE
> > -----
> >
> > # perf sched stats record
> > # perf sched stats report
> > # perf sched stats diff
> >
> > Note: Although the `perf sched stats` tool supports workload profiling
> > syntax (i.e. -- <workload>), the recorded profile is still systemwide
> > since /proc/schedstat is a systemwide file.
> >
>
> I found this useful for load-balance analysis on my
> 384-CPU system with 6.19.0-rc1; please feel free to add
>
> Tested-by: Chen Yu <yu.c.chen@...el.com>

Yeah, I've used a previous version for a while; it was very nice.

Acked-by: Peter Zijlstra (Intel) <peterz@...radead.org>