Message-ID: <6a434657-5b5b-41ec-a79a-c648c2829602@intel.com>
Date: Thu, 22 Jan 2026 00:09:25 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Swapnil Sapkal <swapnil.sapkal@....com>
CC: <ravi.bangoria@....com>, <mark.rutland@....com>,
<alexander.shishkin@...ux.intel.com>, <jolsa@...nel.org>,
<rostedt@...dmis.org>, <vincent.guittot@...aro.org>,
<adrian.hunter@...el.com>, <kan.liang@...ux.intel.com>,
<gautham.shenoy@....com>, <kprateek.nayak@....com>, <juri.lelli@...hat.com>,
<yangjihong@...edance.com>, <void@...ifault.com>, <tj@...nel.org>,
<sshegde@...ux.ibm.com>, <ctshao@...gle.com>, <quic_zhonhan@...cinc.com>,
<thomas.falcon@...el.com>, <blakejones@...gle.com>, <ashelat@...hat.com>,
<leo.yan@....com>, <dvyukov@...gle.com>, <ak@...ux.intel.com>,
<yujie.liu@...el.com>, <graham.woodward@....com>, <ben.gainey@....com>,
<vineethr@...ux.ibm.com>, <tim.c.chen@...ux.intel.com>, <linux@...blig.org>,
<santosh.shukla@....com>, <sandipan.das@....com>,
<linux-kernel@...r.kernel.org>, <linux-perf-users@...r.kernel.org>,
<peterz@...radead.org>, <mingo@...hat.com>, <namhyung@...nel.org>,
<irogers@...gle.com>, <james.clark@....com>, <acme@...nel.org>
Subject: Re: [PATCH v5 00/10] perf sched: Introduce stats tool
On 1/20/2026 1:58 AM, Swapnil Sapkal wrote:
> MOTIVATION
> ----------
>
> Existing `perf sched` is quite exhaustive and provides a lot of insight
> into scheduler behavior, but it quickly becomes impractical to use for
> long-running or scheduler-intensive workloads. For example, `perf sched
> record` has ~7.77% overhead on hackbench (with 25 groups each running
> 700K loops on a 2-socket, 128-core, 256-thread 3rd Generation EPYC
> server), and it generates a huge 56G perf.data file, which perf takes
> ~137 mins to prepare and write to disk [1].
>
> Unlike `perf sched record`, which hooks onto a set of scheduler
> tracepoints and generates a sample on each tracepoint hit, `perf sched
> stats record` takes a snapshot of the /proc/schedstat file before and
> after the workload, so there is almost zero interference with the
> workload run. It also takes very little time to parse /proc/schedstat,
> convert it into perf samples and save those samples into the perf.data
> file, and the resulting perf.data file is much smaller. Overall,
> `perf sched stats record` is much more lightweight than `perf sched record`.
>
> We, internally at AMD, have been using this (a variant of it, known as
> "sched-scoreboard" [2]) and have found it very useful for analysing the
> impact of scheduler code changes [3][4]. Prateek used v2 [5] of this
> patch series to report the analysis [6][7].
>
> Please note that this is not a replacement for perf sched record/report.
> The intended users of the new tool are scheduler developers, not regular
> users.
>
> USAGE
> -----
>
> # perf sched stats record
> # perf sched stats report
> # perf sched stats diff
>
> Note: Although the `perf sched stats` tool supports the workload profiling
> syntax (i.e. -- <workload>), the recorded profile is still system-wide
> since /proc/schedstat is a system-wide file.
>
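To make the snapshot-and-diff idea in the quoted cover letter concrete, here is a minimal, hypothetical Python sketch. It is not how `perf sched stats` is implemented (perf converts the snapshots into perf samples and has its own parser); it only illustrates taking two /proc/schedstat snapshots and subtracting per-CPU counters. The function names `parse_cpu_counters` and `diff_snapshots` are made up for illustration, and it assumes that per-CPU lines have the form `cpu<N> <int> <int> ...`, with field meanings depending on the schedstat version and left uninterpreted.

```python
from typing import Dict, List


def parse_cpu_counters(snapshot: str) -> Dict[str, List[int]]:
    """Map each 'cpuN' line of a schedstat-style snapshot to its integer counters.

    Assumption: per-CPU lines look like 'cpu<N> <int> <int> ...'. The
    'version', 'timestamp' and 'domain<N>' lines are skipped, and field
    meanings (which depend on the schedstat version) are not interpreted.
    """
    counters: Dict[str, List[int]] = {}
    for line in snapshot.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            counters[fields[0]] = [int(v) for v in fields[1:]]
    return counters


def diff_snapshots(before: str, after: str) -> Dict[str, List[int]]:
    """Per-CPU, per-field delta (after - before) for CPUs present in both snapshots."""
    old = parse_cpu_counters(before)
    new = parse_cpu_counters(after)
    return {
        cpu: [b - a for a, b in zip(old[cpu], new[cpu])]
        for cpu in new
        if cpu in old
    }


if __name__ == "__main__":
    # Hypothetical snapshot contents; on Linux you would read /proc/schedstat
    # once before and once after the workload instead of using literals.
    before = "version 15\ntimestamp 100\ncpu0 1 2 3\ndomain0 ff 4 5\n"
    after = "version 15\ntimestamp 160\ncpu0 4 2 10\ndomain0 ff 9 9\n"
    print(diff_snapshots(before, after))  # {'cpu0': [3, 0, 7]}
```

Because only two file reads happen (one before and one after the run), the cost is independent of how many scheduling events the workload generates, which is the property the cover letter relies on for its low-overhead claim.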
I found this tool useful for load balance analysis on my
384-CPU system running 6.19.0-rc1. Please feel free to add
Tested-by: Chen Yu <yu.c.chen@...el.com>
thanks,
Chenyu