Message-ID: <20250826051039.2626894-12-swapnil.sapkal@amd.com>
Date: Tue, 26 Aug 2025 05:10:39 +0000
From: Swapnil Sapkal <swapnil.sapkal@....com>
To: <peterz@...radead.org>, <mingo@...hat.com>, <juri.lelli@...hat.com>,
<vincent.guittot@...aro.org>, <corbet@....net>
CC: <dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
<mgorman@...e.de>, <vschneid@...hat.com>, <iamjoonsoo.kim@....com>,
<qyousef@...alina.io>, <sshegde@...ux.ibm.com>, <alexs@...nel.org>,
<lukasz.luba@....com>, <cpru@...zon.com>, <gautham.shenoy@....com>,
<kprateek.nayak@....com>, <ravi.bangoria@....com>, <swapnil.sapkal@....com>,
<linux-kernel@...r.kernel.org>, <linux-doc@...r.kernel.org>
Subject: [PATCH v4 11/11] perf sched stats: Add details in man page
Document perf sched stats purpose, usage examples, and a guide on how to
interpret the report data in the perf-sched man page.
Signed-off-by: Ravi Bangoria <ravi.bangoria@....com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@....com>
---
tools/perf/Documentation/perf-sched.txt | 261 +++++++++++++++++++++++-
1 file changed, 260 insertions(+), 1 deletion(-)
diff --git a/tools/perf/Documentation/perf-sched.txt b/tools/perf/Documentation/perf-sched.txt
index 6dbbddb6464d..5bfb7bb6c633 100644
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@@ -8,7 +8,7 @@ perf-sched - Tool to trace/measure scheduler properties (latencies)
SYNOPSIS
--------
[verse]
-'perf sched' {record|latency|map|replay|script|timehist}
+'perf sched' {record|latency|map|replay|script|timehist|stats}
DESCRIPTION
-----------
@@ -80,8 +80,267 @@ There are several variants of 'perf sched':
Times are in msec.usec.
+ 'perf sched stats {record | report | diff} <command>' to capture schedstat
+ counters, report them, and show the difference between two such reports,
+ respectively. The schedstat counters are present in the Linux kernel and are
+ exposed through the file ``/proc/schedstat``. These counters are enabled or
+ disabled via the ``kernel.sched_schedstats`` sysctl
+ (``/proc/sys/kernel/sched_schedstats``). They account for many scheduler events
+ such as ``schedule()`` calls, load-balancing events, and ``try_to_wake_up()``
+ calls, among others. This is useful for understanding the scheduler behavior
+ for the workload.
+
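+ For reference, schedstats accounting can be inspected or toggled manually via
+ that sysctl, e.g.:
+
+ cat /proc/sys/kernel/sched_schedstats    # 1 if accounting is enabled
+ echo 1 > /proc/sys/kernel/sched_schedstats    # enable (or: sysctl -w kernel.sched_schedstats=1)
+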
+ Note: The tool will not give correct results if there is topological reordering
+ or CPUs going online/offline between the captured snapshots of `/proc/schedstat`.
+
+ Example usage:
+ perf sched stats record -- sleep 1
+ perf sched stats report
+ perf sched stats diff
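+
+ Assuming the usual perf convention of renaming an existing `perf.data` to
+ `perf.data.old` on the next record, a typical diff workflow could look like
+ this (the workload names here are illustrative):
+
+ perf sched stats record -- ./workload-baseline    # writes perf.data
+ perf sched stats record -- ./workload-patched     # previous file becomes perf.data.old
+ perf sched stats diff                             # compares perf.data.old with perf.data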
+
+ A detailed description of the schedstats can be found in the Kernel Documentation:
+ https://www.kernel.org/doc/html/latest/scheduler/sched-stats.html
+
+ The result can be interpreted as follows:
+
+ The `perf sched stats report` output starts with a description of the columns
+ present in the report. These column names are given before the CPU and domain
+ stats to improve the readability of the report.
+
+ ----------------------------------------------------------------------------------------------------
+ DESC -> Description of the field
+ COUNT -> Value of the field
+ PCT_CHANGE -> Percent change with respect to the corresponding base value
+ AVG_JIFFIES -> Avg time in jiffies between two consecutive occurrences of an event
+ ----------------------------------------------------------------------------------------------------
+
+ Next is the total profiling time in terms of jiffies:
+
+ ----------------------------------------------------------------------------------------------------
+ Time elapsed (in jiffies) : 24537
+ ----------------------------------------------------------------------------------------------------
+
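+ A jiffy is 1/HZ seconds, so the wall-clock equivalent depends on the kernel's
+ CONFIG_HZ; assuming HZ=250, for example, 24537 jiffies correspond to roughly
+ 24537 / 250 ~= 98 seconds of profiling.
+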
+ Next are the CPU scheduling statistics. These are simple diffs of the
+ /proc/schedstat CPU lines, along with a description of each field. Where
+ meaningful, the report also prints a percentage relative to a base stat.
+
+ In the example below, schedule() left CPU0 idle 36.58% of the time. 0.45% of
+ all try_to_wake_up() calls were wakeups of the local CPU. And the total wait
+ time of tasks on CPU0 is 48.70% of their total runtime on the same CPU.
+
+ ----------------------------------------------------------------------------------------------------
+ CPU 0
+ ----------------------------------------------------------------------------------------------------
+ DESC COUNT PCT_CHANGE
+ ----------------------------------------------------------------------------------------------------
+ yld_count : 0
+ array_exp : 0
+ sched_count : 402267
+ sched_goidle : 147161 ( 36.58% )
+ ttwu_count : 236309
+ ttwu_local : 1062 ( 0.45% )
+ rq_cpu_time : 7083791148
+ run_delay : 3449973971 ( 48.70% )
+ pcount : 255035
+ ----------------------------------------------------------------------------------------------------
+
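+ These percentages are ratios of one counter to its base counter: for example,
+ sched_goidle / sched_count = 147161 / 402267 ~= 36.58%, ttwu_local /
+ ttwu_count = 1062 / 236309 ~= 0.45%, and run_delay / rq_cpu_time =
+ 3449973971 / 7083791148 ~= 48.70%.
+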
+ Next are the load balancing statistics. For each of the sched domains
+ (e.g. `SMT`, `MC`, `DIE`...), the scheduler computes statistics under
+ the following three categories:
+
+ 1) Idle Load Balance: Load balancing performed on behalf of a long
+ idling CPU by some other CPU.
+ 2) Busy Load Balance: Load balancing performed when the CPU was busy.
+ 3) New Idle Balance: Load balancing performed when a CPU just became
+ idle.
+
+ Under each of these three categories, `perf sched stats report` provides
+ different load balancing statistics. Along with the direct stats, the
+ report also contains derived metrics prefixed with *. Example:
+
+ ----------------------------------------------------------------------------------------------------
+ CPU 0, DOMAIN SMT CPUS 0,64
+ ----------------------------------------------------------------------------------------------------
+ DESC COUNT AVG_JIFFIES
+ ----------------------------------------- <Category busy> ------------------------------------------
+ busy_lb_count : 136 $ 17.08 $
+ busy_lb_balanced : 131 $ 17.73 $
+ busy_lb_failed : 0 $ 0.00 $
+ busy_lb_imbalance_load : 58
+ busy_lb_imbalance_util : 0
+ busy_lb_imbalance_task : 0
+ busy_lb_imbalance_misfit : 0
+ busy_lb_gained : 7
+ busy_lb_hot_gained : 0
+ busy_lb_nobusyq : 2 $ 1161.50 $
+ busy_lb_nobusyg : 129 $ 18.01 $
+ *busy_lb_success_count : 5
+ *busy_lb_avg_pulled : 1.40
+ ----------------------------------------- <Category idle> ------------------------------------------
+ idle_lb_count : 449 $ 5.17 $
+ idle_lb_balanced : 382 $ 6.08 $
+ idle_lb_failed : 3 $ 774.33 $
+ idle_lb_imbalance_load : 0
+ idle_lb_imbalance_util : 0
+ idle_lb_imbalance_task : 71
+ idle_lb_imbalance_misfit : 0
+ idle_lb_gained : 67
+ idle_lb_hot_gained : 0
+ idle_lb_nobusyq : 0 $ 0.00 $
+ idle_lb_nobusyg : 382 $ 6.08 $
+ *idle_lb_success_count : 64
+ *idle_lb_avg_pulled : 1.05
+ ---------------------------------------- <Category newidle> ----------------------------------------
+ newidle_lb_count : 30471 $ 0.08 $
+ newidle_lb_balanced : 28490 $ 0.08 $
+ newidle_lb_failed : 633 $ 3.67 $
+ newidle_lb_imbalance_load : 0
+ newidle_lb_imbalance_util : 0
+ newidle_lb_imbalance_task : 2040
+ newidle_lb_imbalance_misfit : 0
+ newidle_lb_gained : 1348
+ newidle_lb_hot_gained : 0
+ newidle_lb_nobusyq : 6 $ 387.17 $
+ newidle_lb_nobusyg : 26634 $ 0.09 $
+ *newidle_lb_success_count : 1348
+ *newidle_lb_avg_pulled : 1.00
+ ----------------------------------------------------------------------------------------------------
+
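+ From the sample above, the derived metrics (prefixed with *) appear to follow:
+
+ *lb_success_count = lb_count - lb_balanced - lb_failed
+ *lb_avg_pulled = lb_gained / *lb_success_count
+
+ For example, in the busy category: 136 - 131 - 0 = 5 successful balance
+ attempts, which pulled 7 tasks in total, i.e. 7 / 5 = 1.40 tasks pulled per
+ successful attempt.
+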
+ Consider the following line:
+
+ newidle_lb_balanced : 28490 $ 0.08 $
+
+ While profiling was active, load_balance() on the newly idle CPU 0 found
+ the load already balanced 28490 times. The value enclosed in $ is the
+ average number of jiffies between two such events (time elapsed in
+ jiffies / count).
+
+ Next are the active_load_balance() stats. alb did not trigger while
+ profiling was active, hence all counts are 0.
+
+ --------------------------------- <Category active_load_balance()> ---------------------------------
+ alb_count : 0
+ alb_failed : 0
+ alb_pushed : 0
+ ----------------------------------------------------------------------------------------------------
+
+ Next are the sched_balance_exec() and sched_balance_fork() stats. They are
+ not used, but were kept in the RFC just for legacy purposes. Unless opposed,
+ we plan to remove them in the next revision.
+
+ Next are the wakeup statistics, which the report shows for every domain.
+ Example:
+
+ ------------------------------------------ <Wakeup Info> -------------------------------------------
+ ttwu_wake_remote : 1590
+ ttwu_move_affine : 84
+ ttwu_move_balance : 0
+ ----------------------------------------------------------------------------------------------------
+
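+ (Per the sched-stats kernel documentation linked above: ttwu_wake_remote counts
+ wakeups of tasks that last ran on a different CPU in this domain,
+ ttwu_move_affine counts wakeups where the task was moved to the waking CPU
+ because it was cache-cold on its own CPU, and ttwu_move_balance counts wakeups
+ that started passive balancing.)
+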
+ The same set of stats is reported for each CPU and each domain level.
+
+ How to interpret the diff
+ ~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ `perf sched stats diff` also starts by explaining the columns present in
+ the diff, and then shows the difference in elapsed time in terms of
+ jiffies. The order of the values depends on the order of the input data
+ files; by default, `perf.data.old` and `perf.data` are compared, in that
+ order. Example:
+
+ ----------------------------------------------------------------------------------------------------
+ Time elapsed (in jiffies) : 2009, 2001
+ ----------------------------------------------------------------------------------------------------
+
+ Below is a sample representing the difference in CPU and domain stats of
+ two runs. The values enclosed in `|...|` show the percent change between
+ the two runs, while COUNT1 and COUNT2 show the corresponding fields from
+ the two `perf sched stats report` outputs side by side.
+
+ ----------------------------------------------------------------------------------------------------
+ CPU <ALL CPUS SUMMARY>
+ ----------------------------------------------------------------------------------------------------
+ DESC COUNT1 COUNT2 PCT_CHANGE
+ ----------------------------------------------------------------------------------------------------
+ yld_count : 0, 0 | 0.00 |
+ array_exp : 0, 0 | 0.00 |
+ sched_count : 528533, 412573 | -21.94 |
+ sched_goidle : 193426, 146082 | -24.48 |
+ ttwu_count : 313134, 385975 | 23.26 |
+ ttwu_local : 1126, 1282 | 13.85 |
+ rq_cpu_time : 8257200244, 8301250047 | 0.53 |
+ run_delay : 4728347053, 3997100703 | -15.47 |
+ pcount : 335031, 266396 | -20.49 |
+ ----------------------------------------------------------------------------------------------------
+
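+ The value in PCT_CHANGE is the relative change from COUNT1 to COUNT2: for
+ sched_count above, (412573 - 528533) / 528533 * 100 ~= -21.94.
+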
+ Below is the sample of domain stats diff:
+
+ ----------------------------------------------------------------------------------------------------
+ CPU <ALL CPUS SUMMARY>, DOMAIN SMT
+ ----------------------------------------------------------------------------------------------------
+ DESC COUNT1 COUNT2 PCT_CHANGE
+ ----------------------------------------- <Category busy> ------------------------------------------
+ busy_lb_count : 122, 80 | -34.43 |
+ busy_lb_balanced : 115, 76 | -33.91 |
+ busy_lb_failed : 1, 3 | 200.00 |
+ busy_lb_imbalance_load : 35, 49 | 40.00 |
+ busy_lb_imbalance_util : 0, 0 | 0.00 |
+ busy_lb_imbalance_task : 0, 0 | 0.00 |
+ busy_lb_imbalance_misfit : 0, 0 | 0.00 |
+ busy_lb_gained : 7, 2 | -71.43 |
+ busy_lb_hot_gained : 0, 0 | 0.00 |
+ busy_lb_nobusyq : 0, 0 | 0.00 |
+ busy_lb_nobusyg : 115, 76 | -33.91 |
+ *busy_lb_success_count : 6, 1 | -83.33 |
+ *busy_lb_avg_pulled : 1.17, 2.00 | 71.43 |
+ ----------------------------------------- <Category idle> ------------------------------------------
+ idle_lb_count : 568, 620 | 9.15 |
+ idle_lb_balanced : 462, 449 | -2.81 |
+ idle_lb_failed : 11, 21 | 90.91 |
+ idle_lb_imbalance_load : 0, 0 | 0.00 |
+ idle_lb_imbalance_util : 0, 0 | 0.00 |
+ idle_lb_imbalance_task : 115, 189 | 64.35 |
+ idle_lb_imbalance_misfit : 0, 0 | 0.00 |
+ idle_lb_gained : 103, 169 | 64.08 |
+ idle_lb_hot_gained : 0, 0 | 0.00 |
+ idle_lb_nobusyq : 0, 0 | 0.00 |
+ idle_lb_nobusyg : 462, 449 | -2.81 |
+ *idle_lb_success_count : 95, 150 | 57.89 |
+ *idle_lb_avg_pulled : 1.08, 1.13 | 3.92 |
+ ---------------------------------------- <Category newidle> ----------------------------------------
+ newidle_lb_count : 16961, 3155 | -81.40 |
+ newidle_lb_balanced : 15646, 2556 | -83.66 |
+ newidle_lb_failed : 397, 142 | -64.23 |
+ newidle_lb_imbalance_load : 0, 0 | 0.00 |
+ newidle_lb_imbalance_util : 0, 0 | 0.00 |
+ newidle_lb_imbalance_task : 1376, 655 | -52.40 |
+ newidle_lb_imbalance_misfit : 0, 0 | 0.00 |
+ newidle_lb_gained : 917, 457 | -50.16 |
+ newidle_lb_hot_gained : 0, 0 | 0.00 |
+ newidle_lb_nobusyq : 3, 1 | -66.67 |
+ newidle_lb_nobusyg : 14480, 2103 | -85.48 |
+ *newidle_lb_success_count : 918, 457 | -50.22 |
+ *newidle_lb_avg_pulled : 1.00, 1.00 | 0.11 |
+ --------------------------------- <Category active_load_balance()> ---------------------------------
+ alb_count : 0, 1 | 0.00 |
+ alb_failed : 0, 0 | 0.00 |
+ alb_pushed : 0, 1 | 0.00 |
+ --------------------------------- <Category sched_balance_exec()> ----------------------------------
+ sbe_count : 0, 0 | 0.00 |
+ sbe_balanced : 0, 0 | 0.00 |
+ sbe_pushed : 0, 0 | 0.00 |
+ --------------------------------- <Category sched_balance_fork()> ----------------------------------
+ sbf_count : 0, 0 | 0.00 |
+ sbf_balanced : 0, 0 | 0.00 |
+ sbf_pushed : 0, 0 | 0.00 |
+ ------------------------------------------ <Wakeup Info> -------------------------------------------
+ ttwu_wake_remote : 2031, 2914 | 43.48 |
+ ttwu_move_affine : 73, 124 | 69.86 |
+ ttwu_move_balance : 0, 0 | 0.00 |
+ ----------------------------------------------------------------------------------------------------
+
OPTIONS
-------
+Applicable to {record|latency|map|replay|script}
+
-i::
--input=<file>::
Input file name. (default: perf.data unless stdin is a fifo)
--
2.43.0