lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <37ed5ccf-7419-4311-ab53-b47fc93febae@meta.com>
Date: Tue, 24 Jun 2025 17:59:21 -0400
From: Chris Mason <clm@...a.com>
To: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
        Ingo Molnar <mingo@...nel.org>
Subject: [ANNOUNCE] rsched -- process scheduler statistics

Hi everyone,

I've been wanting to write up a tool to give a broad overview of process
scheduler metrics, and I used our recent regression hunt as an excuse
to work something up.

https://github.com/masoncl/rsched

rsched is meant to sit next to the running workload, kind of like vmstat or
top.  The goal is just being able to quickly decide if I should blame the
scheduler, and maybe give some ideas about which part to blame.  It doesn't
provide the full details like systing or wprof, both of which are much better
for in-depth analysis.

rsched is written in rust and uses BPF to watch a few scheduler tracepoints,
as well as some performance counters.  I'm obviously still learning rust, so
don't be shy with the pull requests if you see chances to improve the code.

Any suggestions on how to make the output more correct, compact and/or usable
are always welcome.  To give you an idea of the details, I'll give the
full output from a fast and slow schbench run.  The README.md in github has
command line options etc.

Command line:

rsched -m -i 10 --schedstat -c schbench -C

This filters for just schbench procs, and turns on the CPU perf counters

v6.15-rc2 (the slow run)

Starting rsched - Runqueue scheduling delay tracker
Options active:
  - Command pattern: schbench
  - Not collapsing by command name
  - Runtime limit: 12 seconds
Press Ctrl-C to stop...

Sched_waking tracepoints disabled for better performance
CPU performance counters enabled
=== Per-Process Scheduling Delays (microseconds) ===

Very Low (<10μs) (1028 entries):
  PID      COMMAND         p50        p90        p99        COUNT       
  35886    schbench-worker 5          9          9          35133       
  35372    schbench-worker 5          9          9          35816       
  35183    schbench-worker 5          9          9          35491       
  35835    schbench-worker 5          9          9          35486       
  35943    schbench-worker 5          9          9          41423       
  35784    schbench-worker 5          9          9          36232       
  35572    schbench-worker 5          9          9          35537       
  35552    schbench-worker 5          9          9          41604       
  35308    schbench-worker 5          9          9          36427       
  35401    schbench-worker 5          9          9          32346       
  ... and 1018 more

Low (10-100μs) (1 entries):
  PID      COMMAND      p50        p90        p99        COUNT       
  35025    schbench-msg 9          47         110        171         


=== Per-Process Time Slice Statistics (microseconds) ===

Time slice statistics grouped by preemption rate:

  PID      COMMAND         INVOL_COUNT  VOLUNTARY(p50/p90)   PREEMPTED(p50/p90)   PREEMPT%    
  35027    schbench-msg    101          -                    95889/123483         100.0       %
  35028    schbench-msg    100          -                    96579/124173         100.0       %
  35026    schbench-msg    99           -                    96579/124173         100.0       %
  35025    schbench-msg    170          -                    50365/112503         100.0       %
  35942    schbench-worker 4            5/9                  15/17                0.0         %
  35937    schbench-worker 3            5/9                  10/15                0.0         %
  35661    schbench-worker 3            5/9                  13/16                0.0         %
  35251    schbench-worker 3            5/9                  10/20                0.0         %
  35345    schbench-worker 3            5/9                  13/16                0.0         %
  35656    schbench-worker 3            5/9                  13/16                0.0         %
  ... and 1019 more


=== Per-Process Sleep Duration Statistics (microseconds) ===
(Time spent sleeping between sched_switch and sched_wakeup)

Medium (100μs-1ms) (1024 entries):
  PID      COMMAND         p50        p90        p99        COUNT       
  35173    schbench-worker 599        939        1015       26744       
  35910    schbench-worker 566        932        1014       29204       
  35148    schbench-worker 562        931        1014       28601       
  35782    schbench-worker 556        930        1014       29246       
  35985    schbench-worker 549        929        1014       28182       
  35877    schbench-worker 553        929        1014       29455       
  35524    schbench-worker 548        928        1014       29650       
  36010    schbench-worker 523        923        1013       30265       
  35343    schbench-worker 508        920        1013       31055       
  35118    schbench-worker 509        920        1013       30997       
  ... and 1014 more

Very High (>10ms) (1 entries):
  PID      COMMAND  p50        p90        p99        COUNT       
  35024    schbench 757304     990321     990321     9           


=== Per-Process Runqueue Depth Statistics (nr_running at wakeup) ===

Processes by runqueue depth at wakeup:

  PID      COMMAND         p50        p90        p99        COUNT       
  35886    schbench-worker 1          1          1          70704       
  35372    schbench-worker 1          1          1          72022       
  35183    schbench-worker 1          1          1          71410       
  35835    schbench-worker 1          1          1          71392       
  35943    schbench-worker 1          1          1          83351       
  35784    schbench-worker 1          1          1          72810       
  35572    schbench-worker 1          1          1          71386       
  35552    schbench-worker 1          1          1          83733       
  35308    schbench-worker 1          1          1          73241       
  35401    schbench-worker 1          1          1          65033       
  ... and 1019 more


=== System-wide Schedstat Metrics (deltas) ===
alb_count                              0 | lb_hot_gained_idle                     0 | lb_nobusyg_idle                   636568 
alb_failed                             0 | lb_hot_gained_newly_idle               0 | lb_nobusyg_newly_idle            3922226 
alb_pushed                             0 | lb_hot_gained_not_idle                 0 | lb_nobusyg_not_idle                  650 
lb_balance_idle                   676344 | lb_imbalance_load_idle                 0 | lb_nobusyq_idle                      171 
lb_balance_newly_idle            3944496 | lb_imbalance_load_newly_idle           0 | lb_nobusyq_newly_idle              14806 
lb_balance_not_idle                  698 | lb_imbalance_load_not_idle             0 | lb_nobusyq_not_idle                    0 
lb_count_idle                     678855 | lb_imbalance_misfit_idle               0 | sbe_balanced                           0 
lb_count_newly_idle              4845653 | lb_imbalance_misfit_newly_idle         0 | sbe_cnt                                0 
lb_count_not_idle                    698 | lb_imbalance_misfit_not_idle           0 | sbe_pushed                             0 
lb_failed_idle                      2399 | lb_imbalance_task_idle              2890 | sbf_balanced                           0 
lb_failed_newly_idle              896235 | lb_imbalance_task_newly_idle     2113470 | sbf_cnt                                0 
lb_failed_not_idle                     0 | lb_imbalance_task_not_idle             0 | sbf_pushed                             0 
lb_gained_idle                       112 | lb_imbalance_util_idle                 0 | ttwu_move_affine                   10159 
lb_gained_newly_idle                4926 | lb_imbalance_util_newly_idle           0 | ttwu_move_balance                      0 
lb_gained_not_idle                     0 | lb_imbalance_util_not_idle             0 | ttwu_wake_remote                37738806 

=== CPU Field Totals (deltas) ===
yld_count                 5685 | sched_count                  0 | sched_goidle          75529957 
ttwu_count            37721164 | ttwu_local            37785024 | rq_cpu_time              46219 
rq_run_delay usec          287 | rq_pcount            673321838 | 


=== Per-CPU Scheduling Delays (microseconds) ===

Very Low (<10μs): CPUs 1-251
  Aggregate: p50=5      p90=9      p99=9      count=37575995

Low (10-100μs): CPUs 0
  Aggregate: p50=6      p90=30     p99=95     count=350


=== Per-CPU Idle Duration (microseconds) ===

Very Short (<100μs): CPUs 4-15,130-141
  Aggregate: p50=16     p90=50     p99=102    count=9636083

Short (100μs-1ms): CPUs 16-129,142-251
  Aggregate: p50=50     p90=175    p99=396    count=28157410


=== CPU Performance Metrics ===
Global: User 5301.3M cycles/sec (IPC: 0.23), Kernel 23514.0M cycles/sec (IPC: 1.20)

CPU Performance by Usage Group (cycles are per timeslice):

Medium (100M-1G cycles p99) (4 entries):
  PID      COMMAND      USER CYC(p50/p99)    KERN CYC(p50/p99)    U-IPC    K-IPC   
  35026    schbench-msg 24.8M/33.4M          191.3M/266.9M        0.10     0.51    
  35027    schbench-msg 24.7M/33.4M          188.8M/265.3M        0.08     0.44    
  35028    schbench-msg 24.7M/33.4M          188.8M/265.3M        0.08     0.43    
  35025    schbench-msg 17.5M/33.2M          91.5M/263.2M         0.09     0.45    

Very Low (<10M cycles p99) (1025 entries):
  PID      COMMAND         USER CYC(p50/p99)    KERN CYC(p50/p99)    U-IPC    K-IPC   
  35563    schbench-worker 1.1K/2.0K            6.4K/82.5K           0.43     1.23    
  35024    schbench        49.2K/65.5K          12.3K/16.4K          2.21     3.85    
  36040    schbench-worker 965/2.0K             6.2K/76.5K           0.45     1.31    
  35069    schbench-worker 1.4K/3.2K            6.3K/65.5K           0.33     1.32    
  35515    schbench-worker 1.3K/2.8K            6.0K/62.3K           0.39     1.42    
  35783    schbench-worker 1.6K/4.0K            6.4K/60.2K           0.23     1.33    
  36035    schbench-worker 1.8K/4.0K            6.4K/60.0K           0.22     1.40    
  35483    schbench-worker 1.8K/4.0K            6.4K/59.9K           0.22     1.41    
  35435    schbench-worker 2.3K/4.1K            6.4K/59.7K           0.22     1.38    
  35353    schbench-worker 1.6K/3.4K            6.4K/60.3K           0.27     1.39    
  ... and 1015 more

-----------------------------------------------------------------------------------

v6.16-rc2 + a fix (the fast run)

The big change you can see here is the p99 of kernel cycles is
dramatically lower.  Also, the schedstats for new idle balance
are much lower.

Starting rsched - Runqueue scheduling delay tracker
Options active:
  - Command pattern: schbench
  - Not collapsing by command name
  - Runtime limit: 12 seconds
Press Ctrl-C to stop...

Sched_waking tracepoints disabled for better performance
CPU performance counters enabled
=== Per-Process Scheduling Delays (microseconds) ===

Very Low (<10μs) (1028 entries):
  PID      COMMAND         p50        p90        p99        COUNT       
  3968481  schbench-worker 5          9          9          45999       
  3968652  schbench-worker 5          9          9          45966       
  3969197  schbench-worker 5          9          9          43535       
  3968362  schbench-worker 5          9          9          39690       
  3968832  schbench-worker 5          9          9          47972       
  3968660  schbench-worker 5          9          9          48020       
  3968393  schbench-worker 5          9          9          48051       
  3968468  schbench-worker 5          9          9          48038       
  3968967  schbench-worker 5          9          9          40810       
  3968635  schbench-worker 5          9          9          45979       
  ... and 1018 more

Low (10-100μs) (1 entries):
  PID      COMMAND      p50        p90        p99        COUNT       
  3968289  schbench-msg 5          18         100        120         


=== Per-Process Time Slice Statistics (microseconds) ===

Time slice statistics grouped by preemption rate:

  PID      COMMAND         INVOL_COUNT  VOLUNTARY(p50/p90)   PREEMPTED(p50/p90)   PREEMPT%    
  3968292  schbench-msg    100          -                    96938/124245         100.0       %
  3968290  schbench-msg    100          -                    96212/124100         100.0       %
  3968291  schbench-msg    99           -                    96938/124245         100.0       %
  3968289  schbench-msg    119          -                    86388/122135         100.0       %
  3968426  schbench-worker 5            5/9                  4/8                  0.0         %
  3968902  schbench-worker 4            5/9                  6/10                 0.0         %
  3969157  schbench-worker 3            5/9                  20/25                0.0         %
  3969252  schbench-worker 3            5/9                  10/15                0.0         %
  3968776  schbench-worker 3            5/9                  10/15                0.0         %
  3968710  schbench-worker 3            5/9                  10/15                0.0         %
  ... and 1019 more


=== Per-Process Sleep Duration Statistics (microseconds) ===
(Time spent sleeping between sched_switch and sched_wakeup)

Medium (100μs-1ms) (1024 entries):
  PID      COMMAND         p50        p90        p99        COUNT       
  3968983  schbench-worker 438        487        714        31072       
  3968845  schbench-worker 421        486        708        32322       
  3968858  schbench-worker 421        482        633        34969       
  3969133  schbench-worker 414        480        600        36085       
  3968351  schbench-worker 314        479        577        36622       
  3968372  schbench-worker 414        475        511        37911       
  3968853  schbench-worker 398        475        511        38285       
  3968778  schbench-worker 329        471        510        39345       
  3969079  schbench-worker 313        470        507        38772       
  3969157  schbench-worker 387        469        508        39111       
  ... and 1014 more

Very High (>10ms) (1 entries):
  PID      COMMAND  p50        p90        p99        COUNT       
  3968288  schbench 757304     990321     990321     9           


=== Per-Process Runqueue Depth Statistics (nr_running at wakeup) ===

Processes by runqueue depth at wakeup:

  PID      COMMAND         p50        p90        p99        COUNT       
  3968288  schbench        1          2          3          20          
  3968481  schbench-worker 1          1          2          92460       
  3968652  schbench-worker 1          1          1          92512       
  3969197  schbench-worker 1          1          2          87580       
  3968362  schbench-worker 1          1          1          79681       
  3968832  schbench-worker 1          1          1          96353       
  3968660  schbench-worker 1          1          2          96443       
  3968393  schbench-worker 1          1          1          96642       
  3968468  schbench-worker 1          1          2          96595       
  3968967  schbench-worker 1          1          1          81998       
  ... and 1019 more


=== System-wide Schedstat Metrics (deltas) ===
alb_count                              2 | lb_hot_gained_idle                     3 | lb_nobusyg_idle                   594665 
alb_failed                             0 | lb_hot_gained_newly_idle               5 | lb_nobusyg_newly_idle              27247 
alb_pushed                             2 | lb_hot_gained_not_idle                 0 | lb_nobusyg_not_idle                  573 
lb_balance_idle                   618752 | lb_imbalance_load_idle                 0 | lb_nobusyq_idle                      338 
lb_balance_newly_idle              27389 | lb_imbalance_load_newly_idle           0 | lb_nobusyq_newly_idle                 62 
lb_balance_not_idle                  611 | lb_imbalance_load_not_idle             0 | lb_nobusyq_not_idle                    0 
lb_count_idle                     620785 | lb_imbalance_misfit_idle               0 | sbe_balanced                           0 
lb_count_newly_idle                41461 | lb_imbalance_misfit_newly_idle         0 | sbe_cnt                                0 
lb_count_not_idle                    611 | lb_imbalance_misfit_not_idle           0 | sbe_pushed                             0 
lb_failed_idle                      1611 | lb_imbalance_task_idle              2710 | sbf_balanced                           0 
lb_failed_newly_idle               12835 | lb_imbalance_task_newly_idle       79009 | sbf_cnt                                0 
lb_failed_not_idle                     0 | lb_imbalance_task_not_idle             0 | sbf_pushed                             0 
lb_gained_idle                       429 | lb_imbalance_util_idle                 0 | ttwu_move_affine                   19124 
lb_gained_newly_idle                1270 | lb_imbalance_util_newly_idle           0 | ttwu_move_balance                      0 
lb_gained_not_idle                     0 | lb_imbalance_util_not_idle             0 | ttwu_wake_remote                46817555 

=== CPU Field Totals (deltas) ===
yld_count                 3565 | sched_count                  0 | sched_goidle          93280935 
ttwu_count            46387216 | ttwu_local            46865150 | rq_cpu_time              47597 
rq_run_delay usec           80 | rq_pcount           2681345616 | 


=== Per-CPU Scheduling Delays (microseconds) ===

Very Low (<10μs): CPUs 0-251
  Aggregate: p50=5      p90=9      p99=9      count=46639878


=== Per-CPU Idle Duration (microseconds) ===

Very Short (<100μs): CPUs 4-15,126-141
  Aggregate: p50=6      p90=16     p99=39     count=26621635

Short (100μs-1ms): CPUs 16-125,142-251
  Aggregate: p50=73     p90=263    p99=431    count=19862521


=== CPU Performance Metrics ===
Global: User 5636.6M cycles/sec (IPC: 0.27), Kernel 23515.6M cycles/sec (IPC: 1.40)

CPU Performance by Usage Group (cycles are per timeslice):

Medium (100M-1G cycles p99) (4 entries):
  PID      COMMAND      USER CYC(p50/p99)    KERN CYC(p50/p99)    U-IPC    K-IPC   
  3968291  schbench-msg 24.9M/33.4M          190.4M/266.9M        0.13     0.73    
  3968290  schbench-msg 24.8M/33.4M          189.5M/266.9M        0.12     0.70    
  3968292  schbench-msg 24.7M/33.2M          188.8M/265.3M        0.13     0.74    
  3968289  schbench-msg 22.7M/33.2M          161.1M/264.9M        0.12     0.66    

Very Low (<10M cycles p99) (1025 entries):
  PID      COMMAND         USER CYC(p50/p99)    KERN CYC(p50/p99)    U-IPC    K-IPC   
  3968288  schbench        39.3K/65.5K          13.7K/32.8K          4.13     4.15    
  3968914  schbench-worker 1.6K/3.9K            6.2K/14.9K           0.23     1.47    
  3968997  schbench-worker 1.7K/4.0K            6.2K/14.6K           0.22     1.45    
  3969198  schbench-worker 1.7K/4.0K            6.2K/14.6K           0.22     1.44    
  3968393  schbench-worker 2.0K/4.1K            6.2K/14.5K           0.24     1.49    
  3968372  schbench-worker 1.6K/3.4K            6.3K/15.2K           0.27     1.46    
  3968805  schbench-worker 1.5K/3.9K            6.2K/14.5K           0.25     1.50    
  3969261  schbench-worker 1.6K/4.0K            6.1K/14.3K           0.30     1.55    
  3968881  schbench-worker 1.8K/4.0K            6.2K/14.2K           0.26     1.51    
  3968511  schbench-worker 1.5K/4.0K            6.1K/14.1K           0.31     1.54    
  ... and 1015 more


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ