Message-ID: <20250316102916.10614-1-kprateek.nayak@amd.com>
Date: Sun, 16 Mar 2025 10:29:10 +0000
From: K Prateek Nayak <kprateek.nayak@....com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Chen Yu <yu.c.chen@...el.com>,
<linux-kernel@...r.kernel.org>
CC: Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, David Vernet
<void@...ifault.com>, "Gautham R. Shenoy" <gautham.shenoy@....com>, "Swapnil
Sapkal" <swapnil.sapkal@....com>, Shrikanth Hegde <sshegde@...ux.ibm.com>, "K
Prateek Nayak" <kprateek.nayak@....com>
Subject: [RFC PATCH 09/08] [ANNOTATE] sched/fair: Stats versioning and invalidation
I would have loved to spin another version of this, but being slightly
short on time before OSPM, I decided to add these bits on top of the
RFC. Sorry for the inconvenience.
Stats versioning
================
Earlier experiments looked at aggressive stats caching and reuse: load
balancing instances computed and cached the stats for non-local groups,
hoping that they would be reused.
With stats versioning, the load balancing CPU only caches the stats for
the local hierarchy. Instead of the jiffy-based "last_update" freshness
check, this moves to versioning based on the sched_clock_cpu() value.
Cached stats are invalidated once the CPU doing the load balance is
done, allowing fresher stats to be propagated. Stats computed by a
concurrent load balancing instance can now be reused, allowing idle and
newidle balance to reuse stats effectively.
Stats versioning nuances are explained in Patch 11/08. Since idle and
newidle balance can reuse stats, the aggregation has been changed to
account for reduced capacity, while foregoing the computation of total
capacity.
Benchmark results are as follows:
==================================================================
Test : hackbench
Units : Normalized time in seconds
Interpretation: Lower is better
Statistic : AMean
==================================================================
Case: tip[pct imp](CV) versioning[pct imp](CV)
1-groups 1.00 [ -0.00](10.12) 1.00 [ 0.44](13.86)
2-groups 1.00 [ -0.00]( 6.92) 1.04 [ -4.32]( 3.00)
4-groups 1.00 [ -0.00]( 3.14) 1.00 [ -0.21]( 2.16)
8-groups 1.00 [ -0.00]( 1.35) 1.01 [ -1.25]( 1.32)
16-groups 1.00 [ -0.00]( 1.32) 1.01 [ -0.50]( 2.00)
==================================================================
Test : tbench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) versioning[pct imp](CV)
1 1.00 [ 0.00]( 0.43) 0.98 [ -1.65]( 0.15)
2 1.00 [ 0.00]( 0.58) 1.01 [ 1.27]( 0.49)
4 1.00 [ 0.00]( 0.54) 1.00 [ 0.47]( 0.40)
8 1.00 [ 0.00]( 0.49) 1.00 [ -0.44]( 1.18)
16 1.00 [ 0.00]( 1.06) 1.00 [ -0.07]( 1.14)
32 1.00 [ 0.00]( 1.27) 1.00 [ 0.02]( 0.11)
64 1.00 [ 0.00]( 1.54) 0.99 [ -1.12]( 1.09)
128 1.00 [ 0.00]( 0.38) 0.98 [ -2.43]( 1.00)
256 1.00 [ 0.00]( 1.85) 0.99 [ -0.50]( 0.94)
512 1.00 [ 0.00]( 0.31) 0.99 [ -1.03]( 0.35)
1024 1.00 [ 0.00]( 0.19) 0.99 [ -0.56]( 0.42)
==================================================================
Test : stream-10
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) versioning[pct imp](CV)
Copy 1.00 [ 0.00](11.31) 1.08 [ 7.51]( 4.74)
Scale 1.00 [ 0.00]( 6.62) 1.00 [ -0.31]( 7.45)
Add 1.00 [ 0.00]( 7.06) 1.02 [ 2.50]( 7.34)
Triad 1.00 [ 0.00]( 8.91) 1.08 [ 7.78]( 2.88)
==================================================================
Test : stream-100
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) versioning[pct imp](CV)
Copy 1.00 [ 0.00]( 2.01) 1.02 [ 1.82]( 1.26)
Scale 1.00 [ 0.00]( 1.49) 1.00 [ 0.26]( 0.80)
Add 1.00 [ 0.00]( 2.67) 1.01 [ 0.98]( 1.29)
Triad 1.00 [ 0.00]( 2.19) 1.02 [ 2.06]( 1.01)
==================================================================
Test : netperf
Units          : Normalized Throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) versioning[pct imp](CV)
1-clients 1.00 [ 0.00]( 1.43) 0.99 [ -0.72]( 0.81)
2-clients 1.00 [ 0.00]( 1.02) 1.00 [ -0.09]( 1.11)
4-clients 1.00 [ 0.00]( 0.83) 1.00 [ 0.31]( 0.29)
8-clients 1.00 [ 0.00]( 0.73) 1.00 [ -0.25]( 0.61)
16-clients 1.00 [ 0.00]( 0.97) 1.00 [ -0.26]( 0.89)
32-clients 1.00 [ 0.00]( 0.88) 0.99 [ -0.61]( 0.82)
64-clients 1.00 [ 0.00]( 1.49) 0.99 [ -1.11]( 1.77)
128-clients 1.00 [ 0.00]( 1.05) 1.00 [ -0.03]( 1.13)
256-clients 1.00 [ 0.00]( 3.85) 1.00 [ -0.24]( 2.63)
512-clients 1.00 [ 0.00](59.63) 0.99 [ -0.74](59.01)
==================================================================
Test : schbench
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) versioning[pct imp](CV)
1 1.00 [ -0.00]( 6.67) 0.93 [ 6.67](15.25)
2 1.00 [ -0.00](10.18) 0.83 [ 17.39]( 7.15)
4 1.00 [ -0.00]( 4.49) 1.04 [ -4.26]( 6.12)
8 1.00 [ -0.00]( 6.68) 1.06 [ -5.66](12.98)
16 1.00 [ -0.00]( 1.87) 1.00 [ -0.00]( 3.38)
32 1.00 [ -0.00]( 4.01) 0.98 [ 2.20]( 4.79)
64 1.00 [ -0.00]( 3.21) 1.02 [ -1.68]( 0.84)
128 1.00 [ -0.00](44.13) 1.16 [-15.98](14.99)
256 1.00 [ -0.00](14.46) 0.90 [ 9.99](17.45)
512 1.00 [ -0.00]( 1.95) 0.98 [ 1.54]( 1.13)
==================================================================
Test : new-schbench-requests-per-second
Units : Normalized Requests per second
Interpretation: Higher is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) versioning[pct imp](CV)
1 1.00 [ 0.00]( 0.46) 1.00 [ 0.00]( 0.26)
2 1.00 [ 0.00]( 0.15) 1.00 [ -0.29]( 0.15)
4 1.00 [ 0.00]( 0.15) 1.00 [ -0.29]( 0.30)
8 1.00 [ 0.00]( 0.15) 1.00 [ -0.29]( 0.26)
16 1.00 [ 0.00]( 0.00) 1.00 [ 0.00]( 0.00)
32 1.00 [ 0.00]( 3.40) 1.06 [ 5.93]( 1.22)
64 1.00 [ 0.00]( 7.09) 1.00 [ 0.00]( 0.20)
128 1.00 [ 0.00]( 0.00) 0.98 [ -1.52]( 0.34)
256 1.00 [ 0.00]( 1.12) 0.98 [ -2.41]( 1.19)
512 1.00 [ 0.00]( 0.22) 1.00 [ 0.00]( 0.43)
==================================================================
Test : new-schbench-wakeup-latency
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) versioning[pct imp](CV)
1 1.00 [ -0.00](19.72) 1.00 [ -0.00]( 8.37)
2 1.00 [ -0.00](15.96) 1.09 [ -9.09](11.08)
4 1.00 [ -0.00]( 3.87) 1.15 [-15.38](17.44)
8 1.00 [ -0.00]( 8.15) 0.92 [ 8.33]( 8.85)
16 1.00 [ -0.00]( 3.87) 1.23 [-23.08]( 5.59)
32 1.00 [ -0.00](12.99) 0.73 [ 26.67](16.75)
64 1.00 [ -0.00]( 6.20) 1.25 [-25.00]( 2.63)
128 1.00 [ -0.00]( 0.96) 1.62 [-62.37]( 1.30)
256 1.00 [ -0.00]( 2.76) 0.82 [ 17.89](10.56)
512 1.00 [ -0.00]( 0.20) 1.00 [ -0.00]( 0.34)
==================================================================
Test : new-schbench-request-latency
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) versioning[pct imp](CV)
1 1.00 [ -0.00]( 1.07) 1.02 [ -2.34]( 0.13)
2 1.00 [ -0.00]( 0.14) 1.04 [ -3.97]( 0.13)
4 1.00 [ -0.00]( 1.39) 1.03 [ -3.15]( 0.13)
8 1.00 [ -0.00]( 0.36) 1.03 [ -3.43]( 0.66)
16 1.00 [ -0.00]( 1.18) 0.99 [ 0.79]( 1.22)
32 1.00 [ -0.00]( 8.42) 0.82 [ 18.29]( 9.02)
64 1.00 [ -0.00]( 4.85) 1.00 [ -0.44]( 1.61)
128 1.00 [ -0.00]( 0.28) 1.06 [ -5.64]( 1.10)
256 1.00 [ -0.00](10.52) 0.81 [ 19.18](12.55)
512 1.00 [ -0.00]( 0.69) 1.00 [ 0.33]( 1.27)
==================================================================
Test : Various longer running benchmarks
Units : %diff in throughput reported
Interpretation: Higher is better
Statistic : Median
==================================================================
Benchmarks: %diff
ycsb-cassandra -0.76%
ycsb-mongodb 0.49%
deathstarbench-1x -2.37%
deathstarbench-2x 0.12%
deathstarbench-3x 2.30%
deathstarbench-6x 1.88%
hammerdb+mysql 16VU 3.85%
hammerdb+mysql 64VU 0.27%
Following are the schedstats diffs for sched-messaging with 4-groups
and 16-groups:
o 4-groups:
----------------------------------------------------------------------------------------------------
CPU <ALL CPUS SUMMARY>
----------------------------------------------------------------------------------------------------
DESC COUNT1 COUNT2 PCT_CHANGE PCT_CHANGE1 PCT_CHANGE2
----------------------------------------------------------------------------------------------------
sched_yield() count : 0, 0 | 0.00% |
Legacy counter can be ignored : 0, 0 | 0.00% |
schedule() called : 174683, 176871 | 1.25% |
schedule() left the processor idle : 86742, 88113 | 1.58% | ( 49.66%, 49.82% )
try_to_wake_up() was called : 87675, 88622 | 1.08% |
try_to_wake_up() was called to wake up the local cpu : 28, 26 | -7.14% | ( 0.03%, 0.03% )
total runtime by tasks on this processor (in jiffies) : 2124248214, 2118780927 | -0.26% |
total waittime by tasks on this processor (in jiffies) : 24160304, 16912073 | -30.00% | ( 1.14%, 0.80% )
total timeslices run on this cpu : 87936, 88753 | 0.93% |
----------------------------------------------------------------------------------------------------
---------------------------------------- <Category newidle> ----------------------------------------
SMT:
load_balance() total time to balance on newly idle : 449650, 465044 | 3.42% |
load_balance() stats reused on newly idle : 0, 0 | 0.00% |
load_balance() stats recomputed on newly idle : 2493, 2679 | 7.46% |
MC:
load_balance() total time to balance on newly idle : 660742, 610346 | -7.63% |
load_balance() stats reused on newly idle : 0, 1898 | 0.00% |
load_balance() stats recomputed on newly idle : 3985, 3527 | -11.49% |
PKG:
load_balance() total time to balance on newly idle : 725938, 530707 | -26.89% |
load_balance() stats reused on newly idle : 0, 401 | 0.00% |
load_balance() stats recomputed on newly idle : 722, 474 | -34.35% |
NUMA:
load_balance() total time to balance on newly idle : 406862, 410386 | 0.87% |
load_balance() stats reused on newly idle : 0, 36 | 0.00% |
load_balance() stats recomputed on newly idle : 48, 39 | -18.75% |
o 16-groups:
----------------------------------------------------------------------------------------------------
CPU <ALL CPUS SUMMARY>
----------------------------------------------------------------------------------------------------
DESC COUNT1 COUNT2 PCT_CHANGE PCT_CHANGE1 PCT_CHANGE2
----------------------------------------------------------------------------------------------------
sched_yield() count : 0, 0 | 0.00% |
Legacy counter can be ignored : 0, 0 | 0.00% |
schedule() called : 566558, 554784 | -2.08% |
schedule() left the processor idle : 222161, 212164 | -4.50% | ( 39.21%, 38.24% )
try_to_wake_up() was called : 325303, 322690 | -0.80% |
try_to_wake_up() was called to wake up the local cpu : 990, 1017 | 2.73% | ( 0.30%, 0.32% )
total runtime by tasks on this processor (in jiffies) : 8807593610, 9142526964 | 3.80% |
total waittime by tasks on this processor (in jiffies) : 4093286876, 4314147489 | 5.40% | ( 46.47%, 47.19% )
total timeslices run on this cpu : 344281, 342495 | -0.52% |
----------------------------------------------------------------------------------------------------
---------------------------------------- <Category newidle> ----------------------------------------
SMT:
load_balance() total time to balance on newly idle : 9841719, 11615891 | 18.03% |
load_balance() stats reused on newly idle : 0, 0 | 0.00% |
load_balance() stats recomputed on newly idle : 28103, 27084 | -3.63% |
MC:
load_balance() total time to balance on newly idle : 20079305, 18103792 | -9.84% |
load_balance() stats reused on newly idle : 0, 37820 | 0.00% |
load_balance() stats recomputed on newly idle : 63885, 33518 | -47.53% |
PKG:
load_balance() total time to balance on newly idle : 17972213, 16430220 | -8.58% |
load_balance() stats reused on newly idle : 0, 8461 | 0.00% |
load_balance() stats recomputed on newly idle : 11513, 6318 | -45.12% |
NUMA:
load_balance() total time to balance on newly idle : 11050651, 9890509 | -10.50% |
load_balance() stats reused on newly idle : 0, 496 | 0.00% |
load_balance() stats recomputed on newly idle : 827, 524 | -36.64% |
---
Note: perf sched stats cannot properly aggregate "min" and "max" fields
yet.
Signed-off-by: K Prateek Nayak <kprateek.nayak@....com>
--
2.43.0