Message-ID: <2bb35bb3-cbe4-460f-a209-1fe4095e1dce@amd.com>
Date: Mon, 2 Jun 2025 10:14:22 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org
Cc: linux-kernel@...r.kernel.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, clm@...a.com
Subject: Re: [RFC][PATCH 0/5] sched: Try and address some recent-ish
regressions
Hello Peter,
On 5/20/2025 3:15 PM, Peter Zijlstra wrote:
> As can be seen, the SPR is much easier to please than the SKL for whatever
> reason. I'm thinking we can make TTWU_QUEUE_DELAYED default on, but I suspect
> TTWU_QUEUE_DEFAULT might be a harder sell -- we'd need to run more than this
> one benchmark.
I haven't tried toggling any of the newly added SCHED_FEAT() yet.
Below are the numbers for the out-of-the-box variant (series applied,
feature defaults untouched):
tl;dr: Minor improvements across the board; no noticeable regressions
except for a few schbench datapoints, but those also show high
run-to-run variance, so we should be good.
o Machine details
- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- C2 Disabled (POLL and C1(MWAIT) remained enabled)
o Kernel details
tip: tip:sched/core at commit 914873bc7df9 ("Merge tag
'x86-build-2025-05-25' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
ttwu_opt: tip + this series as is
o Benchmark results
==================================================================
Test : hackbench
Units : Normalized time in seconds
Interpretation: Lower is better
Statistic : AMean
==================================================================
Case:         tip[pct imp](CV)       ttwu_opt[pct imp](CV)
1-groups      1.00 [ -0.00](13.74)   0.92 [  7.68]( 6.04)
2-groups      1.00 [ -0.00]( 9.58)   1.04 [ -3.56]( 4.96)
4-groups      1.00 [ -0.00]( 2.10)   1.01 [ -1.30]( 2.27)
8-groups      1.00 [ -0.00]( 1.51)   0.99 [  1.26]( 1.70)
16-groups     1.00 [ -0.00]( 1.10)   0.97 [  3.01]( 1.62)
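
The report does not spell out how the table columns are derived. The sketch below is one reading that agrees with the numbers shown: values are normalized to the tip baseline, "pct imp" is signed so that positive always means better, and CV is the standard deviation as a percentage of the mean. The function names, the sample run times, and the choice of sample (rather than population) standard deviation are assumptions for illustration only.

```python
import statistics


def pct_improvement(baseline, value, lower_is_better):
    """Percent change relative to baseline, signed so positive means better."""
    delta = (baseline - value) if lower_is_better else (value - baseline)
    return 100.0 * delta / baseline


def cv_percent(samples):
    """Coefficient of variation in percent (assumes sample stddev / mean)."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)


# Hypothetical raw hackbench times in seconds; NOT taken from the report.
tip_runs = [4.0, 4.2, 3.8]
new_runs = [3.7, 3.6, 3.8]

tip_mean = statistics.mean(tip_runs)
new_mean = statistics.mean(new_runs)

# Normalize against the baseline mean, as the "Normalized" units suggest.
normalized = new_mean / tip_mean
imp = pct_improvement(tip_mean, new_mean, lower_is_better=True)

# Same layout as a table cell: value [pct imp](CV)
print(f"{normalized:.2f} [{imp:6.2f}]({cv_percent(new_runs):5.2f})")
```

Under this reading, a hackbench cell of 0.92 [ 7.68] means the run time dropped by about 7.7% against tip, while a tbench cell of 1.04 [ 4.33] means throughput rose by about 4.3%.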
==================================================================
Test : tbench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients:      tip[pct imp](CV)       ttwu_opt[pct imp](CV)
1             1.00 [  0.00]( 0.82)   1.04 [  4.33]( 1.84)
2             1.00 [  0.00]( 1.13)   1.06 [  5.52]( 1.04)
4             1.00 [  0.00]( 1.12)   1.05 [  5.41]( 0.53)
8             1.00 [  0.00]( 0.93)   1.06 [  5.72]( 0.47)
16            1.00 [  0.00]( 0.38)   1.07 [  6.99]( 0.50)
32            1.00 [  0.00]( 0.66)   1.05 [  4.68]( 1.79)
64            1.00 [  0.00]( 1.18)   1.06 [  5.53]( 0.37)
128           1.00 [  0.00]( 1.12)   1.06 [  5.52]( 0.13)
256           1.00 [  0.00]( 0.42)   0.99 [ -0.83]( 1.01)
512           1.00 [  0.00]( 0.14)   1.01 [  1.06]( 0.13)
1024          1.00 [  0.00]( 0.26)   1.02 [  1.82]( 0.41)
==================================================================
Test : stream-10
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test:         tip[pct imp](CV)       ttwu_opt[pct imp](CV)
Copy          1.00 [  0.00]( 8.37)   0.97 [ -2.79]( 9.17)
Scale         1.00 [  0.00]( 2.85)   1.00 [  0.12]( 2.91)
Add           1.00 [  0.00]( 3.39)   0.98 [ -2.36]( 4.85)
Triad         1.00 [  0.00]( 6.39)   1.01 [  1.45]( 8.42)
==================================================================
Test : stream-100
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test:         tip[pct imp](CV)       ttwu_opt[pct imp](CV)
Copy          1.00 [  0.00]( 3.91)   0.98 [ -1.84]( 2.07)
Scale         1.00 [  0.00]( 4.34)   0.96 [ -3.80]( 6.38)
Add           1.00 [  0.00]( 4.14)   0.97 [ -3.04]( 6.31)
Triad         1.00 [  0.00]( 1.00)   0.98 [ -2.36]( 2.60)
==================================================================
Test : netperf
Units : Normalized Throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients:      tip[pct imp](CV)       ttwu_opt[pct imp](CV)
1-clients     1.00 [  0.00]( 0.41)   1.06 [  5.63]( 1.17)
2-clients     1.00 [  0.00]( 0.58)   1.06 [  6.25]( 0.85)
4-clients     1.00 [  0.00]( 0.35)   1.06 [  5.59]( 0.49)
8-clients     1.00 [  0.00]( 0.48)   1.06 [  5.76]( 0.81)
16-clients    1.00 [  0.00]( 0.66)   1.06 [  5.95]( 0.69)
32-clients    1.00 [  0.00]( 1.15)   1.06 [  5.84]( 1.34)
64-clients    1.00 [  0.00]( 1.38)   1.05 [  5.20]( 1.50)
128-clients   1.00 [  0.00]( 0.87)   1.04 [  4.39]( 1.03)
256-clients   1.00 [  0.00]( 5.36)   1.00 [  0.10]( 3.48)
512-clients   1.00 [  0.00](54.39)   0.98 [ -1.93](52.45)
==================================================================
Test : schbench
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers:     tip[pct imp](CV)       ttwu_opt[pct imp](CV)
1             1.00 [ -0.00]( 8.54)   0.89 [ 10.87](35.39)
2             1.00 [ -0.00]( 1.15)   0.88 [ 12.00]( 4.55)
4             1.00 [ -0.00](13.46)   0.96 [  4.17](10.60)
8             1.00 [ -0.00]( 7.14)   0.84 [ 15.79]( 8.44)
16            1.00 [ -0.00]( 3.49)   1.08 [ -8.47]( 4.69)
32            1.00 [ -0.00]( 1.06)   1.10 [ -9.57]( 2.91)
64            1.00 [ -0.00]( 5.48)   1.25 [-25.00]( 5.36)
128           1.00 [ -0.00](10.45)   1.18 [-17.99](12.54)
256           1.00 [ -0.00](31.14)   1.28 [-27.79](17.66)
512           1.00 [ -0.00]( 1.52)   1.01 [ -0.51]( 2.78)
==================================================================
Test : new-schbench-requests-per-second
Units : Normalized Requests per second
Interpretation: Higher is better
Statistic : Median
==================================================================
#workers:     tip[pct imp](CV)       ttwu_opt[pct imp](CV)
1             1.00 [  0.00]( 1.07)   1.00 [  0.29]( 0.00)
2             1.00 [  0.00]( 0.00)   1.00 [  0.00]( 0.15)
4             1.00 [  0.00]( 0.00)   1.00 [ -0.29]( 0.15)
8             1.00 [  0.00]( 0.15)   1.00 [  0.00]( 0.15)
16            1.00 [  0.00]( 0.00)   1.00 [  0.00]( 0.00)
32            1.00 [  0.00]( 3.41)   0.99 [ -0.95]( 2.06)
64            1.00 [  0.00]( 1.05)   0.92 [ -7.58]( 9.01)
128           1.00 [  0.00]( 0.00)   1.00 [  0.00]( 0.00)
256           1.00 [  0.00]( 0.72)   1.00 [ -0.31]( 0.42)
512           1.00 [  0.00]( 0.57)   1.00 [  0.00]( 0.45)
==================================================================
Test : new-schbench-wakeup-latency
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers:     tip[pct imp](CV)       ttwu_opt[pct imp](CV)
1             1.00 [ -0.00]( 9.11)   0.75 [ 25.00](11.08)
2             1.00 [ -0.00]( 0.00)   1.00 [ -0.00]( 3.78)
4             1.00 [ -0.00]( 3.78)   0.93 [  7.14]( 3.87)
8             1.00 [ -0.00]( 0.00)   1.08 [ -8.33](12.91)
16            1.00 [ -0.00]( 7.56)   0.92 [  7.69](11.71)
32            1.00 [ -0.00](15.11)   1.07 [ -6.67]( 3.30)
64            1.00 [ -0.00]( 9.63)   1.00 [ -0.00]( 8.15)
128           1.00 [ -0.00]( 4.86)   0.89 [ 11.06]( 7.83)
256           1.00 [ -0.00]( 2.34)   1.00 [  0.20]( 0.10)
512           1.00 [ -0.00]( 0.40)   1.00 [  0.38]( 0.20)
==================================================================
Test : new-schbench-request-latency
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers:     tip[pct imp](CV)       ttwu_opt[pct imp](CV)
1             1.00 [ -0.00]( 2.73)   0.98 [  2.08]( 1.04)
2             1.00 [ -0.00]( 0.87)   1.05 [ -5.40]( 3.10)
4             1.00 [ -0.00]( 1.21)   0.99 [  0.54]( 1.27)
8             1.00 [ -0.00]( 0.27)   0.99 [  0.79]( 2.14)
16            1.00 [ -0.00]( 4.04)   1.01 [ -0.53]( 0.55)
32            1.00 [ -0.00]( 7.35)   1.10 [ -9.97](21.10)
64            1.00 [ -0.00]( 3.54)   1.03 [ -2.89]( 1.55)
128           1.00 [ -0.00]( 0.37)   0.99 [  0.62]( 0.00)
256           1.00 [ -0.00]( 9.57)   0.92 [  8.36]( 2.22)
512           1.00 [ -0.00]( 1.82)   1.01 [ -1.23]( 0.94)
==================================================================
Test : Various longer running benchmarks
Units : %diff in throughput reported
Interpretation: Higher is better
Statistic : Median
==================================================================
Benchmarks:             %diff
ycsb-cassandra          -0.05%
ycsb-mongodb            -0.80%
deathstarbench-1x        2.44%
deathstarbench-2x        5.47%
deathstarbench-3x        0.36%
deathstarbench-6x        1.14%
hammerdb+mysql 16VU      1.08%
hammerdb+mysql 64VU     -0.43%
>
> Anyway, the patches are stable (finally!, I hope, knock on wood) but in a
> somewhat rough state. At the very least the last patch is missing ttwu_stat(),
> still need to figure out how to account it ;-)
>
Since TTWU_QUEUE_DELAYED is off by default, feel free to include:
Tested-by: K Prateek Nayak <kprateek.nayak@....com>
if you are planning on retaining the current defaults for the
SCHED_FEATs. I'll get back with numbers for TTWU_QUEUE_DELAYED and
TTWU_QUEUE_DEFAULT soon.
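
For anyone who wants to try the non-default settings in the meantime, scheduler features are normally toggled at runtime through debugfs by writing the feature name to enable it or the name prefixed with NO_ to disable it. The sketch below assumes the file lives at /sys/kernel/debug/sched/features (older kernels expose /sys/kernel/debug/sched_features instead), that debugfs is mounted, and that the caller is root. The helper names are hypothetical and nothing here was part of the runs reported above.

```python
from pathlib import Path

# Assumed debugfs location; adjust for the kernel under test.
SCHED_FEATURES = Path("/sys/kernel/debug/sched/features")


def feature_token(name, enable):
    """Return the string to write: NAME enables, NO_NAME disables."""
    return name if enable else f"NO_{name}"


def set_sched_feat(name, enable, path=SCHED_FEATURES):
    """Write the toggle token to the features file (needs root)."""
    path.write_text(feature_token(name, enable))


def enabled_features(text):
    """Parse the features file contents into the set of enabled names."""
    return {tok for tok in text.split() if not tok.startswith("NO_")}


# Example usage (commented out because it changes live scheduler state):
# set_sched_feat("TTWU_QUEUE_DELAYED", True)
# print(sorted(enabled_features(SCHED_FEATURES.read_text())))
```

Reading the file back after a write is a cheap way to confirm the toggle took effect before starting a benchmark run.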
--
Thanks and Regards,
Prateek