Message-ID: <5216c899-efec-4524-a5a1-1fdcd2834165@amd.com>
Date: Wed, 18 Jun 2025 12:07:34 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Jianyong Wu <jianyong.wu@...look.com>, Jianyong Wu <wujianyong@...on.cn>,
mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/fair: allow imbalance between LLCs under NUMA
Hello Jianyong,
On 6/16/2025 7:52 AM, Jianyong Wu wrote:
> Would you mind letting me know if you've had a chance to try it out, or if there's any update on the progress?
Here are my results from a dual socket 3rd Generation EPYC
system.
tl;dr I don't see any improvement, and there are a few regressions
too, but some of those data points also have a lot of variance.
o Machine details
- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- C2 Disabled (POLL and C1(MWAIT) remained enabled)
o Kernel details
tip: tip:sched/core at commit 914873bc7df9 ("Merge tag
'x86-build-2025-05-25' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
allow_imb: tip + this series as is
o Benchmark results
==================================================================
Test : hackbench
Units : Normalized time in seconds
Interpretation: Lower is better
Statistic : AMean
==================================================================
Case: tip[pct imp](CV) allow_imb[pct imp](CV)
1-groups 1.00 [ -0.00](13.74) 1.03 [ -3.20]( 9.18)
2-groups 1.00 [ -0.00]( 9.58) 1.06 [ -6.46]( 7.63)
4-groups 1.00 [ -0.00]( 2.10) 1.01 [ -1.30]( 1.90)
8-groups 1.00 [ -0.00]( 1.51) 0.99 [ 1.42]( 0.91)
16-groups 1.00 [ -0.00]( 1.10) 0.99 [ 1.09]( 1.13)
==================================================================
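For reference, the figures in the tables above can be reproduced from
raw samples roughly as follows (a minimal sketch; the actual harness
is not shown in this thread, and the sample values are purely
illustrative): the candidate mean is normalized against the baseline
mean, "pct imp" is the percent improvement over baseline, and CV is
the coefficient of variation of each run set.

```python
import statistics

def summarize(baseline, candidate, lower_is_better=True):
    """Summarize two sets of raw samples the way the tables report
    them: normalized candidate mean, percent improvement ("pct imp"),
    and coefficient of variation (stdev / mean, in percent)."""
    base_mean = statistics.mean(baseline)
    cand_mean = statistics.mean(candidate)
    normalized = cand_mean / base_mean
    # For a "lower is better" metric (e.g. hackbench time), a smaller
    # candidate mean is an improvement; for "higher is better"
    # metrics (e.g. tbench throughput) the sign flips.
    if lower_is_better:
        pct_imp = (base_mean - cand_mean) / base_mean * 100
    else:
        pct_imp = (cand_mean - base_mean) / base_mean * 100
    cv = statistics.stdev(candidate) / cand_mean * 100
    return normalized, pct_imp, cv
```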
Test : tbench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) allow_imb[pct imp](CV)
1 1.00 [ 0.00]( 0.82) 1.01 [ 1.11]( 0.27)
2 1.00 [ 0.00]( 1.13) 1.00 [ -0.05]( 0.62)
4 1.00 [ 0.00]( 1.12) 1.02 [ 2.36]( 0.19)
8 1.00 [ 0.00]( 0.93) 1.01 [ 1.02]( 0.86)
16 1.00 [ 0.00]( 0.38) 1.01 [ 0.71]( 1.71)
32 1.00 [ 0.00]( 0.66) 1.01 [ 1.31]( 1.88)
64 1.00 [ 0.00]( 1.18) 0.98 [ -1.60]( 2.90)
128 1.00 [ 0.00]( 1.12) 1.02 [ 1.60]( 0.42)
256 1.00 [ 0.00]( 0.42) 1.00 [ 0.40]( 0.80)
512 1.00 [ 0.00]( 0.14) 1.01 [ 0.97]( 0.25)
1024 1.00 [ 0.00]( 0.26) 1.01 [ 1.29]( 0.19)
==================================================================
Test : stream-10
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) allow_imb[pct imp](CV)
Copy 1.00 [ 0.00]( 8.37) 1.01 [ 1.00]( 5.71)
Scale 1.00 [ 0.00]( 2.85) 0.98 [ -1.94]( 5.23)
Add 1.00 [ 0.00]( 3.39) 0.99 [ -1.39]( 4.77)
Triad 1.00 [ 0.00]( 6.39) 1.05 [ 5.15]( 5.62)
==================================================================
Test : stream-100
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) allow_imb[pct imp](CV)
Copy 1.00 [ 0.00]( 3.91) 1.01 [ 1.28]( 2.01)
Scale 1.00 [ 0.00]( 4.34) 0.99 [ -0.65]( 3.74)
Add 1.00 [ 0.00]( 4.14) 1.01 [ 0.54]( 1.63)
Triad 1.00 [ 0.00]( 1.00) 0.98 [ -2.28]( 4.89)
==================================================================
Test : netperf
Units : Normalized Throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) allow_imb[pct imp](CV)
1-clients 1.00 [ 0.00]( 0.41) 1.01 [ 1.17]( 0.39)
2-clients 1.00 [ 0.00]( 0.58) 1.01 [ 1.00]( 0.40)
4-clients 1.00 [ 0.00]( 0.35) 1.01 [ 0.73]( 0.50)
8-clients 1.00 [ 0.00]( 0.48) 1.00 [ 0.42]( 0.67)
16-clients 1.00 [ 0.00]( 0.66) 1.01 [ 0.84]( 0.57)
32-clients 1.00 [ 0.00]( 1.15) 1.01 [ 0.82]( 0.96)
64-clients 1.00 [ 0.00]( 1.38) 1.00 [ -0.24]( 3.09)
128-clients 1.00 [ 0.00]( 0.87) 1.00 [ -0.16]( 1.02)
256-clients 1.00 [ 0.00]( 5.36) 1.01 [ 0.66]( 4.55)
512-clients 1.00 [ 0.00](54.39) 0.98 [ -1.59](57.35)
==================================================================
Test : schbench
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) allow_imb[pct imp](CV)
1 1.00 [ -0.00]( 8.54) 1.04 [ -4.35]( 3.69)
2 1.00 [ -0.00]( 1.15) 0.96 [ 4.00]( 0.00)
4 1.00 [ -0.00](13.46) 1.02 [ -2.08]( 2.04)
8 1.00 [ -0.00]( 7.14) 0.82 [ 17.54]( 9.30)
16 1.00 [ -0.00]( 3.49) 1.05 [ -5.08]( 7.83)
32 1.00 [ -0.00]( 1.06) 1.01 [ -1.06]( 5.88)
64 1.00 [ -0.00]( 5.48) 1.05 [ -4.65]( 2.71)
128 1.00 [ -0.00](10.45) 1.09 [ -9.11](14.18)
256 1.00 [ -0.00](31.14) 1.05 [ -5.15]( 9.79)
512 1.00 [ -0.00]( 1.52) 0.96 [ 4.30]( 0.26)
==================================================================
Test : new-schbench-requests-per-second
Units : Normalized Requests per second
Interpretation: Higher is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) allow_imb[pct imp](CV)
1 1.00 [ 0.00]( 1.07) 1.00 [ 0.29]( 0.61)
2 1.00 [ 0.00]( 0.00) 1.00 [ 0.00]( 0.26)
4 1.00 [ 0.00]( 0.00) 1.00 [ -0.29]( 0.00)
8 1.00 [ 0.00]( 0.15) 1.00 [ 0.29]( 0.15)
16 1.00 [ 0.00]( 0.00) 1.00 [ 0.00]( 0.00)
32 1.00 [ 0.00]( 3.41) 0.97 [ -2.86]( 2.91)
64 1.00 [ 0.00]( 1.05) 0.97 [ -3.17]( 7.39)
128 1.00 [ 0.00]( 0.00) 1.00 [ -0.38]( 0.39)
256 1.00 [ 0.00]( 0.72) 1.01 [ 0.61]( 0.96)
512 1.00 [ 0.00]( 0.57) 1.01 [ 0.72]( 0.21)
==================================================================
Test : new-schbench-wakeup-latency
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) allow_imb[pct imp](CV)
1 1.00 [ -0.00]( 9.11) 0.69 [ 31.25]( 8.13)
2 1.00 [ -0.00]( 0.00) 0.93 [ 7.14]( 8.37)
4 1.00 [ -0.00]( 3.78) 1.07 [ -7.14](14.79)
8 1.00 [ -0.00]( 0.00) 1.08 [ -8.33]( 7.56)
16 1.00 [ -0.00]( 7.56) 1.08 [ -7.69](34.36)
32 1.00 [ -0.00](15.11) 1.00 [ -0.00](12.99)
64 1.00 [ -0.00]( 9.63) 0.80 [ 20.00](11.17)
128 1.00 [ -0.00]( 4.86) 0.98 [ 2.01](13.01)
256 1.00 [ -0.00]( 2.34) 1.01 [ -1.00]( 3.51)
512 1.00 [ -0.00]( 0.40) 1.00 [ 0.38]( 0.20)
==================================================================
Test : new-schbench-request-latency
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) allow_imb[pct imp](CV)
1 1.00 [ -0.00]( 2.73) 0.98 [ 2.08]( 3.51)
2 1.00 [ -0.00]( 0.87) 0.99 [ 0.54]( 3.29)
4 1.00 [ -0.00]( 1.21) 1.06 [ -5.92]( 0.82)
8 1.00 [ -0.00]( 0.27) 1.03 [ -3.15]( 1.86)
16 1.00 [ -0.00]( 4.04) 1.00 [ -0.27]( 2.27)
32 1.00 [ -0.00]( 7.35) 1.30 [-30.45](20.57)
64 1.00 [ -0.00]( 3.54) 1.01 [ -0.67]( 0.82)
128 1.00 [ -0.00]( 0.37) 1.00 [ 0.21]( 0.18)
256 1.00 [ -0.00]( 9.57) 0.99 [ 1.43]( 7.69)
512 1.00 [ -0.00]( 1.82) 1.02 [ -2.10]( 0.89)
==================================================================
Test : Various longer running benchmarks
Units : %diff in throughput reported
Interpretation: Higher is better
Statistic : Median
==================================================================
Benchmarks: %diff
ycsb-cassandra 0.07%
ycsb-mongodb -0.66%
deathstarbench-1x 0.36%
deathstarbench-2x 2.39%
deathstarbench-3x -0.09%
deathstarbench-6x 1.53%
hammerdb+mysql 16VU -0.27%
hammerdb+mysql 64VU -0.32%
---
I cannot make a strong case for this optimization. Perhaps you can
share your iperf numbers if you are seeing significant improvements
there.
--
Thanks and Regards,
Prateek