Message-ID:
<PUZPR04MB492296C8301DDA9654D7970CE37DA@PUZPR04MB4922.apcprd04.prod.outlook.com>
Date: Thu, 19 Jun 2025 14:08:42 +0800
From: Jianyong Wu <jianyong.wu@...look.com>
To: K Prateek Nayak <kprateek.nayak@....com>,
Jianyong Wu <wujianyong@...on.cn>, mingo@...hat.com, peterz@...radead.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/fair: allow imbalance between LLCs under NUMA
Hi Prateek,
Thank you for taking the time to test this patch.
This patch aims to reduce meaningless task migrations, such as those
seen in iperf tests; it was not written with performance as its primary
goal. In my iperf tests, I did not observe a significant performance
improvement, although the number of task migrations decreased
substantially. Even when I bound the iperf tasks to the same LLC, the
performance metrics did not improve noticeably. So this change is
unlikely to speed up iperf itself, which suggests that task migration
has minimal effect on iperf.
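
For anyone who wants to reproduce the observation, something along
these lines should work (illustrative commands only; the CPU list
assumes CPUs 0-7 share an LLC, so adjust it for your machine):

  # count task migrations system-wide while the test runs
  perf stat -e sched:sched_migrate_task -a -- sleep 30

  # bind both iperf ends to a single LLC, e.g. CPUs 0-7
  taskset -c 0-7 iperf3 -s &
  taskset -c 0-7 iperf3 -c 127.0.0.1 -t 30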

IMO, we should allow at least two tasks per LLC so that a communicating
pair of tasks can stay co-located. In theory this could yield better
performance, even though I haven't found a workload that demonstrates
it yet. See the sketch below for the shape of the check I have in mind.
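
To sketch the idea (illustrative only, not the patch as posted; the
name and threshold below are made up, mirroring the shape of the
existing adjust_numa_imbalance()/NUMA_IMBALANCE_MIN logic in
kernel/sched/fair.c):

  /* Tolerate a small imbalance between LLCs under a NUMA node so
   * that a communicating pair of tasks is not pulled apart. */
  #define LLC_IMBALANCE_MIN	2

  static long adjust_llc_imbalance(long imbalance)
  {
  	/* Up to two tasks per LLC: treat the groups as balanced. */
  	if (imbalance <= LLC_IMBALANCE_MIN)
  		return 0;
  	return imbalance;
  }

The snippet only shows the threshold idea; the actual hook point in
the load-balancing path is what the posted patch implements.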

If this change turns out to hurt performance, is there any suggestion
for mitigating the iperf migration issue? Or should we just leave it
as is?
Any suggestions would be greatly appreciated.
Thanks
Jianyong
On 6/18/2025 2:37 PM, K Prateek Nayak wrote:
> Hello Jianyong,
>
> On 6/16/2025 7:52 AM, Jianyong Wu wrote:
>> Would you mind letting me know if you've had a chance to try it out,
>> or if there's any update on the progress?
>
> Here are my results from a dual socket 3rd Generation EPYC
> system.
>
> tl;dr I don't see any improvement, and there are a few regressions
> too, but a few of those data points also have a lot of variance.
>
> o Machine details
>
> - 3rd Generation EPYC System
> - 2 sockets each with 64C/128T
> - NPS1 (Each socket is a NUMA node)
> - C2 Disabled (POLL and C1(MWAIT) remained enabled)
>
> o Kernel details
>
> tip: tip:sched/core at commit 914873bc7df9 ("Merge tag
> 'x86-build-2025-05-25' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
>
> allow_imb: tip + this series as is
>
> o Benchmark results
>
> ==================================================================
> Test : hackbench
> Units : Normalized time in seconds
> Interpretation: Lower is better
> Statistic : AMean
> ==================================================================
> Case: tip[pct imp](CV) allow_imb[pct imp](CV)
> 1-groups 1.00 [ -0.00](13.74) 1.03 [ -3.20]( 9.18)
> 2-groups 1.00 [ -0.00]( 9.58) 1.06 [ -6.46]( 7.63)
> 4-groups 1.00 [ -0.00]( 2.10) 1.01 [ -1.30]( 1.90)
> 8-groups 1.00 [ -0.00]( 1.51) 0.99 [ 1.42]( 0.91)
> 16-groups 1.00 [ -0.00]( 1.10) 0.99 [ 1.09]( 1.13)
>
>
> ==================================================================
> Test : tbench
> Units : Normalized throughput
> Interpretation: Higher is better
> Statistic : AMean
> ==================================================================
> Clients: tip[pct imp](CV) allow_imb[pct imp](CV)
> 1 1.00 [ 0.00]( 0.82) 1.01 [ 1.11]( 0.27)
> 2 1.00 [ 0.00]( 1.13) 1.00 [ -0.05]( 0.62)
> 4 1.00 [ 0.00]( 1.12) 1.02 [ 2.36]( 0.19)
> 8 1.00 [ 0.00]( 0.93) 1.01 [ 1.02]( 0.86)
> 16 1.00 [ 0.00]( 0.38) 1.01 [ 0.71]( 1.71)
> 32 1.00 [ 0.00]( 0.66) 1.01 [ 1.31]( 1.88)
> 64 1.00 [ 0.00]( 1.18) 0.98 [ -1.60]( 2.90)
> 128 1.00 [ 0.00]( 1.12) 1.02 [ 1.60]( 0.42)
> 256 1.00 [ 0.00]( 0.42) 1.00 [ 0.40]( 0.80)
> 512 1.00 [ 0.00]( 0.14) 1.01 [ 0.97]( 0.25)
> 1024 1.00 [ 0.00]( 0.26) 1.01 [ 1.29]( 0.19)
>
>
> ==================================================================
> Test : stream-10
> Units : Normalized Bandwidth, MB/s
> Interpretation: Higher is better
> Statistic : HMean
> ==================================================================
> Test: tip[pct imp](CV) allow_imb[pct imp](CV)
> Copy 1.00 [ 0.00]( 8.37) 1.01 [ 1.00]( 5.71)
> Scale 1.00 [ 0.00]( 2.85) 0.98 [ -1.94]( 5.23)
> Add 1.00 [ 0.00]( 3.39) 0.99 [ -1.39]( 4.77)
> Triad 1.00 [ 0.00]( 6.39) 1.05 [ 5.15]( 5.62)
>
>
> ==================================================================
> Test : stream-100
> Units : Normalized Bandwidth, MB/s
> Interpretation: Higher is better
> Statistic : HMean
> ==================================================================
> Test: tip[pct imp](CV) allow_imb[pct imp](CV)
> Copy 1.00 [ 0.00]( 3.91) 1.01 [ 1.28]( 2.01)
> Scale 1.00 [ 0.00]( 4.34) 0.99 [ -0.65]( 3.74)
> Add 1.00 [ 0.00]( 4.14) 1.01 [ 0.54]( 1.63)
> Triad 1.00 [ 0.00]( 1.00) 0.98 [ -2.28]( 4.89)
>
>
> ==================================================================
> Test : netperf
> Units : Normalized Throughput
> Interpretation: Higher is better
> Statistic : AMean
> ==================================================================
> Clients: tip[pct imp](CV) allow_imb[pct imp](CV)
> 1-clients 1.00 [ 0.00]( 0.41) 1.01 [ 1.17]( 0.39)
> 2-clients 1.00 [ 0.00]( 0.58) 1.01 [ 1.00]( 0.40)
> 4-clients 1.00 [ 0.00]( 0.35) 1.01 [ 0.73]( 0.50)
> 8-clients 1.00 [ 0.00]( 0.48) 1.00 [ 0.42]( 0.67)
> 16-clients 1.00 [ 0.00]( 0.66) 1.01 [ 0.84]( 0.57)
> 32-clients 1.00 [ 0.00]( 1.15) 1.01 [ 0.82]( 0.96)
> 64-clients 1.00 [ 0.00]( 1.38) 1.00 [ -0.24]( 3.09)
> 128-clients 1.00 [ 0.00]( 0.87) 1.00 [ -0.16]( 1.02)
> 256-clients 1.00 [ 0.00]( 5.36) 1.01 [ 0.66]( 4.55)
> 512-clients 1.00 [ 0.00](54.39) 0.98 [ -1.59](57.35)
>
>
> ==================================================================
> Test : schbench
> Units : Normalized 99th percentile latency in us
> Interpretation: Lower is better
> Statistic : Median
> ==================================================================
> #workers: tip[pct imp](CV) allow_imb[pct imp](CV)
> 1 1.00 [ -0.00]( 8.54) 1.04 [ -4.35]( 3.69)
> 2 1.00 [ -0.00]( 1.15) 0.96 [ 4.00]( 0.00)
> 4 1.00 [ -0.00](13.46) 1.02 [ -2.08]( 2.04)
> 8 1.00 [ -0.00]( 7.14) 0.82 [ 17.54]( 9.30)
> 16 1.00 [ -0.00]( 3.49) 1.05 [ -5.08]( 7.83)
> 32 1.00 [ -0.00]( 1.06) 1.01 [ -1.06]( 5.88)
> 64 1.00 [ -0.00]( 5.48) 1.05 [ -4.65]( 2.71)
> 128 1.00 [ -0.00](10.45) 1.09 [ -9.11](14.18)
> 256 1.00 [ -0.00](31.14) 1.05 [ -5.15]( 9.79)
> 512 1.00 [ -0.00]( 1.52) 0.96 [ 4.30]( 0.26)
>
>
> ==================================================================
> Test : new-schbench-requests-per-second
> Units : Normalized Requests per second
> Interpretation: Higher is better
> Statistic : Median
> ==================================================================
> #workers: tip[pct imp](CV) allow_imb[pct imp](CV)
> 1 1.00 [ 0.00]( 1.07) 1.00 [ 0.29]( 0.61)
> 2 1.00 [ 0.00]( 0.00) 1.00 [ 0.00]( 0.26)
> 4 1.00 [ 0.00]( 0.00) 1.00 [ -0.29]( 0.00)
> 8 1.00 [ 0.00]( 0.15) 1.00 [ 0.29]( 0.15)
> 16 1.00 [ 0.00]( 0.00) 1.00 [ 0.00]( 0.00)
> 32 1.00 [ 0.00]( 3.41) 0.97 [ -2.86]( 2.91)
> 64 1.00 [ 0.00]( 1.05) 0.97 [ -3.17]( 7.39)
> 128 1.00 [ 0.00]( 0.00) 1.00 [ -0.38]( 0.39)
> 256 1.00 [ 0.00]( 0.72) 1.01 [ 0.61]( 0.96)
> 512 1.00 [ 0.00]( 0.57) 1.01 [ 0.72]( 0.21)
>
>
> ==================================================================
> Test : new-schbench-wakeup-latency
> Units : Normalized 99th percentile latency in us
> Interpretation: Lower is better
> Statistic : Median
> ==================================================================
> #workers: tip[pct imp](CV) allow_imb[pct imp](CV)
> 1 1.00 [ -0.00]( 9.11) 0.69 [ 31.25]( 8.13)
> 2 1.00 [ -0.00]( 0.00) 0.93 [ 7.14]( 8.37)
> 4 1.00 [ -0.00]( 3.78) 1.07 [ -7.14](14.79)
> 8 1.00 [ -0.00]( 0.00) 1.08 [ -8.33]( 7.56)
> 16 1.00 [ -0.00]( 7.56) 1.08 [ -7.69](34.36)
> 32 1.00 [ -0.00](15.11) 1.00 [ -0.00](12.99)
> 64 1.00 [ -0.00]( 9.63) 0.80 [ 20.00](11.17)
> 128 1.00 [ -0.00]( 4.86) 0.98 [ 2.01](13.01)
> 256 1.00 [ -0.00]( 2.34) 1.01 [ -1.00]( 3.51)
> 512 1.00 [ -0.00]( 0.40) 1.00 [ 0.38]( 0.20)
>
>
> ==================================================================
> Test : new-schbench-request-latency
> Units : Normalized 99th percentile latency in us
> Interpretation: Lower is better
> Statistic : Median
> ==================================================================
> #workers: tip[pct imp](CV) allow_imb[pct imp](CV)
> 1 1.00 [ -0.00]( 2.73) 0.98 [ 2.08]( 3.51)
> 2 1.00 [ -0.00]( 0.87) 0.99 [ 0.54]( 3.29)
> 4 1.00 [ -0.00]( 1.21) 1.06 [ -5.92]( 0.82)
> 8 1.00 [ -0.00]( 0.27) 1.03 [ -3.15]( 1.86)
> 16 1.00 [ -0.00]( 4.04) 1.00 [ -0.27]( 2.27)
> 32 1.00 [ -0.00]( 7.35) 1.30 [-30.45](20.57)
> 64 1.00 [ -0.00]( 3.54) 1.01 [ -0.67]( 0.82)
> 128 1.00 [ -0.00]( 0.37) 1.00 [ 0.21]( 0.18)
> 256 1.00 [ -0.00]( 9.57) 0.99 [ 1.43]( 7.69)
> 512 1.00 [ -0.00]( 1.82) 1.02 [ -2.10]( 0.89)
>
>
> ==================================================================
> Test : Various longer running benchmarks
> Units : %diff in throughput reported
> Interpretation: Higher is better
> Statistic : Median
> ==================================================================
> Benchmarks: %diff
> ycsb-cassandra 0.07%
> ycsb-mongodb -0.66%
>
> deathstarbench-1x 0.36%
> deathstarbench-2x 2.39%
> deathstarbench-3x -0.09%
> deathstarbench-6x 1.53%
>
> hammerdb+mysql 16VU -0.27%
> hammerdb+mysql 64VU -0.32%
>
> ---
>
> I cannot make a hard case for this optimization. You can perhaps
> share your iperf numbers if you are seeing significant
> improvements there.
>