Message-ID: <3298a289-22c4-42c4-a7c1-39d9519f6223@amd.com>
Date: Tue, 13 Jan 2026 12:15:46 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>, <mingo@...nel.org>,
<peterz@...radead.org>, <vincent.guittot@...aro.org>,
<linux-kernel@...r.kernel.org>
CC: <juri.lelli@...hat.com>, <vschneid@...hat.com>, <tglx@...nel.org>,
<dietmar.eggemann@....com>, <anna-maria@...utronix.de>,
<frederic@...nel.org>, <wangyang.guo@...el.com>
Subject: Re: [PATCH v4 0/3] sched/fair: Improve nohz fields for large systems
Hello Shrikanth,
On 1/12/2026 10:34 AM, Shrikanth Hegde wrote:
> On large systems, the nohz.nr_cpus cacheline was seen to be contended.
> It is atomically incremented/decremented and read on many CPUs at a
> time, so this line can bounce often.
>
> Patches 1 and 2 are minor. They look like the correct thing to do,
> but they are not very important.
>
> Patch 3 is the main patch: it gets rid of nr_cpus and instead uses the
> cpumask, which is always updated alongside it. Functionally it should
> serve the same purpose. The rest of the fields aren't updated that
> often, so this line shouldn't bounce that often.
>
> The contention issue with nohz.idle_cpus_mask still remains. It is mostly
> in a separate cacheline from nohz. There are ongoing efforts to mitigate
> it, but it is not addressed by this series.
>
> v3 -> v4:
> - Noted in the changelog that one less cacheline is dirtied on idle
> entry/exit (Valentin Schneider)
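The quoted cover letter describes patch 3 this way: it drops the separately maintained nohz.nr_cpus counter and relies on the idle cpumask, which is already updated at the same points. A minimal user-space C sketch of that idea follows. It makes simplifying assumptions: at most 64 CPUs packed into one atomic word, and hypothetical names (idle_mask, idle_enter(), idle_exit(), any_idle(), nr_idle()). It is not the kernel's nohz code, which operates on struct cpumask.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: one bit per CPU in a single atomic word (<= 64 CPUs).
 * There is deliberately no separate idle counter next to the mask.
 */
static _Atomic uint64_t idle_mask;

/* Idle entry: set this CPU's bit; the only write to shared state here. */
static void idle_enter(unsigned int cpu)
{
    assert(cpu < 64);
    atomic_fetch_or(&idle_mask, UINT64_C(1) << cpu);
}

/* Idle exit: clear this CPU's bit. */
static void idle_exit(unsigned int cpu)
{
    assert(cpu < 64);
    atomic_fetch_and(&idle_mask, ~(UINT64_C(1) << cpu));
}

/* "Is any CPU idle?" is a plain load; readers do not write the cacheline. */
static bool any_idle(void)
{
    return atomic_load(&idle_mask) != 0;
}

/* Number of idle CPUs as the mask's weight (GCC/Clang builtin popcount). */
static int nr_idle(void)
{
    return __builtin_popcountll(atomic_load(&idle_mask));
}
```

In this model, idle entry and exit each do a single atomic read-modify-write on shared state instead of two. The "is any CPU idle?" check becomes a plain load.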
I tested v3 over the weekend and didn't spot any regressions
(at least none that I can reproduce consistently), so feel free to
include:
Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@....com>
If anyone is curious, the following are the results from my setup
(3rd Generation EPYC, 2 sockets x 64C/128T, boost on, C2 disabled):
Note: tbench hit some unusually bad luck on the higher-utilization runs;
I haven't been able to reproduce those regressions reliably.
Most data points that show a regression also have high run-to-run
variance on both tip and tip + patch, which makes them unreliable.
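Going by the abbreviation, the (CV) column in the tables is the coefficient of variation. That is the standard deviation across runs expressed as a percentage of the mean. A minimal sketch of the computation follows, assuming the sample (n - 1) standard deviation. The harness that produced these numbers may define it differently, and amean(), sqrt_newton() and cv_percent() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean (the "AMean" statistic used by several tables). */
static double amean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/*
 * Square root by Newton's iteration so the sketch needs no libm.
 * Starting from max(v, 1) keeps every iterate >= sqrt(v), so it converges
 * from above; 100 iterations is ample for variance-sized inputs.
 */
static double sqrt_newton(double v)
{
    if (v <= 0.0)
        return 0.0;
    double r = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 100; i++)
        r = 0.5 * (r + v / r);
    return r;
}

/* Coefficient of variation in percent, using the sample standard deviation. */
static double cv_percent(const double *x, size_t n)
{
    assert(n >= 2);
    double m = amean(x, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - m;
        ss += d * d;
    }
    return sqrt_newton(ss / (double)(n - 1)) / m * 100.0;
}
```

Consider a CV in the double digits on either side, such as tip's 33.02 for schbench with 1 worker. A difference of a few percent between the two means then sits within the noise, which is the point of the note above.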
==================================================================
Test : hackbench
Units : Normalized time in seconds
Interpretation: Lower is better
Statistic : AMean
==================================================================
Case:         tip[pct imp](CV)         nohz_no_nr_cpus[pct imp](CV)
 1-groups     1.00 [ -0.00]( 6.43)     1.04 [ -3.60](15.15)
 2-groups     1.00 [ -0.00]( 5.42)     1.02 [ -2.17]( 3.57)
 4-groups     1.00 [ -0.00]( 2.72)     0.99 [  0.84]( 3.11)
 8-groups     1.00 [ -0.00]( 3.65)     1.00 [  0.31]( 2.50)
16-groups     1.00 [ -0.00]( 2.26)     1.02 [ -1.67]( 2.92)
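The email does not spell out how the [pct imp] column is derived. From the numbers it appears to be the change relative to tip in percent, with the sign flipped for lower-is-better metrics so that a positive value always means an improvement. For example, hackbench 1-groups at a normalized time of 1.04 shows -3.60, and tbench at 128 clients, with a normalized throughput of 0.94, shows -6.11. A sketch of that inferred formula, with hypothetical helper names normalize() and pct_imp():

```c
#include <assert.h>
#include <stdbool.h>

/* A run's value normalized to the tip baseline (first number in each cell). */
static double normalize(double base, double x)
{
    assert(base != 0.0);
    return x / base;
}

/*
 * Inferred "[pct imp]": change vs. tip in percent, negated for
 * lower-is-better metrics so that a positive result always means "better".
 */
static double pct_imp(double base, double x, bool lower_is_better)
{
    assert(base != 0.0);
    double delta = (x - base) / base * 100.0;
    return lower_is_better ? -delta : delta;
}
```

For example, pct_imp(10.0, 10.5, true), where a lower-is-better time rises from 10.0 to 10.5, gives about -5.0. Negating a zero delta yields -0.0, which prints as -0.00. That would be consistent with tip's -0.00 entries in the lower-is-better tables, though this is an inference and not something the email states.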
==================================================================
Test : tbench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients:      tip[pct imp](CV)         nohz_no_nr_cpus[pct imp](CV)
     1        1.00 [  0.00]( 0.40)     1.00 [ -0.25]( 1.22)
     2        1.00 [  0.00]( 1.33)     0.99 [ -0.57]( 0.37)
     4        1.00 [  0.00]( 0.27)     1.00 [  0.07]( 0.89)
     8        1.00 [  0.00]( 0.53)     0.99 [ -0.83]( 0.32)
    16        1.00 [  0.00]( 1.39)     1.00 [  0.11]( 1.92)
    32        1.00 [  0.00]( 1.85)     0.99 [ -1.44]( 3.08)
    64        1.00 [  0.00]( 1.55)     0.98 [ -2.17]( 2.51)
   128        1.00 [  0.00]( 1.05)     0.94 [ -6.11]( 0.28)
   256        1.00 [  0.00]( 0.68)     0.94 [ -5.58]( 3.77)
   512        1.00 [  0.00]( 0.30)     0.95 [ -4.91]( 0.22)
  1024        1.00 [  0.00]( 0.19)     0.95 [ -4.86]( 0.21)
==================================================================
Test : stream-10
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test:         tip[pct imp](CV)         nohz_no_nr_cpus[pct imp](CV)
Copy          1.00 [  0.00]( 8.08)     1.03 [  2.91]( 4.84)
Scale         1.00 [  0.00]( 5.43)     1.04 [  3.56]( 3.32)
Add           1.00 [  0.00]( 5.96)     1.04 [  4.10]( 2.96)
Triad         1.00 [  0.00]( 6.36)     0.99 [ -1.23]( 5.83)
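The stream tables summarize bandwidth with the harmonic mean (HMean). The email does not say why. A common reason is that the harmonic mean is the right average for rates measured over equal amounts of work, and it is pulled down by slow runs more than the arithmetic mean is. A minimal sketch, where hmean() is a hypothetical helper and not the harness's own code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Harmonic mean: n / sum(1 / x_i). For rates such as MB/s, slow runs weigh
 * more heavily than in the arithmetic mean. Requires every x_i > 0.
 */
static double hmean(const double *x, size_t n)
{
    assert(n > 0);
    double inv_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(x[i] > 0.0);
        inv_sum += 1.0 / x[i];
    }
    return (double)n / inv_sum;
}
```

For the samples {1, 2, 4}, the harmonic mean is 3 / 1.75 (about 1.71), while the arithmetic mean is about 2.33.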
==================================================================
Test : stream-100
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test:         tip[pct imp](CV)         nohz_no_nr_cpus[pct imp](CV)
Copy          1.00 [  0.00]( 3.78)     1.03 [  3.17]( 1.90)
Scale         1.00 [  0.00]( 4.17)     1.02 [  1.79]( 0.91)
Add           1.00 [  0.00]( 1.97)     1.01 [  0.52]( 1.66)
Triad         1.00 [  0.00]( 2.28)     0.99 [ -1.49]( 4.44)
==================================================================
Test : schbench
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers:     tip[pct imp](CV)         nohz_no_nr_cpus[pct imp](CV)
     1        1.00 [ -0.00](33.02)     1.21 [-20.59]( 8.06)
     2        1.00 [ -0.00](14.30)     1.14 [-14.29]( 6.45)
     4        1.00 [ -0.00]( 2.22)     0.98 [  2.22]( 4.55)
     8        1.00 [ -0.00]( 4.63)     0.94 [  5.56]( 1.96)
    16        1.00 [ -0.00]( 1.67)     1.07 [ -6.67]( 1.82)
    32        1.00 [ -0.00]( 5.58)     0.99 [  1.04]( 2.11)
    64        1.00 [ -0.00]( 6.03)     0.99 [  0.52]( 5.25)
   128        1.00 [ -0.00]( 7.09)     1.00 [ -0.49]( 5.11)
   256        1.00 [ -0.00]( 3.14)     0.94 [  6.06](13.53)
   512        1.00 [ -0.00]( 0.86)     0.98 [  2.23]( 1.53)
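The schbench tables summarize runs with the median instead of the mean. A single outlier run barely moves the median, which matters when tip's own CV is as high as 33.02 (schbench, 1 worker). A minimal sketch with hypothetical helper names cmp_double() and median():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles (no NaNs expected). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n samples; sorts a copy so the caller's array is untouched. */
static double median(const double *x, size_t n)
{
    assert(n > 0);
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        abort();
    memcpy(tmp, x, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    /* Odd n: middle element; even n: average of the two middle elements. */
    double m = (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
    free(tmp);
    return m;
}
```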
==================================================================
Test : new-schbench-requests-per-second
Units : Normalized Requests per second
Interpretation: Higher is better
Statistic : Median
==================================================================
#workers:     tip[pct imp](CV)         nohz_no_nr_cpus[pct imp](CV)
     1        1.00 [  0.00]( 0.14)     1.00 [  0.00]( 0.52)
     2        1.00 [  0.00]( 0.14)     1.00 [  0.28]( 0.00)
     4        1.00 [  0.00]( 0.14)     1.00 [  0.00]( 0.00)
     8        1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.14)
    16        1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)
    32        1.00 [  0.00]( 5.05)     0.97 [ -3.11]( 1.91)
    64        1.00 [  0.00](10.41)     1.06 [  5.60]( 3.79)
   128        1.00 [  0.00]( 0.30)     0.98 [ -2.38]( 0.31)
   256        1.00 [  0.00]( 1.43)     0.98 [ -1.73]( 1.38)
   512        1.00 [  0.00]( 1.45)     0.97 [ -3.33]( 1.48)
==================================================================
Test : new-schbench-wakeup-latency
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers:     tip[pct imp](CV)         nohz_no_nr_cpus[pct imp](CV)
     1        1.00 [ -0.00](24.99)     1.08 [ -8.33](16.90)
     2        1.00 [ -0.00]( 0.00)     1.40 [-40.00](18.20)
     4        1.00 [ -0.00](12.06)     1.27 [-27.27]( 7.75)
     8        1.00 [ -0.00](14.13)     0.90 [ 10.00](23.66)
    16        1.00 [ -0.00](15.96)     1.09 [ -9.09]( 7.45)
    32        1.00 [ -0.00](12.06)     0.91 [  9.09](18.23)
    64        1.00 [ -0.00](15.78)     1.06 [ -6.25](13.18)
   128        1.00 [ -0.00](10.57)     1.03 [ -3.41]( 5.15)
   256        1.00 [ -0.00]( 0.32)     1.00 [ -0.00]( 0.21)
   512        1.00 [ -0.00]( 0.00)     1.00 [  0.38]( 0.20)
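Several tables report a normalized 99th percentile latency. schbench computes its percentiles internally, and the email does not describe how. What follows is only a generic illustration using the nearest-rank definition, not schbench's method; percentile_nearest_rank() and cmp_double() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles (no NaNs expected). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile for an integer p in [1, 100]: sort ascending and
 * return the ceil(p * n / 100)-th smallest sample (1-based rank).
 */
static double percentile_nearest_rank(const double *x, size_t n, unsigned int p)
{
    assert(n > 0 && p >= 1 && p <= 100);
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        abort();
    memcpy(tmp, x, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    /* Integer ceiling of p * n / 100; always in [1, n] for valid p and n. */
    size_t rank = ((size_t)p * n + 99) / 100;
    double v = tmp[rank - 1];
    free(tmp);
    return v;
}
```

With only a handful of samples, p99 under this definition is simply the maximum. That is one reason tail-latency numbers can swing a lot from run to run.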
==================================================================
Test : new-schbench-request-latency
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers:     tip[pct imp](CV)         nohz_no_nr_cpus[pct imp](CV)
     1        1.00 [ -0.00]( 0.00)     1.00 [ -0.27]( 1.79)
     2        1.00 [ -0.00]( 0.74)     0.96 [  4.07]( 1.90)
     4        1.00 [ -0.00]( 0.37)     0.96 [  3.83]( 1.91)
     8        1.00 [ -0.00]( 1.02)     1.00 [ -0.28]( 1.52)
    16        1.00 [ -0.00]( 1.61)     1.00 [  0.28]( 1.86)
    32        1.00 [ -0.00]( 9.22)     1.04 [ -3.52]( 6.84)
    64        1.00 [ -0.00]( 6.39)     1.06 [ -5.96](22.58)
   128        1.00 [ -0.00]( 1.08)     1.12 [-12.43]( 4.61)
   256        1.00 [ -0.00]( 6.10)     1.01 [ -0.77]( 4.87)
   512        1.00 [ -0.00]( 1.41)     1.01 [ -1.03]( 1.27)
--
Thanks and Regards,
Prateek