linux-kernel - Re: [PATCH v4 0/3] sched/fair: Improve nohz fields for large systems

Message-ID: <3298a289-22c4-42c4-a7c1-39d9519f6223@amd.com>
Date: Tue, 13 Jan 2026 12:15:46 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>, <mingo@...nel.org>,
	<peterz@...radead.org>, <vincent.guittot@...aro.org>,
	<linux-kernel@...r.kernel.org>
CC: <juri.lelli@...hat.com>, <vschneid@...hat.com>, <tglx@...nel.org>,
	<dietmar.eggemann@....com>, <anna-maria@...utronix.de>,
	<frederic@...nel.org>, <wangyang.guo@...el.com>
Subject: Re: [PATCH v4 0/3] sched/fair: Improve nohz fields for large systems

Hello Shrikanth,

On 1/12/2026 10:34 AM, Shrikanth Hegde wrote:
> When running on large systems, the nohz.nr_cpus cacheline was seen to be
> contended. Atomic inc/dec and reads happen on many CPUs at a time, so
> this line can bounce often.
> 
> The 1st and 2nd patches are minor ones. They look like the correct
> things to do, but are not very important.
> 
> 3rd patch: The main patch, which gets rid of nr_cpus. Instead, it uses
> the cpumask, which is always updated alongside it. Functionally it
> should serve the same purpose. The rest of the fields aren't updated
> that often, so this line shouldn't bounce that often.
> 
> The contention issue with nohz.idle_cpus_mask still remains. It mostly
> sits in a separate cacheline from nohz. There are ongoing efforts to
> mitigate it; it is not addressed by this series.
> 
> v3 -> v4:
> - Added a note to the changelog about one fewer cacheline being dirtied
>   on idle entry/exit (Valentin Schneider)
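The idea behind the 3rd patch can be sketched in userspace C. This is a minimal sketch, not the kernel code: it assumes the counter was only consulted to ask "is any CPU in nohz idle?", and the names `NR_CPUS_SKETCH`, `idle_mask`, `nohz_enter_idle()`, `nohz_exit_idle()` and `nohz_any_idle()` are hypothetical stand-ins for `nohz.idle_cpus_mask`, the idle entry/exit paths and the check that used `nohz.nr_cpus`. C11 atomics stand in for the kernel's `cpumask_*()` and `atomic_*()` helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sizing; the kernel sizes cpumasks from nr_cpu_ids. */
#define NR_CPUS_SKETCH 256
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS ((NR_CPUS_SKETCH + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Stand-in for nohz.idle_cpus_mask: one bit per CPU in nohz idle. */
static _Atomic unsigned long idle_mask[MASK_WORDS];

/*
 * Previously a separate atomic counter was incremented/decremented next to
 * the mask update, so every idle entry/exit also dirtied the counter's
 * cacheline. In this sketch there is no counter: only the mask word that
 * holds this CPU's bit is written.
 */
void nohz_enter_idle(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	atomic_fetch_or(&idle_mask[cpu / BITS_PER_WORD],
			1UL << (cpu % BITS_PER_WORD));
}

void nohz_exit_idle(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	atomic_fetch_and(&idle_mask[cpu / BITS_PER_WORD],
			 ~(1UL << (cpu % BITS_PER_WORD)));
}

/*
 * Replaces a "counter != 0" test with a read-only scan of the mask, in the
 * spirit of cpumask_empty(). Relaxed loads give a snapshot that may already
 * be stale, which is also true of a relaxed counter read.
 */
bool nohz_any_idle(void)
{
	for (unsigned int i = 0; i < MASK_WORDS; i++)
		if (atomic_load_explicit(&idle_mask[i], memory_order_relaxed))
			return true;
	return false;
}
```

A caller would run `nohz_enter_idle(cpu)` when a CPU stops its tick and `nohz_exit_idle(cpu)` when it restarts it, and the kick path would call `nohz_any_idle()` instead of reading a counter. The scan reads up to `MASK_WORDS` words instead of one counter, but idle entry/exit no longer writes a second cacheline. The mask itself is still shared and can still bounce, which the cover letter notes is not addressed by this series.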

I tested v3 over the weekend and didn't spot any regressions
(at least none that I can reproduce consistently), so feel free to
include:

Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@....com>

If anyone is curious, the following are the results from my setup
(3rd Generation EPYC, 2 sockets x 64C/128T, boost on, C2 disabled):

Note: tbench hit some insane luck on the higher-utilization runs; I
haven't been able to reproduce those regressions reliably. Most data
points that show a regression also have high run-to-run variance on
both tip and tip + patch, which makes them unreliable.
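The normalized and `[pct imp]` columns in the tables can be reproduced as below. This is a sketch inferred from the tables themselves, not the script that produced them: it assumes each normalized value is the patched result divided by tip's, and that `[pct imp]` is the percentage change signed so that a positive number always means better. That reading is consistent with, e.g., hackbench 1-groups (1.04, lower is better, -3.60) and stream-10 Add (1.04, higher is better, 4.10). `(CV)` is presumably the coefficient of variation across runs and is not computed here. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Normalized value as printed in the tables: patched result over tip's. */
double normalize(double tip, double patched)
{
	assert(tip != 0.0);
	return patched / tip;
}

/*
 * Percentage change relative to tip, signed so that positive is always an
 * improvement: for "higher is better" metrics a ratio above 1 is a gain,
 * for "lower is better" metrics (time, latency) a ratio above 1 is a loss.
 */
double pct_imp(double tip, double patched, bool higher_is_better)
{
	double change = (normalize(tip, patched) - 1.0) * 100.0;
	return higher_is_better ? change : -change;
}
```

If the harness negates the change for lower-is-better metrics like this, tip's own column would come out as -0.0 in those tables, which `printf("%.2f", ...)` prints as `-0.00`; that matches the `[ -0.00]` entries in the hackbench and schbench latency tables and the `[  0.00]` entries in the throughput and bandwidth tables.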

  ==================================================================
  Test          : hackbench
  Units         : Normalized time in seconds
  Interpretation: Lower is better
  Statistic     : AMean
  ==================================================================
  Case:           tip[pct imp](CV)    nohz_no_nr_cpus[pct imp](CV)
   1-groups     1.00 [ -0.00]( 6.43)     1.04 [ -3.60](15.15)
   2-groups     1.00 [ -0.00]( 5.42)     1.02 [ -2.17]( 3.57)
   4-groups     1.00 [ -0.00]( 2.72)     0.99 [  0.84]( 3.11)
   8-groups     1.00 [ -0.00]( 3.65)     1.00 [  0.31]( 2.50)
  16-groups     1.00 [ -0.00]( 2.26)     1.02 [ -1.67]( 2.92)
  
  
  ==================================================================
  Test          : tbench
  Units         : Normalized throughput
  Interpretation: Higher is better
  Statistic     : AMean
  ==================================================================
  Clients:    tip[pct imp](CV)    nohz_no_nr_cpus[pct imp](CV)
      1     1.00 [  0.00]( 0.40)     1.00 [ -0.25]( 1.22)
      2     1.00 [  0.00]( 1.33)     0.99 [ -0.57]( 0.37)
      4     1.00 [  0.00]( 0.27)     1.00 [  0.07]( 0.89)
      8     1.00 [  0.00]( 0.53)     0.99 [ -0.83]( 0.32)
     16     1.00 [  0.00]( 1.39)     1.00 [  0.11]( 1.92)
     32     1.00 [  0.00]( 1.85)     0.99 [ -1.44]( 3.08)
     64     1.00 [  0.00]( 1.55)     0.98 [ -2.17]( 2.51)
    128     1.00 [  0.00]( 1.05)     0.94 [ -6.11]( 0.28)
    256     1.00 [  0.00]( 0.68)     0.94 [ -5.58]( 3.77)
    512     1.00 [  0.00]( 0.30)     0.95 [ -4.91]( 0.22)
   1024     1.00 [  0.00]( 0.19)     0.95 [ -4.86]( 0.21)
  
  
  ==================================================================
  Test          : stream-10
  Units         : Normalized Bandwidth, MB/s
  Interpretation: Higher is better
  Statistic     : HMean
  ==================================================================
  Test:       tip[pct imp](CV)    nohz_no_nr_cpus[pct imp](CV)
   Copy     1.00 [  0.00]( 8.08)     1.03 [  2.91]( 4.84)
  Scale     1.00 [  0.00]( 5.43)     1.04 [  3.56]( 3.32)
    Add     1.00 [  0.00]( 5.96)     1.04 [  4.10]( 2.96)
  Triad     1.00 [  0.00]( 6.36)     0.99 [ -1.23]( 5.83)
  
  
  ==================================================================
  Test          : stream-100
  Units         : Normalized Bandwidth, MB/s
  Interpretation: Higher is better
  Statistic     : HMean
  ==================================================================
  Test:       tip[pct imp](CV)    nohz_no_nr_cpus[pct imp](CV)
   Copy     1.00 [  0.00]( 3.78)     1.03 [  3.17]( 1.90)
  Scale     1.00 [  0.00]( 4.17)     1.02 [  1.79]( 0.91)
    Add     1.00 [  0.00]( 1.97)     1.01 [  0.52]( 1.66)
  Triad     1.00 [  0.00]( 2.28)     0.99 [ -1.49]( 4.44)
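The stream tables report the harmonic mean (HMean) where hackbench and tbench use the arithmetic mean (AMean). For rates measured over equal amounts of work, the harmonic mean of per-run rates equals total work divided by total time, so slow runs weigh more heavily. A minimal sketch of both statistics follows; the function names are hypothetical and this is not the harness that generated the tables.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean ("AMean"). */
double amean(const double *x, size_t n)
{
	assert(n > 0);
	double sum = 0.0;
	for (size_t i = 0; i < n; i++)
		sum += x[i];
	return sum / (double)n;
}

/*
 * Harmonic mean ("HMean"): n divided by the sum of reciprocals.
 * Every sample must be positive.
 */
double hmean(const double *x, size_t n)
{
	assert(n > 0);
	double inv = 0.0;
	for (size_t i = 0; i < n; i++) {
		assert(x[i] > 0.0);
		inv += 1.0 / x[i];
	}
	return (double)n / inv;
}
```

For two runs at 1 and 4 units/s, `amean()` returns 2.5 while `hmean()` returns 1.6, so a single slow run pulls the HMean down further than the AMean.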
  
  
  ==================================================================
  Test          : schbench
  Units         : Normalized 99th percentile latency in us
  Interpretation: Lower is better
  Statistic     : Median
  ==================================================================
  #workers: tip[pct imp](CV)    nohz_no_nr_cpus[pct imp](CV)
    1     1.00 [ -0.00](33.02)     1.21 [-20.59]( 8.06)
    2     1.00 [ -0.00](14.30)     1.14 [-14.29]( 6.45)
    4     1.00 [ -0.00]( 2.22)     0.98 [  2.22]( 4.55)
    8     1.00 [ -0.00]( 4.63)     0.94 [  5.56]( 1.96)
   16     1.00 [ -0.00]( 1.67)     1.07 [ -6.67]( 1.82)
   32     1.00 [ -0.00]( 5.58)     0.99 [  1.04]( 2.11)
   64     1.00 [ -0.00]( 6.03)     0.99 [  0.52]( 5.25)
  128     1.00 [ -0.00]( 7.09)     1.00 [ -0.49]( 5.11)
  256     1.00 [ -0.00]( 3.14)     0.94 [  6.06](13.53)
  512     1.00 [ -0.00]( 0.86)     0.98 [  2.23]( 1.53)
  
  
  ==================================================================
  Test          : new-schbench-requests-per-second
  Units         : Normalized Requests per second
  Interpretation: Higher is better
  Statistic     : Median
  ==================================================================
  #workers: tip[pct imp](CV)    nohz_no_nr_cpus[pct imp](CV)
    1     1.00 [  0.00]( 0.14)     1.00 [  0.00]( 0.52)
    2     1.00 [  0.00]( 0.14)     1.00 [  0.28]( 0.00)
    4     1.00 [  0.00]( 0.14)     1.00 [  0.00]( 0.00)
    8     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.14)
   16     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)
   32     1.00 [  0.00]( 5.05)     0.97 [ -3.11]( 1.91)
   64     1.00 [  0.00](10.41)     1.06 [  5.60]( 3.79)
  128     1.00 [  0.00]( 0.30)     0.98 [ -2.38]( 0.31)
  256     1.00 [  0.00]( 1.43)     0.98 [ -1.73]( 1.38)
  512     1.00 [  0.00]( 1.45)     0.97 [ -3.33]( 1.48)
  
  
  ==================================================================
  Test          : new-schbench-wakeup-latency
  Units         : Normalized 99th percentile latency in us
  Interpretation: Lower is better
  Statistic     : Median
  ==================================================================
  #workers: tip[pct imp](CV)    nohz_no_nr_cpus[pct imp](CV)
    1     1.00 [ -0.00](24.99)     1.08 [ -8.33](16.90)
    2     1.00 [ -0.00]( 0.00)     1.40 [-40.00](18.20)
    4     1.00 [ -0.00](12.06)     1.27 [-27.27]( 7.75)
    8     1.00 [ -0.00](14.13)     0.90 [ 10.00](23.66)
   16     1.00 [ -0.00](15.96)     1.09 [ -9.09]( 7.45)
   32     1.00 [ -0.00](12.06)     0.91 [  9.09](18.23)
   64     1.00 [ -0.00](15.78)     1.06 [ -6.25](13.18)
  128     1.00 [ -0.00](10.57)     1.03 [ -3.41]( 5.15)
  256     1.00 [ -0.00]( 0.32)     1.00 [ -0.00]( 0.21)
  512     1.00 [ -0.00]( 0.00)     1.00 [  0.38]( 0.20)
  
  
  ==================================================================
  Test          : new-schbench-request-latency
  Units         : Normalized 99th percentile latency in us
  Interpretation: Lower is better
  Statistic     : Median
  ==================================================================
  #workers: tip[pct imp](CV)    nohz_no_nr_cpus[pct imp](CV)
    1     1.00 [ -0.00]( 0.00)     1.00 [ -0.27]( 1.79)
    2     1.00 [ -0.00]( 0.74)     0.96 [  4.07]( 1.90)
    4     1.00 [ -0.00]( 0.37)     0.96 [  3.83]( 1.91)
    8     1.00 [ -0.00]( 1.02)     1.00 [ -0.28]( 1.52)
   16     1.00 [ -0.00]( 1.61)     1.00 [  0.28]( 1.86)
   32     1.00 [ -0.00]( 9.22)     1.04 [ -3.52]( 6.84)
   64     1.00 [ -0.00]( 6.39)     1.06 [ -5.96](22.58)
  128     1.00 [ -0.00]( 1.08)     1.12 [-12.43]( 4.61)
  256     1.00 [ -0.00]( 6.10)     1.01 [ -0.77]( 4.87)
  512     1.00 [ -0.00]( 1.41)     1.01 [ -1.03]( 1.27) 

-- 
Thanks and Regards,
Prateek