linux-kernel - Re: [PATCH 4/4] sched/fair: Proportional newidle balance

Message-ID: <396c640b-81bb-4be4-860d-7ab3ff667795@amd.com>
Date: Wed, 28 Jan 2026 09:38:31 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Mario Roy <marioeroy@...il.com>, Chris Mason <clm@...a.com>, "Joseph
 Salisbury" <joseph.salisbury@...cle.com>, Adam Li
	<adamli@...amperecomputing.com>, Hazem Mohamed Abuelfotoh
	<abuehaze@...zon.com>, Josh Don <joshdon@...gle.com>, <mingo@...hat.com>,
	<juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
	<dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
	<mgorman@...e.de>, <vschneid@...hat.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 4/4] sched/fair: Proportional newidle balance

On 1/23/2026 5:54 PM, K Prateek Nayak wrote:
> Hello Peter,
> 
> On 1/23/2026 4:33 PM, Peter Zijlstra wrote:
>> On Fri, Jan 23, 2026 at 11:50:46AM +0100, Peter Zijlstra wrote:
>>> On Sun, Jan 18, 2026 at 03:46:22PM -0500, Mario Roy wrote:
>>>> The patch "Proportional newidle balance" introduced a regression
>>>> with Linux 6.12.65 and 6.18.5. There is noticeable regression with
>>>> easyWave testing. [1]
>>>>
>>>> The CPU is AMD Threadripper 9960X CPU (24/48). I followed the source
>>>> to install easyWave [2]. That is fetching the two tar.gz archives.
>>>
>>> What is the actual configuration of that chip? Is it like 3*8 or 4*6
>>> (CCX wise). A quick google couldn't find me the answer :/
>>
>> Obviously I found it right after sending this. It's a 4x6 config.
>> Meaning it needs newidle to balance between those 4 domains.
>>
>> Pratheek -- are you guys still considering that SIS_NODE thing? That
>> worked really well for workstation chips, but there were some issues on
>> Epyc or so.
> 
> SIS_NODE really turned out to be a trade-off between search
> time vs search opportunity, especially when the system was heavily
> overloaded.
> 
> Let me rebase those old patches and give it a spin over the weekend
> on a couple of those large machines (128C/256T and 192C/384T per
> socket) to see the damage. I'll update here by Tuesday or post out
> a series if I see the situation having changed on the recent
> kernels - some benchmarks had a completely different bottleneck
> there when we looked closer last.

Here are the results on tip:sched/core merged onto tip:sched/urgent
with SIS_NODE and with SIS_NODE + SIS_UTIL [1] on a 512-CPU machine
(2 sockets x 16 CCXs (LLCs) x 8C/16T Zen4c cores):

tl;dr

(*) Consistent regressions, even with the SIS_UTIL bailout on the higher
    domain; these benchmarks mainly measure tail latency or have a
    thundering herd behavior that SIS_UTIL with the default imbalance_pct
    isn't able to fully adjust to.

(#) Data has run-to-run variance but is still worse on average.

Note: Although "new-schbench-wakeup-latency" shows a regression, the
baseline is only a few "us", so an addition of a couple more "us"
appears as a ~20%-30% regression.
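
To make the note above concrete: on a baseline of only a few
microseconds, each extra microsecond moves the percentage a lot. A
minimal Python sketch follows; the 11 us baseline is a hypothetical value
(the raw latencies are not part of this mail) and pct_improvement() is an
illustrative helper, not part of any benchmark harness:

```python
def pct_improvement(baseline_us, value_us):
    """Percent improvement for a lower-is-better metric (negative = regression)."""
    return (baseline_us - value_us) / baseline_us * 100.0


BASELINE_US = 11  # hypothetical p99 wakeup latency on tip, in microseconds

for extra_us in (1, 2, 3, 4):
    value_us = BASELINE_US + extra_us
    print(f"+{extra_us} us: normalized {value_us / BASELINE_US:.2f}, "
          f"pct imp {pct_improvement(BASELINE_US, value_us):.2f}")
```

With this assumed baseline, a 3 us increase already reads as roughly a
27% regression.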

I'm still fighting dependency hell to get some of the longer running
benchmarks running on this system but I expect a few pct regressions
like last time [2].

System:

- 2 x 128C/256T Zen4c system with 16CCXs per socket
- Boost on
- C2 disabled
- Each socket is a NUMA node

Kernels:

tip: tip:sched/core at commit 377521af0341 ("sched: remove
     task_struct->faults_disabled_mapping") merged onto
     tip:sched/urgent at commit 15257cc2f905 ("sched/fair: Revert
     force wakeup preemption")

sis_node: tip + sis_node patch + cpumask_and() moved to after
          SIS_UTIL bailout [3]

sis-node-w-sis-util: Tree from [1] based on tip:sched/core merged onto
                     tip:sched/urgent

Full results:

  ==================================================================
  Test          : hackbench
  Units         : Normalized time in seconds
  Interpretation: Lower is better
  Statistic     : AMean
  ==================================================================
  Case:           tip[pct imp](CV)       sis-node[pct imp](CV)    sis-node-w-sis-util[pct imp](CV)
   1-groups     1.00 [ -0.00](11.61)     0.76 [ 24.30]( 4.43)     0.76 [ 24.05]( 2.93)
   2-groups     1.00 [ -0.00]( 9.73)     0.86 [ 14.22](17.59)     0.80 [ 19.85](15.31)
   4-groups     1.00 [ -0.00]( 5.88)     0.78 [ 21.87](11.93)     0.78 [ 21.64](14.33)
   8-groups     1.00 [ -0.00]( 2.93)     0.92 [  8.44]( 3.99)     0.92 [  7.79]( 4.04)
  16-groups     1.00 [ -0.00]( 1.77)     0.90 [ 10.47]( 5.61)     0.94 [  5.92]( 5.65)
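
For readers unfamiliar with the table layout, the columns in these
tables read as "normalized value [pct imp](CV)". The Python sketch below
shows one way such a cell could be derived from raw samples. It is an
assumption about the post-processing, not the actual script used here:
summarize() and the sample values are hypothetical, and whether the real
CV uses the sample or the population standard deviation is not stated in
this mail.

```python
import statistics


def summarize(samples, baseline, lower_is_better, stat=statistics.mean):
    """Return (normalized, pct_imp, cv) for one table cell.

    samples:  raw results of the kernel under test (hypothetical data)
    baseline: the same statistic computed on tip
    stat:     statistics.mean, statistics.harmonic_mean or statistics.median,
              matching the "Statistic" row of the table
    """
    value = stat(samples)
    normalized = value / baseline
    if lower_is_better:
        pct_imp = (1.0 - normalized) * 100.0
    else:
        pct_imp = (normalized - 1.0) * 100.0
    # Coefficient of variation in percent (sample stdev is an assumption).
    cv = statistics.stdev(samples) / statistics.mean(samples) * 100.0
    return normalized, pct_imp, cv


# Hypothetical hackbench-like runtimes in seconds against a 10.0 s baseline.
norm, imp, cv = summarize([7.5, 7.6, 7.7], 10.0, lower_is_better=True)
print(f"{norm:.2f} [{imp:6.2f}]({cv:5.2f})")
```

The sign convention keeps "positive means better" for both lower-is-better
and higher-is-better tests, which matches how the rows above read.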


  ==================================================================
  Test          : tbench
  Units         : Normalized throughput
  Interpretation: Higher is better
  Statistic     : AMean
  ==================================================================
  Clients:    tip[pct imp](CV)       sis-node[pct imp](CV)    sis-node-w-sis-util[pct imp](CV)
      1     1.00 [  0.00]( 0.20)     1.00 [ -0.07]( 0.16)     1.01 [  0.53]( 0.23)
      2     1.00 [  0.00]( 0.35)     1.00 [ -0.03]( 0.58)     1.00 [  0.12]( 0.20)
      4     1.00 [  0.00]( 0.09)     1.01 [  0.60]( 0.60)     1.00 [  0.16]( 0.15)
      8     1.00 [  0.00]( 0.63)     1.00 [ -0.35]( 0.53)     1.00 [  0.26]( 0.19)
     16     1.00 [  0.00]( 0.97)     1.00 [  0.33]( 0.30)     1.01 [  1.16]( 0.50)
     32     1.00 [  0.00]( 0.98)     1.02 [  1.54]( 0.91)     1.01 [  1.10]( 0.26)
     64     1.00 [  0.00]( 3.45)     1.02 [  1.88]( 0.48)     1.02 [  1.78]( 1.29)
    128     1.00 [  0.00]( 2.49)     1.00 [ -0.01]( 1.63)     0.99 [ -0.68]( 1.88)
    256     1.00 [  0.00]( 0.57)     1.01 [  0.73]( 0.45)     1.01 [  0.92]( 0.35)
    512     1.00 [  0.00]( 3.92)     0.51 [-48.55]( 0.11)     0.80 [-19.59]( 6.31)	(*)
   1024     1.00 [  0.00]( 0.10)     0.98 [ -2.11]( 0.09)     0.97 [ -3.29]( 0.28)
   2048     1.00 [  0.00]( 0.09)     0.98 [ -2.08]( 0.28)     0.99 [ -0.75]( 0.48)


  ==================================================================
  Test          : stream-10
  Units         : Normalized Bandwidth, MB/s
  Interpretation: Higher is better
  Statistic     : HMean
  ==================================================================
  Test:       tip[pct imp](CV)       sis-node[pct imp](CV)    sis-node-w-sis-util[pct imp](CV)
   Copy     1.00 [  0.00]( 0.31)     0.99 [ -0.70]( 0.57)     1.00 [ -0.09]( 1.44)
  Scale     1.00 [  0.00]( 0.38)     0.99 [ -1.00]( 0.49)     1.00 [  0.32]( 1.41)
    Add     1.00 [  0.00]( 0.31)     0.99 [ -0.95]( 0.63)     1.00 [  0.43]( 1.16)
  Triad     1.00 [  0.00]( 0.18)     0.99 [ -0.84]( 0.68)     1.00 [  0.16]( 1.12)


  ==================================================================
  Test          : stream-100
  Units         : Normalized Bandwidth, MB/s
  Interpretation: Higher is better
  Statistic     : HMean
  ==================================================================
  Test:       tip[pct imp](CV)       sis-node[pct imp](CV)    sis-node-w-sis-util[pct imp](CV)
   Copy     1.00 [  0.00]( 1.46)     1.00 [  0.39]( 1.57)     1.01 [  0.82]( 0.52)
  Scale     1.00 [  0.00]( 1.45)     1.00 [  0.49]( 1.37)     1.01 [  1.20]( 0.55)
    Add     1.00 [  0.00]( 1.09)     1.00 [  0.31]( 0.94)     1.01 [  0.79]( 0.35)
  Triad     1.00 [  0.00]( 1.06)     1.00 [  0.22]( 1.02)     1.01 [  0.56]( 0.19)


  ==================================================================
  Test          : netperf
  Units         : Normalized Throughput
  Interpretation: Higher is better
  Statistic     : AMean
  ==================================================================
  Clients:         tip[pct imp](CV)       sis-node[pct imp](CV)    sis-node-w-sis-util[pct imp](CV)
   1-clients     1.00 [  0.00]( 0.27)     0.99 [ -0.82]( 0.26)     0.99 [ -0.78]( 0.16)
   2-clients     1.00 [  0.00]( 0.28)     0.99 [ -0.87]( 0.19)     1.00 [ -0.17]( 0.67)
   4-clients     1.00 [  0.00]( 0.38)     1.00 [ -0.47]( 0.33)     0.99 [ -0.53]( 0.31)
   8-clients     1.00 [  0.00]( 0.34)     0.99 [ -0.55]( 0.18)     1.00 [ -0.33]( 0.24)
  16-clients     1.00 [  0.00]( 0.30)     1.00 [ -0.39]( 0.23)     1.00 [ -0.19]( 0.26)
  32-clients     1.00 [  0.00]( 0.43)     1.00 [ -0.40]( 0.57)     1.00 [ -0.24]( 0.68)
  64-clients     1.00 [  0.00]( 0.82)     1.00 [ -0.12]( 0.45)     1.00 [ -0.14]( 0.70)
  128-clients    1.00 [  0.00]( 1.21)     1.00 [  0.10]( 1.28)     1.00 [  0.08]( 1.19)
  256-clients    1.00 [  0.00]( 1.38)     1.01 [  0.65]( 0.89)     1.00 [  0.34]( 0.89)
  512-clients    1.00 [  0.00]( 8.76)     0.47 [-52.76]( 1.64)     0.77 [-23.10](10.06)	(*)
  768-clients    1.00 [  0.00](34.29)     0.83 [-16.89](30.45)     0.98 [ -2.16](36.19)
  1024-clients   1.00 [  0.00](47.96)     0.91 [ -9.29](36.02)     0.98 [ -1.93](46.36)


  ==================================================================
  Test          : schbench
  Units         : Normalized 99th percentile latency in us
  Interpretation: Lower is better
  Statistic     : Median
  ==================================================================
  #workers: tip[pct imp](CV)       sis-node[pct imp](CV)    sis-node-w-sis-util[pct imp](CV)
    1     1.00 [ -0.00](14.20)     1.72 [-72.00](15.01)     0.88 [ 12.00]( 4.55)
    2     1.00 [ -0.00]( 1.68)     1.09 [ -8.82]( 6.96)     0.97 [  2.94]( 9.90)
    4     1.00 [ -0.00]( 4.45)     1.18 [-17.65]( 5.29)     1.03 [ -2.94]( 3.24)
    8     1.00 [ -0.00]( 2.44)     1.12 [-12.20]( 4.35)     1.02 [ -2.44]( 2.38)
   16     1.00 [ -0.00]( 0.00)     1.04 [ -3.64]( 1.75)     0.98 [  1.82]( 1.85)
   32     1.00 [ -0.00]( 2.87)     1.03 [ -2.53]( 2.80)     0.99 [  1.27]( 1.47)
   64     1.00 [ -0.00]( 3.17)     1.02 [ -1.57]( 5.72)     0.98 [  2.36]( 2.30)
  128     1.00 [ -0.00]( 2.95)     1.01 [ -1.35]( 3.03)     1.00 [ -0.00]( 1.13)
  256     1.00 [ -0.00]( 1.17)     0.99 [  1.23]( 1.75)     0.99 [  1.43]( 1.56)
  512     1.00 [ -0.00]( 4.54)     1.14 [-13.60]( 2.41)     0.97 [  2.50]( 0.42)
  768     1.00 [ -0.00]( 2.24)     1.27 [-27.44]( 3.18)     1.12 [-11.54]( 5.64)	(*)
  1024    1.00 [ -0.00]( 0.28)     1.14 [-14.20]( 0.56)     1.13 [-13.00]( 1.01)	(*)


  ==================================================================
  Test          : new-schbench-requests-per-second
  Units         : Normalized Requests per second
  Interpretation: Higher is better
  Statistic     : Median
  ==================================================================
  #workers: tip[pct imp](CV)       sis-node[pct imp](CV)    sis-node-w-sis-util[pct imp](CV)
    1     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.15)
    2     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.15)
    4     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)     1.00 [  0.29]( 0.15)
    8     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)     1.00 [  0.29]( 0.00)
   16     1.00 [  0.00]( 0.15)     1.00 [ -0.29]( 0.15)     1.00 [  0.00]( 0.00)
   32     1.00 [  0.00]( 0.15)     1.00 [ -0.29]( 0.00)     1.00 [  0.00]( 0.15)
   64     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)     1.00 [  0.29]( 0.00)
  128     1.00 [  0.00]( 0.27)     1.00 [  0.00](18.48)     0.65 [-34.50](24.12)	(#)
  256     1.00 [  0.00]( 0.00)     0.99 [ -0.58]( 0.00)     0.99 [ -0.58]( 0.00)
  512     1.00 [  0.00]( 1.05)     1.00 [  0.00]( 0.20)     1.00 [  0.39]( 0.87)
  768     1.00 [  0.00]( 0.95)     0.98 [ -1.88]( 0.93)     0.99 [ -0.71]( 0.53)
  1024    1.00 [  0.00]( 0.49)     0.99 [ -0.81]( 0.57)     1.00 [  0.00]( 0.74)


  ==================================================================
  Test          : new-schbench-wakeup-latency
  Units         : Normalized 99th percentile latency in us
  Interpretation: Lower is better
  Statistic     : Median
  ==================================================================
  #workers: tip[pct imp](CV)       sis-node[pct imp](CV)    sis-node-w-sis-util[pct imp](CV)
    1     1.00 [ -0.00]( 6.74)     2.38 [-137.50](29.34)    1.75 [-75.00]( 9.53)
    2     1.00 [ -0.00](12.06)     1.27 [-27.27]( 9.53)     1.36 [-36.36]( 6.59)
    4     1.00 [ -0.00](11.71)     1.33 [-33.33]( 3.30)     1.33 [-33.33]( 3.16)
    8     1.00 [ -0.00]( 0.00)     1.27 [-27.27](12.69)     1.09 [ -9.09]( 4.43)
   16     1.00 [ -0.00]( 4.84)     1.09 [ -9.09]( 4.43)     1.18 [-18.18](10.79)
   32     1.00 [ -0.00]( 0.00)     1.00 [ -0.00]( 0.00)     1.10 [-10.00]( 4.56)
   64     1.00 [ -0.00](13.22)     1.00 [ -0.00]( 5.00)     1.00 [ -0.00]( 9.68)
  128     1.00 [ -0.00]( 8.13)     1.00 [ -0.00]( 8.85)     1.18 [-18.18](13.76)
  256     1.00 [ -0.00]( 2.97)     1.02 [ -1.94]( 3.80)     1.08 [ -7.77]( 7.13)
  512     1.00 [ -0.00]( 1.25)     1.00 [  0.37]( 0.68)     1.00 [ -0.37]( 1.81)
  768     1.00 [ -0.00]( 0.00)     1.00 [ -0.00]( 0.00)     1.00 [ -0.00]( 0.00)
  1024    1.00 [ -0.00]( 0.63)     1.00 [ -0.11]( 4.06)     1.00 [ -0.11]( 3.13)


  ==================================================================
  Test          : new-schbench-request-latency
  Units         : Normalized 99th percentile latency in us
  Interpretation: Lower is better
  Statistic     : Median
  ==================================================================
  #workers: tip[pct imp](CV)       sis-node[pct imp](CV)    sis-node-w-sis-util[pct imp](CV)
    1     1.00 [ -0.00]( 0.14)     1.00 [ -0.26]( 0.14)     1.00 [ -0.00]( 0.14)
    2     1.00 [ -0.00]( 0.14)     1.00 [ -0.26]( 0.00)     1.00 [ -0.00]( 0.14)
    4     1.00 [ -0.00]( 0.00)     1.00 [ -0.00]( 0.00)     1.00 [  0.26]( 0.14)
    8     1.00 [ -0.00]( 0.00)     1.00 [ -0.00]( 0.00)     1.00 [  0.26]( 0.14)
   16     1.00 [ -0.00]( 0.00)     1.00 [ -0.00]( 0.00)     1.01 [ -0.53]( 1.18)
   32     1.00 [ -0.00]( 0.54)     1.01 [ -1.05]( 0.59)     0.99 [  0.53]( 0.27)
   64     1.00 [ -0.00]( 0.00)     1.00 [  0.26]( 1.08)     1.00 [  0.26](31.75)
  128     1.00 [ -0.00]( 0.61)     1.00 [ -0.00]( 4.19)     1.10 [-10.22]( 4.79)	(#)
  256     1.00 [ -0.00]( 0.43)     1.01 [ -1.39]( 0.74)     1.02 [ -1.63]( 0.66)
  512     1.00 [ -0.00]( 3.32)     1.00 [  0.23]( 1.62)     1.04 [ -3.72]( 3.79)
  768     1.00 [ -0.00]( 0.88)     0.95 [  4.52]( 0.63)     0.98 [  1.94]( 0.54)
  1024    1.00 [ -0.00]( 1.01)     0.98 [  1.54]( 0.91)     1.00 [  0.17]( 0.31)


Let me go play around with imbalance_pct for SIS_UTIL at the PKG/NODE
domain to see if there is a sweet spot that keeps everything happy while
things are happier on average.
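
As background on why imbalance_pct matters here, the Python sketch below
mirrors the integer arithmetic that mainline uses in
update_idle_cpu_scan() to turn LLC utilization into a scan depth
(nr_idle_scan) for SIS_UTIL. It is written from memory of mainline and is
only an approximation; how the tree in [1] applies the same idea at the
PKG/NODE domain may differ, and the imbalance_pct values and the 16-CPU
domain below are illustrative assumptions.

```python
SCHED_CAPACITY_SCALE = 1024


def nr_idle_scan(sum_util, weight, imbalance_pct):
    """Number of CPUs SIS_UTIL would allow select_idle_cpu() to scan.

    sum_util:      summed CPU utilization of the domain (0..weight * 1024)
    weight:        number of CPUs in the domain
    imbalance_pct: domain imbalance_pct (illustrative values used below)
    """
    x = sum_util // weight  # average utilization per CPU
    tmp = (x * x * imbalance_pct * imbalance_pct) // (10000 * SCHED_CAPACITY_SCALE)
    tmp = min(tmp, SCHED_CAPACITY_SCALE)
    y = SCHED_CAPACITY_SCALE - tmp  # shrinks quadratically as utilization rises
    return (y * weight) // SCHED_CAPACITY_SCALE


WEIGHT = 16  # one 8C/16T CCX, used here only as an example domain size

for pct in (117, 125):
    depths = [nr_idle_scan(util * WEIGHT, WEIGHT, pct)
              for util in (0, 256, 512, 768, 870)]
    print(f"imbalance_pct={pct}: scan depth at util 0/256/512/768/870 = {depths}")
```

A larger imbalance_pct makes the scan depth hit zero at a lower
utilization, so the search bails out earlier; a smaller one keeps
searching longer under load.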

I doubt Meta's workload will be happy with more aggressive SIS_UTIL
limits, since data from David's SHARED_RUNQ series [4] showed that this
specific workload requires aggressive search + aggressive newidle balance.

References:

[1] https://github.com/kudureranganath/linux/commits/kudure/sched/sis_node/
[2] https://lore.kernel.org/all/3de5c24f-6437-f21b-ed61-76b86a199e8c@amd.com/
[3] https://github.com/kudureranganath/linux/commit/7639cf7632853b91e6a5b449eee08d3399b10d31
[4] https://lore.kernel.org/lkml/20230809221218.163894-1-void@manifault.com/

-- 
Thanks and Regards,
Prateek