linux-kernel - Re: [RFC][PATCH 0/5] sched: Try and address some recent-ish regressions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2d747fb5-81e7-48ac-ae51-db737a170b81@amd.com>
Date: Fri, 13 Jun 2025 08:58:56 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
 juri.lelli@...hat.com, vincent.guittot@...aro.org
Cc: linux-kernel@...r.kernel.org, dietmar.eggemann@....com,
 rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
 vschneid@...hat.com, clm@...a.com
Subject: Re: [RFC][PATCH 0/5] sched: Try and address some recent-ish
 regressions

Hello Peter,

On 6/2/2025 10:14 AM, K Prateek Nayak wrote:
> Hello Peter,
> 
> On 5/20/2025 3:15 PM, Peter Zijlstra wrote:
>> As can be seen, the SPR is much easier to please than the SKL for whatever
>> reason. I'm thinking we can make TTWU_QUEUE_DELAYED default on, but I suspect
>> TTWU_QUEUE_DEFAULT might be a harder sell -- we'd need to run more than this
>> one benchmark.
> 
> I haven't tried toggling any of the newly added SCHED_FEAT() yet.

Here are the full results:

tldr;

- schbench (old) has a consistent regression for 16, 32, 64,
   128, 256 workers (> CCX size, < Overloaded) except for with
   256 workers case with TTWU_QUEUE_DEFAULT which shows an
   improvement.

- new schebench has few regressions around 32, 64, and 128
   workers for wakeup and request latency.

- Most others benchmarks show minor improvements /
   regressions but nothing serious.
   

o Variants

"DELAYED" enables "TTWU_QUEUE_DELAYED" alone, "DEFAULT" enables
"TTWU_QUEUE_DEFAULT" alone, and "BOTH" variant enables both.
vanilla was shared previously which is same as out of box with no
changes made to the sched features.


o Benchmark numbers

     ==================================================================
     Test          : hackbench
     Units         : Normalized time in seconds
     Interpretation: Lower is better
     Statistic     : AMean
     ==================================================================
     Case:           tip[pct imp](CV)       vanilla[pct imp](CV)     DELAYED[pct imp](CV)     DEFUALT[pct imp](CV)      BOTH[pct imp](CV)
      1-groups     1.00 [ -0.00](13.74)     0.92 [  7.68]( 6.04)     0.95 [  5.12](10.12)     1.02 [ -1.92]( 6.70)     0.95 [  4.90]( 5.28)
      2-groups     1.00 [ -0.00]( 9.58)     1.04 [ -3.56]( 4.96)     1.03 [ -3.12]( 5.12)     0.98 [  1.56]( 4.30)     1.01 [ -1.11]( 5.78)
      4-groups     1.00 [ -0.00]( 2.10)     1.01 [ -1.30]( 2.27)     1.01 [ -1.09]( 2.68)     1.00 [ -0.43]( 2.58)     1.01 [ -0.65]( 1.38)
      8-groups     1.00 [ -0.00]( 1.51)     0.99 [  1.26]( 1.70)     0.99 [  0.95]( 4.92)     0.97 [  3.15]( 1.60)     1.00 [ -0.00]( 3.67)
     16-groups     1.00 [ -0.00]( 1.10)     0.97 [  3.01]( 1.62)     0.96 [  3.77]( 1.42)     0.95 [  4.60]( 0.67)     0.96 [  4.44]( 1.10)
     
     
     ==================================================================
     Test          : tbench
     Units         : Normalized throughput
     Interpretation: Higher is better
     Statistic     : AMean
     ==================================================================
     Clients:    tip[pct imp](CV)       vanilla[pct imp](CV)     DELAYED[pct imp](CV)     DEFUALT[pct imp](CV)      BOTH[pct imp](CV)
         1     1.00 [  0.00]( 0.82)     1.04 [  4.33]( 1.84)     1.06 [  5.97]( 0.42)     1.06 [  6.12]( 1.02)     1.06 [  5.54]( 0.73)
         2     1.00 [  0.00]( 1.13)     1.06 [  5.52]( 1.04)     1.07 [  7.17]( 0.42)     1.07 [  6.81]( 0.30)     1.08 [  7.96]( 0.39)
         4     1.00 [  0.00]( 1.12)     1.05 [  5.41]( 0.53)     1.07 [  7.39]( 0.67)     1.06 [  6.45]( 0.91)     1.07 [  7.36]( 0.63)
         8     1.00 [  0.00]( 0.93)     1.06 [  5.72]( 0.47)     1.07 [  6.90]( 0.24)     1.07 [  7.09]( 1.45)     1.07 [  6.94]( 0.45)
        16     1.00 [  0.00]( 0.38)     1.07 [  6.99]( 0.50)     1.05 [  4.95]( 0.98)     1.05 [  5.39]( 0.71)     1.05 [  5.43]( 1.05)
        32     1.00 [  0.00]( 0.66)     1.05 [  4.68]( 1.79)     1.06 [  5.70]( 0.54)     1.07 [  6.93]( 2.39)     1.03 [  3.17]( 1.06)
        64     1.00 [  0.00]( 1.18)     1.06 [  5.53]( 0.37)     1.04 [  4.05]( 0.84)     1.07 [  7.35]( 1.57)     1.06 [  5.62]( 1.13)
       128     1.00 [  0.00]( 1.12)     1.06 [  5.52]( 0.13)     1.05 [  4.94]( 0.75)     1.08 [  7.56]( 0.81)     1.05 [  4.80]( 0.55)
       256     1.00 [  0.00]( 0.42)     0.99 [ -0.83]( 1.01)     0.99 [ -0.58]( 0.57)     1.00 [  0.06]( 0.68)     1.00 [  0.03]( 1.47)
       512     1.00 [  0.00]( 0.14)     1.01 [  1.06]( 0.13)     1.02 [  1.67]( 0.18)     1.03 [  2.62]( 0.28)     1.02 [  2.17]( 0.33)
      1024     1.00 [  0.00]( 0.26)     1.02 [  1.82]( 0.41)     1.02 [  2.48]( 0.27)     1.03 [  3.38]( 0.37)     1.01 [  1.39]( 0.03)
     
     
     ==================================================================
     Test          : stream-10
     Units         : Normalized Bandwidth, MB/s
     Interpretation: Higher is better
     Statistic     : HMean
     ==================================================================
     Test:       tip[pct imp](CV)       vanilla[pct imp](CV)     DELAYED[pct imp](CV)     DEFUALT[pct imp](CV)      BOTH[pct imp](CV)
      Copy     1.00 [  0.00]( 8.37)     0.97 [ -2.79]( 9.17)     0.99 [ -1.29]( 4.68)     1.01 [  1.25]( 4.86)     0.99 [ -0.66]( 9.29)
     Scale     1.00 [  0.00]( 2.85)     1.00 [  0.12]( 2.91)     0.99 [ -1.34]( 5.55)     1.00 [ -0.20]( 3.38)     0.98 [ -2.09]( 5.33)
       Add     1.00 [  0.00]( 3.39)     0.98 [ -2.36]( 4.85)     0.98 [ -2.32]( 5.23)     1.00 [  0.10]( 3.17)     0.98 [ -1.99]( 4.73)
     Triad     1.00 [  0.00]( 6.39)     1.01 [  1.45]( 8.42)     1.00 [ -0.38]( 8.28)     1.05 [  4.69]( 5.66)     1.06 [  6.02]( 4.53)
     
     
     ==================================================================
     Test          : stream-100
     Units         : Normalized Bandwidth, MB/s
     Interpretation: Higher is better
     Statistic     : HMean
     ==================================================================
     Test:       tip[pct imp](CV)       vanilla[pct imp](CV)     DELAYED[pct imp](CV)     DEFUALT[pct imp](CV)      BOTH[pct imp](CV)
      Copy     1.00 [  0.00]( 3.91)     0.98 [ -1.84]( 2.07)     0.98 [ -2.06]( 6.75)     1.01 [  1.31]( 2.86)     1.02 [  2.12]( 3.30)
     Scale     1.00 [  0.00]( 4.34)     0.96 [ -3.80]( 6.38)     0.97 [ -2.88]( 6.99)     0.97 [ -2.62]( 5.70)     1.00 [ -0.37]( 3.94)
       Add     1.00 [  0.00]( 4.14)     0.97 [ -3.04]( 6.31)     0.97 [ -3.14]( 6.91)     0.99 [ -0.79]( 4.24)     1.00 [ -0.35]( 4.06)
     Triad     1.00 [  0.00]( 1.00)     0.98 [ -2.36]( 2.60)     0.96 [ -3.80]( 6.15)     0.99 [ -0.61]( 1.33)     0.97 [ -3.05]( 5.48)
     
     
     ==================================================================
     Test          : netperf
     Units         : Normalized Througput
     Interpretation: Higher is better
     Statistic     : AMean
     ==================================================================
     Clients:         tip[pct imp](CV)       vanilla[pct imp](CV)     DELAYED[pct imp](CV)     DEFUALT[pct imp](CV)      BOTH[pct imp](CV)
      1-clients     1.00 [  0.00]( 0.41)     1.06 [  5.63]( 1.17)     1.06 [  6.03]( 0.53)     1.09 [  8.63]( 0.79)     1.06 [  6.36]( 0.09)
      2-clients     1.00 [  0.00]( 0.58)     1.06 [  6.25]( 0.85)     1.05 [  5.47]( 0.83)     1.08 [  8.24]( 1.29)     1.05 [  5.15]( 0.57)
      4-clients     1.00 [  0.00]( 0.35)     1.06 [  5.59]( 0.49)     1.05 [  5.06]( 0.65)     1.08 [  8.15]( 0.82)     1.05 [  5.46]( 0.62)
      8-clients     1.00 [  0.00]( 0.48)     1.06 [  5.76]( 0.81)     1.05 [  5.26]( 0.71)     1.08 [  8.19]( 0.60)     1.05 [  5.34]( 0.80)
     16-clients     1.00 [  0.00]( 0.66)     1.06 [  5.95]( 0.69)     1.06 [  5.52]( 0.78)     1.08 [  8.31]( 0.86)     1.06 [  5.76]( 0.48)
     32-clients     1.00 [  0.00]( 1.15)     1.06 [  5.84]( 1.34)     1.06 [  5.57]( 0.96)     1.08 [  8.30]( 0.90)     1.06 [  5.66]( 1.45)
     64-clients     1.00 [  0.00]( 1.38)     1.05 [  5.20]( 1.50)     1.05 [  4.67]( 1.39)     1.07 [  7.43]( 1.47)     1.05 [  5.18]( 1.48)
     128-clients    1.00 [  0.00]( 0.87)     1.04 [  4.39]( 1.03)     1.04 [  4.43]( 0.98)     1.06 [  5.98]( 1.01)     1.05 [  4.60]( 1.06)
     256-clients    1.00 [  0.00]( 5.36)     1.00 [  0.10]( 3.48)     1.00 [  0.09]( 4.22)     1.01 [  0.71]( 3.18)     1.01 [  1.25]( 3.69)
     512-clients    1.00 [  0.00](54.39)     0.98 [ -1.93](52.45)     1.00 [ -0.35](53.30)     1.02 [  1.75](54.93)     1.02 [  1.76](55.71)
     
     
     ==================================================================
     Test          : schbench
     Units         : Normalized 99th percentile latency in us
     Interpretation: Lower is better
     Statistic     : Median
     ==================================================================
     #workers: tip[pct imp](CV)       vanilla[pct imp](CV)     DELAYED[pct imp](CV)     DEFUALT[pct imp](CV)      BOTH[pct imp](CV)
       1     1.00 [ -0.00]( 8.54)     0.89 [ 10.87](35.39)     0.78 [ 21.74](34.41)     0.91 [  8.70](12.44)     0.72 [ 28.26](26.70)
       2     1.00 [ -0.00]( 1.15)     0.88 [ 12.00]( 4.55)     0.78 [ 22.00]( 6.61)     0.90 [ 10.00]( 5.75)     0.82 [ 18.00](17.98)
       4     1.00 [ -0.00](13.46)     0.96 [  4.17](10.60)     1.00 [ -0.00]( 8.54)     0.96 [  4.17]( 3.30)     0.98 [  2.08]( 8.19)
       8     1.00 [ -0.00]( 7.14)     0.84 [ 15.79]( 8.44)     0.98 [  1.75]( 3.67)     0.95 [  5.26]( 4.99)     0.91 [  8.77]( 2.92)
      16     1.00 [ -0.00]( 3.49)     1.08 [ -8.47]( 4.69)     1.07 [ -6.78]( 0.92)     1.07 [ -6.78]( 0.91)     1.07 [ -6.78]( 3.27)
      32     1.00 [ -0.00]( 1.06)     1.10 [ -9.57]( 2.91)     1.07 [ -7.45]( 2.97)     1.07 [ -7.45]( 4.23)     1.05 [ -5.32]( 7.80)
      64     1.00 [ -0.00]( 5.48)     1.25 [-25.00]( 5.36)     1.17 [-17.44]( 1.44)     1.23 [-23.26]( 2.79)     1.20 [-19.77]( 2.19)
     128     1.00 [ -0.00](10.45)     1.18 [-17.99](12.54)     1.16 [-16.36](21.21)     1.13 [-12.85](12.71)     1.09 [ -8.64]( 3.05)
     256     1.00 [ -0.00](31.14)     1.28 [-27.79](17.66)     0.84 [ 16.21](32.14)     1.19 [-19.21]( 1.68)     1.07 [ -6.86]( 7.48)
     512     1.00 [ -0.00]( 1.52)     1.01 [ -0.51]( 2.78)     0.97 [  3.03]( 2.91)     0.98 [  1.77]( 1.07)     1.01 [ -0.51]( 1.01)
     
     
     ==================================================================
     Test          : new-schbench-requests-per-second
     Units         : Normalized Requests per second
     Interpretation: Higher is better
     Statistic     : Median
     ==================================================================
     #workers: tip[pct imp](CV)       vanilla[pct imp](CV)     DELAYED[pct imp](CV)     DEFUALT[pct imp](CV)      BOTH[pct imp](CV)
       1     1.00 [  0.00]( 1.07)     1.00 [  0.29]( 0.00)     1.00 [  0.29]( 0.15)     0.99 [ -0.59]( 0.46)     1.00 [  0.29]( 0.30)
       2     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)
       4     1.00 [  0.00]( 0.00)     1.00 [ -0.29]( 0.15)     1.00 [  0.00]( 0.00)     1.00 [ -0.29]( 0.15)     1.00 [  0.00]( 0.00)
       8     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.15)     1.00 [  0.29]( 0.00)     1.00 [  0.00]( 0.40)     1.00 [  0.29]( 0.15)
      16     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.00)
      32     1.00 [  0.00]( 3.41)     0.99 [ -0.95]( 2.06)     0.98 [ -2.23]( 3.41)     0.98 [ -2.23]( 3.31)     1.03 [  2.54]( 0.32)
      64     1.00 [  0.00]( 1.05)     0.92 [ -7.58]( 9.01)     0.86 [-13.92](11.30)     1.00 [  0.00]( 4.74)     1.00 [ -0.38]( 9.98)
     128     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)     1.00 [  0.38]( 0.00)
     256     1.00 [  0.00]( 0.72)     1.00 [ -0.31]( 0.42)     1.01 [  1.23]( 1.33)     1.01 [  0.61]( 0.83)     1.01 [  0.92]( 1.36)
     512     1.00 [  0.00]( 0.57)     1.00 [  0.00]( 0.45)     0.99 [ -0.72]( 1.18)     1.00 [  0.48]( 0.33)     1.01 [  1.44]( 0.49)
     
     
     ==================================================================
     Test          : new-schbench-wakeup-latency
     Units         : Normalized 99th percentile latency in us
     Interpretation: Lower is better
     Statistic     : Median
     ==================================================================
     #workers: tip[pct imp](CV)       vanilla[pct imp](CV)     DELAYED[pct imp](CV)     DEFUALT[pct imp](CV)      BOTH[pct imp](CV)
       1     1.00 [ -0.00]( 9.11)     0.75 [ 25.00](11.08)     0.69 [ 31.25]( 8.13)     0.75 [ 25.00](11.08)     0.62 [ 37.50]( 8.94)
       2     1.00 [ -0.00]( 0.00)     1.00 [ -0.00]( 3.78)     0.86 [ 14.29]( 7.45)     0.93 [  7.14]( 3.87)     0.79 [ 21.43]( 4.84)
       4     1.00 [ -0.00]( 3.78)     0.93 [  7.14]( 3.87)     0.79 [ 21.43]( 4.56)     0.93 [  7.14]( 0.00)     0.79 [ 21.43]( 8.85)
       8     1.00 [ -0.00]( 0.00)     1.08 [ -8.33](12.91)     0.92 [  8.33]( 0.00)     0.83 [ 16.67](18.23)     1.08 [ -8.33](12.91)
      16     1.00 [ -0.00]( 7.56)     0.92 [  7.69](11.71)     0.85 [ 15.38](12.06)     1.08 [ -7.69](11.92)     0.85 [ 15.38](12.91)
      32     1.00 [ -0.00](15.11)     1.07 [ -6.67]( 3.30)     1.00 [ -0.00](19.06)     1.00 [ -0.00](15.11)     0.80 [ 20.00]( 4.43)
      64     1.00 [ -0.00]( 9.63)     1.00 [ -0.00]( 8.15)     1.00 [ -0.00]( 5.34)     1.05 [ -5.00]( 7.75)     0.90 [ 10.00]( 9.94)
     128     1.00 [ -0.00]( 4.86)     0.89 [ 11.06]( 7.83)     0.91 [  8.54]( 7.87)     0.88 [ 12.06]( 8.73)     0.86 [ 14.07]( 5.01)
     256     1.00 [ -0.00]( 2.34)     1.00 [  0.20]( 0.10)     1.04 [ -4.50]( 4.59)     1.03 [ -2.90]( 1.95)     1.04 [ -3.70]( 4.13)
     512     1.00 [ -0.00]( 0.40)     1.00 [  0.38]( 0.20)     1.00 [  0.38]( 0.20)     0.99 [  0.77]( 0.20)     1.00 [ -0.00]( 0.40)
     
     
     ==================================================================
     Test          : new-schbench-request-latency
     Units         : Normalized 99th percentile latency in us
     Interpretation: Lower is better
     Statistic     : Median
     ==================================================================
     #workers: tip[pct imp](CV)       vanilla[pct imp](CV)     DELAYED[pct imp](CV)     DEFUALT[pct imp](CV)      BOTH[pct imp](CV)
       1     1.00 [ -0.00]( 2.73)     0.98 [  2.08]( 1.04)     0.99 [  1.30]( 1.07)     1.02 [ -1.82]( 0.00)     1.01 [ -1.30]( 3.10)
       2     1.00 [ -0.00]( 0.87)     1.05 [ -5.40]( 3.10)     1.02 [ -1.89]( 1.58)     1.01 [ -1.08]( 2.76)     1.02 [ -1.62]( 1.45)
       4     1.00 [ -0.00]( 1.21)     0.99 [  0.54]( 1.27)     0.99 [  1.08]( 1.67)     1.01 [ -1.21]( 1.21)     1.01 [ -1.35]( 1.91)
       8     1.00 [ -0.00]( 0.27)     0.99 [  0.79]( 2.14)     0.98 [  2.37]( 0.72)     0.99 [  1.05]( 2.53)     0.99 [  0.79]( 1.12)
      16     1.00 [ -0.00]( 4.04)     1.01 [ -0.53]( 0.55)     1.01 [ -0.80]( 1.08)     1.00 [ -0.27]( 0.36)     0.99 [  0.53]( 0.50)
      32     1.00 [ -0.00]( 7.35)     1.10 [ -9.97](21.10)     1.01 [ -0.66](10.27)     1.25 [-25.36](21.41)     0.90 [  9.52]( 2.08)
      64     1.00 [ -0.00]( 3.54)     1.03 [ -2.89]( 1.55)     1.02 [ -2.00]( 0.98)     1.01 [ -0.67]( 3.62)     1.01 [ -0.89]( 4.98)
     128     1.00 [ -0.00]( 0.37)     0.99 [  0.62]( 0.00)     0.99 [  0.72]( 0.11)     0.99 [  0.62]( 0.11)     0.99 [  0.83]( 0.11)
     256     1.00 [ -0.00]( 9.57)     0.92 [  8.36]( 2.22)     1.03 [ -3.11](12.58)     1.05 [ -5.02]( 8.36)     1.00 [ -0.00](11.71)
     512     1.00 [ -0.00]( 1.82)     1.01 [ -1.23]( 0.94)     1.02 [ -2.45]( 1.53)     1.00 [  0.35]( 0.83)     1.02 [ -1.93]( 1.40)
     
     ==================================================================
     Test          : Various longer running benchmarks
     Units         : %diff in throughput reported
     Interpretation: Higher is better
     Statistic     : Median
     ==================================================================
     Benchmarks:                 vanilla     DELAYED   DEFAULT    BOTH
     ycsb-cassandra              -0.05%       0.65%    -0.49%    -0.48%
     ycsb-mongodb                -0.80%      -0.85%    -1.00%    -0.98%
      
     deathstarbench-1x            2.44%       1.54%     1.65%     0.18%
     deathstarbench-2x            5.47%       4.88%     7.92%     6.75%
     deathstarbench-3x            0.36%       1.74%    -1.75%     0.31%
     deathstarbench-6x            1.14%       1.94%     2.24%     1.58%
     
     hammerdb+mysql 16VU          1.08%       5.21%     2.69%     3.80%
     hammerdb+mysql 64VU         -0.43%      -0.31%     2.12%    -0.25%


-- 
Thanks and Regards,
Prateek