linux-kernel - Re: [PATCH v2 00/12] sched: Address schbench regression

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aHj01PaJ_rcsduMn@arm.com>
Date: Thu, 17 Jul 2025 15:04:55 +0200
From: Beata Michalska <beata.michalska@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, vschneid@...hat.com, clm@...a.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 00/12] sched: Address schbench regression

Hi Peter,

Below are the results of running the schbench on Altra
(as a reminder 2-core MC, 2 Numa Nodes, 160 cores)

`Legend:
- 'Flags=none' means neither TTWU_QUEUE_DEFAULT nor
  TTWU_QUEUE_DELAYED is set (or available).
- '*…*' marks Top-3 Min & Max, Bottom-3 Std dev, and
  Top-3 90th-percentile values.

Base 6.16-rc5
  Flags=none
  Min=681870.77 | Max=913649.50 | Std=53802.90       | 90th=890201.05

sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails
  Flags=none
  Min=770952.12 | Max=888047.45 | Std=34430.24       | 90th=877347.24

sched/psi: Optimize psi_group_change() cpu_clock() usage
  Flags=none
  Min=748137.65 | Max=936312.33 | Std=56818.23       | 90th=*921497.27*

sched/deadline: Less agressive dl_server handling
  Flags=none
  Min=783621.95 | Max=*944604.67* | Std=43538.64     | 90th=*909961.16*

sched: Optimize ttwu() / select_task_rq()
  Flags=none
  Min=*826038.87* | Max=*1003496.73* | Std=49875.43  | 90th=*971944.88*

sched: Use lock guard in ttwu_runnable()
  Flags=none
  Min=780172.75 | Max=914170.20 | Std=35998.33       | 90th=866095.80

sched: Add ttwu_queue controls
  Flags=TTWU_QUEUE_DEFAULT
  Min=*792430.45* | Max=903422.78 | Std=33582.71     | 90th=887256.68

  Flags=none
  Min=*803532.80* | Max=894772.48 | Std=29359.35     | 90th=877920.34

sched: Introduce ttwu_do_migrate()
  Flags=TTWU_QUEUE_DEFAULT
  Min=749824.30 | Max=*965139.77* | Std=57022.47     | 90th=903659.07
 
  Flags=none
  Min=787464.65 | Max=885349.20 | Std=27030.82       | 90th=875750.44

psi: Split psi_ttwu_dequeue()
  Flags=TTWU_QUEUE_DEFAULT
  Min=762960.98 | Max=916538.12 | Std=42002.19       | 90th=876425.84
 
  Flags=none
  Min=773608.48 | Max=920812.87 | Std=42189.17       | 90th=871760.47

sched: Re-arrange __ttwu_queue_wakelist()
  Flags=TTWU_QUEUE_DEFAULT
  Min=702870.58 | Max=835243.42 | Std=44224.02       | 90th=825311.12

  Flags=none
  Min=712499.38 | Max=838492.03 | Std=38351.20       | 90th=817135.94

sched: Use lock guard in sched_ttwu_pending()
  Flags=TTWU_QUEUE_DEFAULT
  Min=729080.55 | Max=853609.62 | Std=43440.63       | 90th=838684.48

  Flags=none
  Min=708123.47 | Max=850804.48 | Std=40642.28       | 90th=830295.08

sched: Change ttwu_runnable() vs sched_delayed
  Flags=TTWU_QUEUE_DEFAULT
  Min=580218.87 | Max=838684.07 | Std=57078.24       | 90th=792973.33

  Flags=none
  Min=721274.90 | Max=784897.92 | Std=*19017.78*     | 90th=774792.30

sched: Add ttwu_queue support for delayed tasks
  Flags=none
  Min=712979.48 | Max=830192.10 | Std=33173.90       | 90th=798599.66

  Flags=TTWU_QUEUE_DEFAULT
  Min=698094.12 | Max=857627.93 | Std=38294.94       | 90th=789981.59
 
  Flags=TTWU_QUEUE_DEFAULT/TTWU_QUEUE_DELAYED
  Min=683348.77 | Max=782179.15 | Std=25086.71       | 90th=750947.00

  Flags=TTWU_QUEUE_DELAYED
  Min=669822.23 | Max=807768.85 | Std=38766.41       | 90th=794052.05

sched: fix ttwu_delayed
  Flags=none
  Min=671844.35 | Max=798737.67 | Std=33438.64       | 90th=788584.62

  Flags=TTWU_QUEUE_DEFAULT
  Min=688607.40 | Max=828679.53 | Std=33184.78       | 90th=782490.23

  Flags=TTWU_QUEUE_DEFAULT/TTWU_QUEUE_DELAYED
  Min=579171.13 | Max=643929.18 | Std=*14644.92*     | 90th=639764.16

  Flags=TTWU_QUEUE_DELAYED
  Min=614265.22 | Max=675172.05 | Std=*13309.92*     | 90th=647181.10


Best overall performer:
sched: Optimize ttwu() / select_task_rq()
  Flags=none
  Min=*826038.87* | Max=*1003496.73* | Std=49875.43 | 90th=*971944.88*

Hope this will he somehwat helpful.

---
BR
Beata

On Wed, Jul 02, 2025 at 01:49:24PM +0200, Peter Zijlstra wrote:
> Hi!
> 
> Previous version:
> 
>   https://lkml.kernel.org/r/20250520094538.086709102@infradead.org
> 
> 
> Changes:
>  - keep dl_server_stop(), just remove the 'normal' usage of it (juril)
>  - have the sched_delayed wake list IPIs do select_task_rq() (vingu)
>  - fixed lockdep splat (dietmar)
>  - added a few preperatory patches
> 
> 
> Patches apply on top of tip/master (which includes the disabling of private futex)
> and clm's newidle balance patch (which I'm awaiting vingu's ack on).
> 
> Performance is similar to the last version; as tested on my SPR on v6.15 base:
> 
> v6.15:
> schbench-6.15.0-1.txt:average rps: 2891403.72
> schbench-6.15.0-2.txt:average rps: 2889997.02
> schbench-6.15.0-3.txt:average rps: 2894745.17
> 
> v6.15 + patches 1-10:
> schbench-6.15.0-dirty-4.txt:average rps: 3038265.95
> schbench-6.15.0-dirty-5.txt:average rps: 3037327.50
> schbench-6.15.0-dirty-6.txt:average rps: 3038160.15
> 
> v6.15 + all patches:
> schbench-6.15.0-dirty-deferred-1.txt:average rps: 3043404.30
> schbench-6.15.0-dirty-deferred-2.txt:average rps: 3046124.17
> schbench-6.15.0-dirty-deferred-3.txt:average rps: 3043627.10
> 
> 
> Patches can also be had here:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core
> 
> 
> I'm hoping we can get this merged for next cycle so we can all move on from this.
> 
>