Message-ID: <2084b7d9-bb4f-4a5e-aaec-98e07b3edc2e@arm.com>
Date: Tue, 20 May 2025 16:38:09 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Peter Zijlstra <peterz@...radead.org>, Chris Mason <clm@...a.com>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
 vschneid@...hat.com, Juri Lelli <juri.lelli@...il.com>,
 Thomas Gleixner <tglx@...utronix.de>
Subject: Re: scheduler performance regression since v6.11

On 16/05/2025 12:18, Peter Zijlstra wrote:
> On Mon, May 12, 2025 at 06:35:24PM -0400, Chris Mason wrote:
> 
> Right, so I can reproduce on Thomas' SKL and maybe see some of it on my
> SPR.
> 
> I've managed to discover a whole bunch of ways that ttwu() can explode
> again :-) But as you surmised, your workload *LOVES* TTWU_QUEUE, and
> DELAYED_DEQUEUE takes some of that away, because those delayed tasks
> remain on-rq, and ttwu() can't deal with that other than by doing the
> wakeup in-line -- which is exactly the thing this workload hates most.
> 
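To make that concrete, here is a stand-alone C model -- hypothetical and
heavily simplified, not the kernel code -- of the constraint described
above: a delayed-dequeue task is still marked on_rq, so the TTWU_QUEUE
wake-list shortcut is skipped and the wakeup runs in-line on the waking
CPU:

#include <stdbool.h>
#include <stdio.h>

struct task { bool on_rq; };

static void ttwu_queue_wakelist(struct task *p)
{
	/* Cheap path: defer the wakeup to the target CPU. */
	(void)p;
	printf("queued on remote wake list\n");
}

static void ttwu_inline(struct task *p)
{
	/* Expensive path: the waker pays for the full wakeup itself. */
	(void)p;
	printf("in-line wakeup on waking CPU\n");
}

static void try_to_wake_up(struct task *p)
{
	if (p->on_rq) {		/* delayed-dequeue tasks land here */
		ttwu_inline(p);
		return;
	}
	ttwu_queue_wakelist(p);
}

int main(void)
{
	struct task delayed = { .on_rq = true };	/* DELAYED_DEQUEUE */
	struct task slept   = { .on_rq = false };

	try_to_wake_up(&delayed);	/* in-line: what this workload hates */
	try_to_wake_up(&slept);		/* wake list: what it loves */
	return 0;
}
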
> (I'll keep poking at ttwu() to see if I can get a combination of
> TTWU_QUEUE and DELAYED_DEQUEUE that does not explode in 'fun' ways)
> 
> However, I've found that flipping the default in ttwu_queue_cond() seems
> to make up for quite a bit -- for your workload.
> 
> (basically, all the work we can get away from those pinned message CPUs
> is a win)
> 
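One plausible reading of that flip, as a stand-alone model (hypothetical
and simplified -- the real ttwu_queue_cond() also checks topology and
CPU state): instead of only using the remote wake list when the two CPUs
do not share a cache, use it for any remote wakeup:

#include <stdbool.h>
#include <stdio.h>

/* Stub: pretend 8 CPUs share one LLC. */
static bool cpus_share_cache(int a, int b)
{
	return (a / 8) == (b / 8);
}

static bool ttwu_queue_cond_old(int this_cpu, int target)
{
	/* Old default: only queue remotely across cache domains. */
	return !cpus_share_cache(this_cpu, target);
}

static bool ttwu_queue_cond_flipped(int this_cpu, int target)
{
	/* Flipped default: queue any remote wakeup. */
	return this_cpu != target;
}

int main(void)
{
	/* A pinned message CPU (0) waking a worker on CPU 5, same LLC: */
	printf("old:     %s\n",
	       ttwu_queue_cond_old(0, 5) ? "wake list" : "in-line");
	printf("flipped: %s\n",
	       ttwu_queue_cond_flipped(0, 5) ? "wake list" : "in-line");
	return 0;
}
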
> Also, you meanwhile discovered that the other part of your performance
> woes was due to the dl_server; specifically, disabling it gave you back
> a healthy chunk of your performance.
> 
> The problem is indeed that we toggle the dl_server on every nr_running
> 0 -> 1 and 1 -> 0 transition, and your workload has a shit-ton of those,
> so every time we pay the overhead of starting and stopping this thing.
> 
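To see why that adds up, a stand-alone C model (hypothetical; the names
mimic the kernel but the bodies are stubs) of the current behaviour,
which starts/stops the server on every nr_running 0 <-> 1 transition:

#include <stdio.h>

static unsigned int nr_running;
static unsigned long starts, stops;

static void dl_server_start(void) { starts++; /* timer setup etc. */ }
static void dl_server_stop(void)  { stops++;  /* timer cancel etc. */ }

static void enqueue_task_fair(void)
{
	if (nr_running++ == 0)
		dl_server_start();	/* 0 -> 1: pay the start overhead */
}

static void dequeue_task_fair(void)
{
	if (--nr_running == 0)
		dl_server_stop();	/* 1 -> 0: pay the stop overhead */
}

int main(void)
{
	/* A wakeup-heavy workload bounces through idle constantly: */
	for (int i = 0; i < 1000000; i++) {
		enqueue_task_fair();
		dequeue_task_fair();
	}
	printf("starts=%lu stops=%lu\n", starts, stops); /* 1M of each */
	return 0;
}
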
> In hindsight, that's a fairly stupid setup, and the below patch changes
> this to keep the dl_server around until it hasn't seen fair activity for
> a whole period. This appears to fully recover this dip.
> 
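A stand-alone sketch of that lazy-stop idea (hypothetical, driven by a
fake clock here; the real patch works off the dl_server's own timer):
remember the time of the last fair activity and only shut the server
down once a full period has elapsed without any:

#include <stdbool.h>
#include <stdio.h>

#define DL_PERIOD 1000000ULL	/* 1 ms in ns, illustrative */

static unsigned long long now_ns;	/* fake clock */
static unsigned long long last_fair_activity;
static bool dl_server_active;

static void dl_server_update(void)
{
	last_fair_activity = now_ns;	/* fair task ran */
	if (!dl_server_active) {
		dl_server_active = true;
		printf("start at %llu\n", now_ns);
	}
}

/* Called from the dl_server's periodic timer. */
static void dl_server_timer(void)
{
	if (dl_server_active &&
	    now_ns - last_fair_activity >= DL_PERIOD) {
		dl_server_active = false;	/* idle for a whole period */
		printf("stop at %llu\n", now_ns);
	}
}

int main(void)
{
	now_ns = 0;		dl_server_update();	/* starts once */
	now_ns = 100;		dl_server_update();	/* no restart churn */
	now_ns = 2 * DL_PERIOD;	dl_server_timer();	/* stops lazily */
	return 0;
}
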
> Trouble seems to be that dl_server_update() always gets tickled by
> random garbage, so in the end the dl_server never stops... oh well.
> 
> Juri, could you have a look at this? Perhaps I messed up something
> trivial -- it's been like that this week :/

On the same VM I use as a SUT for the 'hammerdb-mysqld' tests:

https://lkml.kernel.org/r/d6692902-837a-4f30-913b-763f01a5a7ea@arm.com

I can't spot any v6.11-related changes (dl_server or TTWU_QUEUE), but a
PSI-related one for v6.12 results in a ~8% schbench regression.

VM (m7gd.16xlarge, 16 logical CPUs) on Graviton3:

schbench -L -m 4 -M auto -t 128 -n 0 -r 60

3840cbe24cf0 - sched: psi: fix bogus pressure spikes from aggregation race

With CONFIG_PSI enabled, we now call cpu_clock(cpu) multiple times (up
to 4 times per task switch in my setup) in:

__schedule() -> psi_sched_switch() -> psi_task_switch() ->
psi_group_change().
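
A stand-alone model of that cost (hypothetical numbers and a flattened
call chain), contrasting a clock read per psi_group_change() with
reading the clock once per task switch and passing it down:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static unsigned long clock_reads;

static uint64_t cpu_clock_model(void)
{
	struct timespec ts;

	clock_reads++;		/* count how often we hit the clock */
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

static void psi_group_change_percall(void)
{
	(void)cpu_clock_model();	/* affected code: read per group */
}

static void psi_group_change_hoisted(uint64_t now)
{
	(void)now;			/* cheaper: timestamp passed in */
}

int main(void)
{
	int levels = 4;		/* ~4 calls per task switch in my setup */

	for (int i = 0; i < levels; i++)
		psi_group_change_percall();
	printf("per-call clock reads: %lu\n", clock_reads);	/* 4 */

	clock_reads = 0;
	uint64_t now = cpu_clock_model();			/* read once */
	for (int i = 0; i < levels; i++)
		psi_group_change_hoisted(now);
	printf("hoisted clock reads:  %lu\n", clock_reads);	/* 1 */
	return 0;
}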

There seem to be one or more later v6.12-related patches which cause
another 4% regression that I have yet to track down.

[...]
