Message-ID: <55913bff-d7ad-47a3-bbd7-2bf8bb63ac59@meta.com>
Date: Wed, 21 May 2025 10:32:45 -0400
From: Chris Mason <clm@...a.com>
To: Dietmar Eggemann <dietmar.eggemann@....com>,
Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
vschneid@...hat.com, Juri Lelli <juri.lelli@...il.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: scheduler performance regression since v6.11
On 5/21/25 9:59 AM, Dietmar Eggemann wrote:
> On 20/05/2025 16:53, Chris Mason wrote:
>> On 5/20/25 10:38 AM, Dietmar Eggemann wrote:
>>> On 16/05/2025 12:18, Peter Zijlstra wrote:
>>>> On Mon, May 12, 2025 at 06:35:24PM -0400, Chris Mason wrote:
>
> [...]
>
>>> I can't spot any v6.11 related changes (dl_server or TTWU_QUEUE) but a
>>> PSI related one for v6.12 results in a ~8% schbench regression.
>>>
>>> VM (m7gd.16xlarge, 16 logical CPUs) on Graviton3:
>>>
>>> schbench -L -m 4 -M auto -t 128 -n 0 -r 60
>>>
>>> 3840cbe24cf0 - sched: psi: fix bogus pressure spikes from aggregation race
>>
>> I also saw a regression on this one, but it wasn't stable enough for me
>> to be sure. I'll retest, but I'm guessing this is made worse by the VM
>> / Graviton setup?
>
> For me the 8% regression here is pretty stable. I have to add that I ran
> schbench in:
>
> /sys/fs/cgroup/user.slice/user-1000.slice/session-33.scope
>
> So IMHO that explains the 4 calls to psi_group_change() from
> psi_task_switch() now each doing their own 'now = cpu_clock(cpu)' call.
Makes sense. If you pull the latest schbench, you can add -s 0 to the
command line. That removes the usleep done by the workers, which
focuses things even more on the CPU selection when message threads wake
up the workers.
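With the invocation above, that would be something like:

schbench -L -m 4 -M auto -t 128 -n 0 -r 60 -s 0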
On Turin, I'm seeing ~35% lower RPS with later kernels than with 6.9
when I add -s 0. I'm also seeing 35% higher wakeup latencies, so I'll
spend some time today comparing placement decisions between the two.
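Back on the PSI change: assuming I'm reading 3840cbe24cf0 right, the
'now = cpu_clock(cpu)' read moved from the caller into psi_group_change()
itself, so each of the four calls you mention now pays for its own clock
read. Quick userspace toy of that shape below, purely illustrative: the
struct and function names (grp, group_change_old/new, now_ns) are made
up, it's just the one-read-per-switch vs one-read-per-level difference
for your cgroup path:

/*
 * Userspace toy, not kernel code: model the cgroup walk in
 * psi_task_switch() with the timestamp taken once by the caller (old)
 * vs. once per level inside the group change (new).
 */
#include <stdio.h>
#include <time.h>

struct grp {
        struct grp *parent;
        const char *name;
};

/* stand-in for cpu_clock(cpu) */
static unsigned long long now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* old shape: timestamp handed down from the caller */
static void group_change_old(struct grp *g, unsigned long long now)
{
        printf("old %-20s now=%llu\n", g->name, now);
}

/* new shape: every level reads the clock itself (in the real code the
 * read happens with the per-cpu state write-locked, IIRC) */
static void group_change_new(struct grp *g)
{
        unsigned long long now = now_ns();

        printf("new %-20s now=%llu\n", g->name, now);
}

int main(void)
{
        struct grp root    = { NULL,   "root" };
        struct grp user    = { &root,  "user.slice" };
        struct grp u1000   = { &user,  "user-1000.slice" };
        struct grp session = { &u1000, "session-33.scope" };
        unsigned long long now = now_ns();
        struct grp *g;

        /* old: one clock read, same stamp for every level */
        for (g = &session; g; g = g->parent)
                group_change_old(g, now);

        /* new: one clock read per level, now= drifts per call */
        for (g = &session; g; g = g->parent)
                group_change_new(g);

        return 0;
}

Which lines up with the four different now= values per switch in the trace: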
>
> schbench-6509 [004] d.... 689.050466: psi_task_switch: CPU4 [schbench 6509] -> [schbench 6514] ->
> schbench-6509 [004] d.... 689.050466: psi_group_change: CPU4 now=689050466118
> schbench-6509 [004] d.... 689.050467: psi_group_change: CPU4 now=689050466537
> schbench-6509 [004] d.... 689.050467: psi_group_change: CPU4 now=689050466950
> schbench-6509 [004] d.... 689.050468: psi_group_change: CPU4 now=689050467838
> schbench-6509 [004] d.... 689.050468: psi_task_switch: CPU4 [schbench 6509] -> [schbench 6514] <-
>
>> I've been testing Peter's changes, and they do help on my skylake box
>> but not as much on the big turin machines. I'm trying to sort that out,
>
> Turin vs. SKL,SPR ?
This started with a networking benchmark regression on Turin, where later
kernels have regressed since 6.9. I made some changes to schbench to
try to model that regression, but until I can claw back the performance
on Turin, I won't really be sure schbench isn't just exposing other
unrelated problems.
I also pulled in SKL and Cooper Lake because they are easiest for me to
test on, but the Turin machines take a much bigger hit. They're
single-socket machines, but the high thread count seems to be making this
set of regressions much worse.
-chris