linux-kernel - RE: Linux 6.10-rc2 - massive performance regression

Message-ID: <930d66db93814520be94f02ee62a19e6@AcuMS.aculab.com>
Date: Sun, 9 Jun 2024 09:35:36 +0000
From: David Laight <David.Laight@...LAB.COM>
To: Linux kernel regressions list <regressions@...ts.linux.dev>
CC: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, Linus Torvalds
	<torvalds@...ux-foundation.org>
Subject: RE: Linux 6.10-rc2 - massive performance regression

From: Linux regression tracking (Thorsten Leemhuis)
> Sent: 09 June 2024 09:11
> 
> On 09.06.24 00:00, Linus Torvalds wrote:
> > On Sat, 8 Jun 2024 at 14:36, David Laight <David.Laight@...lab.com> wrote:
> > [...]
> >> I've done some tests.
> >> I'm seeing a three-fold slow down on:
> >> $ i=0; while [ $i -lt 1000000 ]; do i=$((i + 1)); done
> >> which goes from 1 second to 3.
> >>
> >> I can run that with ftrace monitoring scheduler events (and a few
> >> other things) and can't spot anywhere the process isn't running
> >> for a significant time.
> >
> > Sounds like cpu frequency. Almost certainly hw-specific. I went
> > through that on my Threadripper in the 6.9 timeframe, but I'm not
> > seeing any issues in this current release.
> 
> David, what kind of hardware do you use?

This is on an i7-7700 (4 cores + hyperthreading enabled = 8 cpus).

> Johan Hovold as
> man-in-the-middle just reported "CPU frequency of the big cores on the
> Lenovo ThinkPad X13s sometimes appears to get stuck at a low frequency
> with 6.10-rc2" and confirmed "that once the cores are fully throttled
> (using the stepwise thermal governor) due to the skin temperature
> reaching the first trip point, scaling_max_freq gets stuck at the next
> OPP".

That's not what I'm seeing.
I can turn the speed up and down by stopping/starting a daemon we use
for processing audio.
(I can give anyone a copy; it is freely downloadable from the company
web site - if you know exactly where to look!)
Basically that ends up running a bit of code on every cpu every 10ms.

There is a big difference in the number of sched_migrate_task traces
between 6.9 and 6.10 (15 v 83).

I suspect that the underlying problem is that the cpu governor
doesn't allow for a 'busy' process being moved to an idle cpu?
So if you bounce a process about it always runs at 800MHz.
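
One way to probe that suspicion (a minimal sketch only, not something
from the tests reported here) would be to time the same shell loop
with the scheduler free to migrate it and then pinned to a single cpu.
It assumes GNU date (for %N) and, for the pinned run, taskset from
util-linux; the helper name time_loop_ms and the choice of cpu 0 are
purely illustrative.

```shell
#!/bin/sh
# Time the shell busy loop, optionally under a command prefix.
# Usage: time_loop_ms ITERATIONS [PREFIX...]
# Prints the elapsed wall-clock time in milliseconds.
# Assumption: GNU date, so that %N expands to nanoseconds.
time_loop_ms() {
    count=$1
    shift
    start=$(date +%s%N)
    # "$@" is the optional prefix (e.g. taskset -c 0); empty means unpinned.
    "$@" sh -c 'i=0; while [ $i -lt "$1" ]; do i=$((i + 1)); done' sh "$count"
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example calls (left commented out because the second needs taskset):
#   time_loop_ms 1000000                 # scheduler may migrate the loop
#   time_loop_ms 1000000 taskset -c 0    # loop pinned to cpu 0
```

If the pinned run is consistently faster, that would point towards
migration being involved; it would not by itself show that the
frequency governor is at fault.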

My dmesg (6.9 and 6.10) has:
cpuidle: using governor idle
cpuidle: using governor ladder

But I had a feeling that some 'hardware magic' changes the cpu
speed on these systems?
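
The cpuidle lines above are about idle-state selection; frequency
scaling is reported separately under cpufreq in sysfs. A minimal
sketch for dumping the scaling driver, governor and current frequency
per cpu follows. The helper name show_cpufreq and its optional
directory argument are illustrative assumptions; the argument only
exists so the function can be pointed at a copy of the tree.

```shell
#!/bin/sh
# Print cpufreq driver, governor and current frequency for each cpu.
# Usage: show_cpufreq [SYSFS_CPU_DIR]
# SYSFS_CPU_DIR defaults to /sys/devices/system/cpu.
show_cpufreq() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*; do
        f=$dir/cpufreq
        # Skip cpus (or an unmatched glob) with no readable cpufreq data.
        [ -r "$f/scaling_cur_freq" ] || continue
        # scaling_cur_freq is reported in kHz.
        printf '%s: driver=%s governor=%s cur=%s kHz\n' \
            "${dir##*/}" \
            "$(cat "$f/scaling_driver")" \
            "$(cat "$f/scaling_governor")" \
            "$(cat "$f/scaling_cur_freq")"
    done
}
```

Running show_cpufreq a few times while the loop is executing would
show whether the cpu it lands on is still at its minimum frequency.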

	David
