Message-ID: <CAJZ5v0hGu-JdwR57cwKfB+a98Pv7e3y36X6xCo=PyGdD2hwkhQ@mail.gmail.com>
Date: Tue, 14 Oct 2025 17:54:18 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Christian Loehle <christian.loehle@....com>, Sergey Senozhatsky <senozhatsky@...omium.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>, Sasha Levin <sashal@...nel.org>,
Daniel Lezcano <daniel.lezcano@...aro.org>, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org, Tomasz Figa <tfiga@...omium.org>, stable@...r.kernel.org
Subject: Re: stable: commit "cpuidle: menu: Avoid discarding useful
information" causes regressions
On Tue, Oct 14, 2025 at 5:11 PM Christian Loehle
<christian.loehle@....com> wrote:
>
> On 10/14/25 12:55, Sergey Senozhatsky wrote:
> > On (25/10/14 11:25), Christian Loehle wrote:
> >> On 10/14/25 11:23, Sergey Senozhatsky wrote:
> >>> On (25/10/14 10:50), Christian Loehle wrote:
> >>>>> Upstream fixup fa3fa55de0d ("cpuidle: governors: menu: Avoid using
> >>>>> invalid recent intervals data") doesn't address the problems we are
> >>>>> observing. Revert seems to be bringing performance metrics back to
> >>>>> pre-regression levels.
> >>>>
> >>>> Any details would be much appreciated.
> >>>> How do the idle state usages differ with and without
> >>>> "cpuidle: menu: Avoid discarding useful information"?
> >>>> What do the idle states look like in your platform?
> >>>
> >>> Sure, I can run tests. How do I get the numbers/stats
> >>> that you are asking for?
> >>
> >> Ideally just dump
> >> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> >> before and after the test.
> >
> > OK, got some data for you. The terminology being used here is as follows:
> >
> > - 6.1-base
> > is 6.1 stable with a9edb700846 "cpuidle: menu: Avoid discarding useful information"
> >
> > - 6.1-base-fixup
> > is 6.1 stable with a9edb700846 and fa3fa55de0d6 "cpuidle: governors:
> > menu: Avoid using invalid recent intervals data" cherry-pick
> >
> > - 6.1-revert
> > is 6.1 stable with a9edb700846 reverted (and no fixup commit, obviously)
> >
> > Just to show the scale of regression, results of some of the benchmarks:
> >
> > 6.1-base: 84.5
> > 6.1-base-fixup: 76.5
> > 6.1-revert: 59.5
> >
> > (lower is better, 6.1-revert has the same results as previous stable
> > kernels).
> This immediately threw me off.
> The fixup was written for a specific system which had completely broken
> cpuidle. It shouldn't affect any sane system significantly.
> I double checked the numbers and your system looks fine, in fact none of
> the tests had any rejected cpuidle occurrences. So functionally base and
> base-fixup are identical for you. The cpuidle numbers are also reasonably
> 'in the noise', so just for the future some stats would be helpful on those
> scores.
>
> I can see a huge difference between base and revert in terms of cpuidle,
> so that's enough for me to take a look, I'll do that now.
> (6.1-revert has more C3_ACPI in favor of C1_ACPI.)
>
> (Also I can't send this email without at least recommending teo instead of menu
> for your platform / use-cases, if you deemed it unfit I'd love to know what
> didn't work for you!)
Well, yeah.
So I've already done some analysis.
There are 4 C-states, POLL, C1, C6 and C10 (at least that's what the
MWAIT hints tell me).
This is how many times each of them was requested during the workload
run on base 6.1.y:
POLL: 21445
C1: 2993722
C6: 767029
C10: 736854
and in percentage of the total idle state requests:
POLL: 0.47%
C1: 66.25%
C6: 16.97%
C10: 16.31%
With the problematic commit reverted, this became
POLL: 16092
C1: 2452591
C6: 750933
C10: 1150259
and (again) in percentage of the total:
POLL: 0.37%
C1: 56.12%
C6: 17.18%
C10: 26.32%
Overall, POLL is negligible and the revert had no effect on the number
of times C6 was requested. The difference is in C1 and C10, about 10
percentage points in each case, going in opposite directions so to
speak: C1 was requested 10% less often and C10 was requested 10% more
often after the revert.
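Just to make the arithmetic reproducible, a quick sketch (plain Python;
the counts are copied from the figures above, the helper name is mine):

```python
# Idle state request counts for the two kernels under comparison
# (figures copied from the counts quoted above).
base = {"POLL": 21445, "C1": 2993722, "C6": 767029, "C10": 736854}
revert = {"POLL": 16092, "C1": 2452591, "C6": 750933, "C10": 1150259}

def shares(counts):
    """Each state's share of the total idle state requests, in percent."""
    total = sum(counts.values())
    return {state: 100 * n / total for state, n in counts.items()}

for name, counts in (("base 6.1.y", base), ("revert", revert)):
    summary = ", ".join(f"{s}: {p:.2f}%" for s, p in shares(counts).items())
    print(f"{name}: {summary}")
```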
Let's see how this corresponds to the residency numbers.
For base 6.1.y there was
POLL: 599883
C1: 732303748
C6: 576785253
C10: 2020491489
and in percentage of the total
POLL: 0.02%
C1: 21.99%
C6: 17.32%
C10: 60.67%
After the revert it became
POLL: 469451
C1: 517623465
C6: 508945687
C10: 2567701673
and in percentage of the total
POLL: 0.01%
C1: 14.40%
C6: 14.16%
C10: 71.43%
so with the revert the CPUs spent around 7 percentage points more time
in deep idle states (C6 and C10 combined).
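The residency figures check out the same way; a small sketch (plain
Python, residencies copied from above, helper name mine) computing the
combined deep-idle share on each kernel:

```python
# Idle state residency totals quoted above for the two kernels
# (units as reported by the cpuidle sysfs dump).
base = {"POLL": 599883, "C1": 732303748, "C6": 576785253, "C10": 2020491489}
revert = {"POLL": 469451, "C1": 517623465, "C6": 508945687, "C10": 2567701673}

def deep_share(residency):
    """Percentage of total idle residency spent in C6 and C10 combined."""
    total = sum(residency.values())
    return 100 * (residency["C6"] + residency["C10"]) / total

print(f"base 6.1.y: {deep_share(base):.2f}% in deep idle")
print(f"revert:     {deep_share(revert):.2f}% in deep idle")
print(f"difference: {deep_share(revert) - deep_share(base):.2f} percentage points")
```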
I have to say that this is consistent with the intent of the
problematic commit, which is to reduce the number of times the deepest
idle state is requested when it is likely to be too deep.
However, on the system in question this somehow causes performance to
drop significantly (even though shallow idle states are used more
often which should result in lower average idle state exit latency and
better performance).
One possible explanation is that this somehow affects turbo
frequencies. That is, requesting shallower idle states on idle CPUs
prevents the other CPUs from reaching sufficiently high turbo
frequencies.
Sergey, can you please run the workload under turbostat on the base
6.1.y and on 6.1.y with the problematic commit reverted and send the
turbostat output from both runs (note: turbostat needs to be run as
root)?