Message-ID: <e9eb077b-3253-49be-b997-a07dcde86cdc@arm.com>
Date: Tue, 14 Oct 2025 18:19:51 +0100
From: Christian Loehle <christian.loehle@....com>
To: "Rafael J. Wysocki" <rafael@...nel.org>,
 Sergey Senozhatsky <senozhatsky@...omium.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>,
 Sasha Levin <sashal@...nel.org>, Daniel Lezcano <daniel.lezcano@...aro.org>,
 linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
 Tomasz Figa <tfiga@...omium.org>, stable@...r.kernel.org
Subject: Re: stable: commit "cpuidle: menu: Avoid discarding useful
 information" causes regressions

On 10/14/25 16:54, Rafael J. Wysocki wrote:
> On Tue, Oct 14, 2025 at 5:11 PM Christian Loehle
> <christian.loehle@....com> wrote:
>>
>> On 10/14/25 12:55, Sergey Senozhatsky wrote:
>>> On (25/10/14 11:25), Christian Loehle wrote:
>>>> On 10/14/25 11:23, Sergey Senozhatsky wrote:
>>>>> On (25/10/14 10:50), Christian Loehle wrote:
>>>>>>> Upstream fixup fa3fa55de0d ("cpuidle: governors: menu: Avoid using
>>>>>>> invalid recent intervals data") doesn't address the problems we are
>>>>>>> observing.  Revert seems to be bringing performance metrics back to
>>>>>>> pre-regression levels.
>>>>>>
>>>>>> Any details would be much appreciated.
>>>>>> How do the idle state usages differ with and without
>>>>>> "cpuidle: menu: Avoid discarding useful information"?
>>>>>> What do the idle states look like in your platform?
>>>>>
>>>>> Sure, I can run tests.  How do I get the numbers/stats
>>>>> that you are asking for?
>>>>
>>>> Ideally just dump
>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>> before and after the test.
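
In case it helps, a minimal way to grab labelled before/after snapshots
(the output file names are just placeholders):

grep . /sys/devices/system/cpu/cpu*/cpuidle/state*/* > cpuidle-before.txt 2>/dev/null
# ... run the benchmark ...
grep . /sys/devices/system/cpu/cpu*/cpuidle/state*/* > cpuidle-after.txt 2>/dev/null

The per-state delta tables further down can be derived from such a pair of
snapshots.
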
>>>
>>> OK, got some data for you.  The terminology being used here is as follows:
>>>
>>> - 6.1-base
>>>   is 6.1 stable with a9edb700846 "cpuidle: menu: Avoid discarding useful information"
>>>
>>> - 6.1-base-fixup
>>>   is 6.1 stable with a9edb700846 and fa3fa55de0d6 "cpuidle: governors:
>>>   menu: Avoid using invalid recent intervals data" cherry-pick
>>>
>>> - 6.1-revert
>>>   is 6.1 stable with a9edb700846 reverted (and no fixup commit, obviously)
>>>
>>> Just to show the scale of the regression, here are results from some of the benchmarks:
>>>
>>>   6.1-base:           84.5
>>>   6.1-base-fixup:     76.5
>>>   6.1-revert:         59.5
>>>
>>>   (lower is better; 6.1-revert has the same results as previous stable
>>>   kernels).
>> This immediately threw me off.
>> The fixup was written for a specific system that had completely broken
>> cpuidle; it shouldn't affect any sane system significantly.
>> I double-checked the numbers and your system looks fine. In fact, none of
>> the tests had any rejected cpuidle occurrences, so functionally base and
>> base-fixup are identical for you.  The cpuidle numbers are also reasonably
>> 'in the noise', so, just for the future, some statistics on those scores
>> would be helpful.
>>
>> I can see a huge difference between base and revert in terms of cpuidle,
>> so that's enough for me to take a look; I'll do that now.
>> (6.1-revert uses C3_ACPI more, at the expense of C1_ACPI.)
>>
>> (Also, I can't send this email without at least recommending teo instead of
>> menu for your platform / use-cases; if you deemed it unfit, I'd love to know
>> what didn't work for you!)
> 
> Well, yeah.
> 
> So I've already done some analysis.
> 
> There are 4 C-states: POLL, C1, C6 and C10 (at least that's what the
> MWAIT hints tell me).
> 
> This is how many times each of them was requested during the workload
> run on base 6.1.y:
> 
> POLL: 21445
> C1: 2993722
> C6: 767029
> C10: 736854
> 
> and as a percentage of the total idle state requests:
> 
> POLL: 0.47%
> C1: 66.25%
> C6: 16.97%
> C10: 16.31%
> 
> With the problematic commit reverted, this became
> 
> POLL: 16092
> C1: 2452591
> C6: 750933
> C10: 1150259
> 
> and (again) as a percentage of the total:
> 
> POLL: 0.37%
> C1: 56.12%
> C6: 17.18%
> C10: 26.32%
> 
> Overall, POLL is negligible and the revert had no effect on the number
> of times C6 was requested.  The difference is in C1 and C10, about 10
> percentage points in both cases but going in opposite directions so to
> speak: C1's share of requests dropped by roughly 10 points and C10's
> grew by roughly 10 points after the revert.
> 
> Let's see how this corresponds to the residency numbers.
> 
> For base 6.1.y there was (residency per state, in microseconds):
> 
> POLL: 599883
> C1: 732303748
> C6: 576785253
> C10: 2020491489
> 
> and as a percentage of the total:
> 
> POLL: 0.02%
> C1: 21.99%
> C6: 17.32%
> C10: 60.67%
> 
> After the revert it became
> 
> POLL: 469451
> C1: 517623465
> C6: 508945687
> C10: 2567701673
> 
> and as a percentage of the total:
> 
> POLL: 0.01%
> C1: 14.40%
> C6: 14.16%
> C10: 71.43%
> 
> so with the revert the CPUs spend around 7% more time in deep idle
> states (C6 and C10 combined).
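
For reference, the counts above are the per-state 'usage' deltas and the
residencies the per-state 'time' deltas (in microseconds), summed across
CPUs.  A rough sketch that prints the cumulative totals, leaving the
before/after subtraction out for brevity:

for s in /sys/devices/system/cpu/cpu0/cpuidle/state*; do
    n=$(basename "$s")                       # state0, state1, ...
    name=$(cat "$s/name")                    # e.g. POLL, C1_ACPI, ...
    usage=$(cat /sys/devices/system/cpu/cpu*/cpuidle/"$n"/usage | paste -sd+ | bc)
    time_us=$(cat /sys/devices/system/cpu/cpu*/cpuidle/"$n"/time | paste -sd+ | bc)
    echo "$name usage=$usage time_us=$time_us"
done

Each state's share is then its value divided by the sum over all states.
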
> 
> I have to say that this is consistent with the intent of the
> problematic commit, which is to reduce the number of times the deepest
> idle state is requested when it is likely to be too deep.
> 
> However, on the system in question this somehow causes performance to
> drop significantly (even though shallow idle states are used more
> often, which should result in lower average idle state exit latency and
> better performance).
> 
> One possible explanation is that this somehow affects turbo
> frequencies.  That is, requesting shallower idle states on idle CPUs
> prevents the other CPUs from getting sufficiently high turbo.
> 
> Sergey, can you please run the workload under turbostat on the base
> 6.1.y and on 6.1.y with the problematic commit reverted and send the
> turbostat output from both runs (note: turbostat needs to be run as
> root)?

That's the most plausible explanation and would also be my guess.
FWIW most of the C3_ACPI (== C10) entries with the revert are objectively
wrong: 78% of them are idle misses ('above' events, 895,311 of 1,150,259),
and they were already pretty high on base at around 72.5% (534,424 of
736,854).
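
For the turbostat runs, something along these lines should be enough (run as
root; ./run_workload.sh and the log file names are placeholders for the actual
benchmark invocation):

turbostat --quiet --out turbostat-base.log ./run_workload.sh
# ...and the same on the kernel with the revert, e.g. --out turbostat-revert.log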

I'll leave this here for easier following:

===== 6.1-base: after minus before deltas (aggregated across CPUs) =====
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
|   state | time_diff_s | usage_diff | avg_resid_us | rejected_diff | above_diff | below_diff | share_% |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
|    POLL |       0.600 |     21,445 |         28.0 |             0 |          0 |     19,846 |    0.02 |
| C1_ACPI |     732.304 |  2,993,722 |        244.6 |             0 |      3,816 |    280,613 |   21.99 |
| C2_ACPI |     576.785 |    767,029 |        752.0 |             0 |    272,105 |        453 |   17.32 |
| C3_ACPI |   2,020.491 |    736,854 |      2,742.1 |             0 |    534,424 |          0 |   60.67 |
|   TOTAL |   3,330.180 |  4,519,050 |              |             0 |    810,345 |    300,912 |  100.00 |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+

===== 6.1-revert: after minus before deltas (aggregated across CPUs) =====
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
|   state | time_diff_s | usage_diff | avg_resid_us | rejected_diff | above_diff | below_diff | share_% |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
|    POLL |       0.469 |     16,092 |         29.2 |             0 |          0 |     14,855 |    0.01 |
| C1_ACPI |     517.623 |  2,452,591 |        211.1 |             0 |      4,109 |    150,500 |   14.40 |
| C2_ACPI |     508.946 |    750,933 |        677.8 |             0 |    327,457 |        427 |   14.16 |
| C3_ACPI |   2,567.702 |  1,150,259 |      2,232.3 |             0 |    895,311 |          0 |   71.43 |
|   TOTAL |   3,594.740 |  4,369,875 |              |             0 |  1,226,877 |    165,782 |  100.00 |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+

===== 6.1-revert minus 6.1-base (state-by-state deltas of the deltas) =====
+---------+-----------+----------+----------+---------------+----------+----------+
|   state | Δshare_pp |   Δusage |  Δtime_s | Δavg_resid_us |   Δabove |   Δbelow |
+---------+-----------+----------+----------+---------------+----------+----------+
|    POLL |     -0.00 |   -5,353 |   -0.130 |           1.2 |       +0 |   -4,991 |
| C1_ACPI |     -7.59 | -541,131 | -214.680 |         -33.6 |     +293 | -130,113 |
| C2_ACPI |     -3.16 |  -16,096 |  -67.840 |         -74.2 |  +55,352 |      -26 |
| C3_ACPI |    +10.76 | +413,405 |  547.210 |        -509.8 | +360,887 |       +0 |
|   TOTAL |     +0.00 | -149,175 |  264.560 |               | +416,532 | -135,130 |
+---------+-----------+----------+----------+---------------+----------+----------+
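
And on the teo note above: no rebuild is needed for a quick comparison as long
as the governor is compiled in (CONFIG_CPU_IDLE_GOV_TEO=y), roughly:

cat /sys/devices/system/cpu/cpuidle/available_governors
echo teo > /sys/devices/system/cpu/cpuidle/current_governor   # as root

(or boot with cpuidle.governor=teo)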
