Message-ID: <e9eb077b-3253-49be-b997-a07dcde86cdc@arm.com>
Date: Tue, 14 Oct 2025 18:19:51 +0100
From: Christian Loehle <christian.loehle@....com>
To: "Rafael J. Wysocki" <rafael@...nel.org>,
Sergey Senozhatsky <senozhatsky@...omium.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Artem Bityutskiy <artem.bityutskiy@...ux.intel.com>,
Sasha Levin <sashal@...nel.org>, Daniel Lezcano <daniel.lezcano@...aro.org>,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
Tomasz Figa <tfiga@...omium.org>, stable@...r.kernel.org
Subject: Re: stable: commit "cpuidle: menu: Avoid discarding useful
information" causes regressions
On 10/14/25 16:54, Rafael J. Wysocki wrote:
> On Tue, Oct 14, 2025 at 5:11 PM Christian Loehle
> <christian.loehle@....com> wrote:
>>
>> On 10/14/25 12:55, Sergey Senozhatsky wrote:
>>> On (25/10/14 11:25), Christian Loehle wrote:
>>>> On 10/14/25 11:23, Sergey Senozhatsky wrote:
>>>>> On (25/10/14 10:50), Christian Loehle wrote:
>>>>>>> Upstream fixup fa3fa55de0d ("cpuidle: governors: menu: Avoid using
>>>>>>> invalid recent intervals data") doesn't address the problems we are
>>>>>>> observing. Reverting it seems to bring performance metrics back to
>>>>>>> pre-regression levels.
>>>>>>
>>>>>> Any details would be much appreciated.
>>>>>> How do the idle state usages differ with and without
>>>>>> "cpuidle: menu: Avoid discarding useful information"?
>>>>>> What do the idle states look like on your platform?
>>>>>
>>>>> Sure, I can run tests. How do I get the numbers/stats
>>>>> that you are asking for?
>>>>
>>>> Ideally just dump
>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>> before and after the test.
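(Aside, mostly for next time: keeping the file names next to the values makes
the before/after deltas much easier to post-process. A minimal bash sketch
over the standard cpuidle sysfs counters; the output file name is just a
placeholder:)

#!/bin/bash
# Snapshot the per-CPU, per-state cpuidle counters together with their sysfs
# paths so per-state deltas can be computed after the benchmark run.
out="${1:-cpuidle.snapshot}"
for f in /sys/devices/system/cpu/cpu*/cpuidle/state*/{name,usage,time,above,below,rejected}; do
        [ -r "$f" ] && printf '%s %s\n' "$f" "$(cat "$f")"
done > "$out"

Run it once before the test (e.g. ./cpuidle-snap.sh cpuidle.before) and once
after (./cpuidle-snap.sh cpuidle.after).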
>>>
>>> OK, got some data for you. The terminology being used here is as follows:
>>>
>>> - 6.1-base
>>> is 6.1 stable with a9edb700846 "cpuidle: menu: Avoid discarding useful information"
>>>
>>> - 6.1-base-fixup
>>> is 6.1 stable with a9edb700846 and fa3fa55de0d6 "cpuidle: governors:
>>> menu: Avoid using invalid recent intervals data" cherry-pick
>>>
>>> - 6.1-revert
>>> is 6.1 stable with a9edb700846 reverted (and no fixup commit, obviously)
>>>
>>> Just to show the scale of the regression, here are the results of some of the benchmarks:
>>>
>>> 6.1-base: 84.5
>>> 6.1-base-fixup: 76.5
>>> 6.1-revert: 59.5
>>>
>>> (lower is better, 6.1-revert has the same results as previous stable
>>> kernels).
>> This immediately threw me off.
>> The fixup was written for a specific system whose cpuidle setup was
>> completely broken; it shouldn't significantly affect any sane system.
>> I double-checked the numbers and your system looks fine: none of the tests
>> shows any rejected idle state entries, so functionally base and base-fixup
>> are identical for you. The cpuidle numbers between the two are also
>> reasonably 'in the noise', so for the future some statistics (e.g. mean and
>> spread over repeated runs) on those benchmark scores would be helpful.
>>
>> I can see a huge difference between base and revert in terms of cpuidle,
>> so that's enough for me to take a look, and I'll do that now.
>> (6.1-revert selects C3_ACPI more often at the expense of C1_ACPI.)
>>
>> (Also, I can't send this email without at least recommending teo instead of
>> menu for your platform / use-cases; if you already tried it and deemed it
>> unfit, I'd love to know what didn't work for you!)
>
> Well, yeah.
>
> So I've already done some analysis.
>
> There are 4 C-states: POLL, C1, C6 and C10 (at least that's what the
> MWAIT hints tell me).
>
> This is how many times each of them was requested during the workload
> run on base 6.1.y:
>
> POLL: 21445
> C1: 2993722
> C6: 767029
> C10: 736854
>
> and as a percentage of the total idle state requests:
>
> POLL: 0.47%
> C1: 66.25%
> C6: 16.97%
> C10: 16.31%
>
> With the problematic commit reverted, this became
>
> POLL: 16092
> C1: 2452591
> C6: 750933
> C10: 1150259
>
> and (again) as a percentage of the total:
>
> POLL: 0.37%
> C1: 56.12%
> C6: 17.18%
> C10: 26.32%
>
> Overall, POLL is negligible and the revert had essentially no effect on the
> number of times C6 was requested. The difference is in C1 and C10, and it
> is about 10 percentage points in both cases, going in opposite directions
> so to speak: C1 was requested 10 points less often and C10 10 points more
> often after the revert.
>
> Let's see how this corresponds to the residency numbers.
>
> For base 6.1.y the residency numbers (in microseconds) were
>
> POLL: 599883
> C1: 732303748
> C6: 576785253
> C10: 2020491489
>
> and as a percentage of the total:
>
> POLL: 0.02%
> C1: 21.99%
> C6: 17.32%
> C10: 60.67%
>
> After the revert it became
>
> POLL: 469451
> C1: 517623465
> C6: 508945687
> C10: 2567701673
>
> and as a percentage of the total:
>
> POLL: 0.01%
> C1: 14.40%
> C6: 14.16%
> C10: 71.43%
>
> so with the revert the CPUs spend around 7.5 percentage points more of
> their total idle time in the deep idle states (C6 and C10 combined).
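(Side note: these shares can be reproduced from two snapshots in the
"<path> <value>" format of the sketch earlier in this mail; a rough bash/awk
sketch, state names taken from the 'name' attribute, file names are
placeholders:)

#!/bin/bash
# Aggregate per-state 'usage' and 'time' deltas (after minus before) across
# all CPUs and print each state's share of the respective total.
before="${1:-cpuidle.before}"
after="${2:-cpuidle.after}"
awk -v after="$after" '
{
        n = split($1, p, "/"); state = p[8]; attr = p[n]
        if (attr == "name") { name[state] = $2; next }
        d = (FILENAME == after) ? $2 : -$2
        delta[state, attr] += d
        if (attr == "usage") total_usage += d
        if (attr == "time")  total_time  += d
}
END {
        for (s in name)
                printf "%-10s usage %9d (%5.2f%%)  time_us %12d (%5.2f%%)\n",
                       name[s], delta[s, "usage"],
                       100 * delta[s, "usage"] / total_usage,
                       delta[s, "time"],
                       100 * delta[s, "time"] / total_time
}
' "$before" "$after"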
>
> I have to say that this is consistent with the intent of the
> problematic commit, which is to reduce the number of times the deepest
> idle state is requested in cases where it is likely to be too deep.
>
> However, on the system in question this somehow causes performance to
> drop significantly (even though shallow idle states are used more
> often, which should result in lower average idle state exit latency and
> better performance).
>
> One possible explanation is that this somehow affects turbo
> frequencies. That is, requesting shallower idle states on idle CPUs
> prevents the other CPUs from reaching sufficiently high turbo frequencies.
>
> Sergey, can you please run the workload under turbostat on the base
> 6.1.y and on 6.1.y with the problematic commit reverted and send the
> turbostat output from both runs (note: turbostat needs to be run as
> root)?
That's the most plausible explanation and would also be my guess.
FWIW most of the C3_ACPI (== C10) entries with the revert are objectively
wrong: 78% of them are idle misses, i.e. the 'above' counter (state was too
deep for the actual sleep length) relative to usage for that state, and that
ratio was already pretty high with base, at around 72.5%.
A rough sketch for capturing the turbostat data on both kernels is at the
very end of this mail.
I'll leave the aggregated before/after deltas here for easier reference:
===== 6.1-base: after minus before deltas (aggregated across CPUs) =====
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
| state | time_diff_s | usage_diff | avg_resid_us | rejected_diff | above_diff | below_diff | share_% |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
| POLL | 0.600 | 21,445 | 28.0 | 0 | 0 | 19,846 | 0.02 |
| C1_ACPI | 732.304 | 2,993,722 | 244.6 | 0 | 3,816 | 280,613 | 21.99 |
| C2_ACPI | 576.785 | 767,029 | 752.0 | 0 | 272,105 | 453 | 17.32 |
| C3_ACPI | 2,020.491 | 736,854 | 2,742.1 | 0 | 534,424 | 0 | 60.67 |
| TOTAL | 3,330.180 | 4,519,050 | | 0 | 810,345 | 300,912 | 100.00 |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
===== 6.1-revert: after minus before deltas (aggregated across CPUs) =====
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
| state | time_diff_s | usage_diff | avg_resid_us | rejected_diff | above_diff | below_diff | share_% |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
| POLL | 0.469 | 16,092 | 29.2 | 0 | 0 | 14,855 | 0.01 |
| C1_ACPI | 517.623 | 2,452,591 | 211.1 | 0 | 4,109 | 150,500 | 14.40 |
| C2_ACPI | 508.946 | 750,933 | 677.8 | 0 | 327,457 | 427 | 14.16 |
| C3_ACPI | 2,567.702 | 1,150,259 | 2,232.3 | 0 | 895,311 | 0 | 71.43 |
| TOTAL | 3,594.740 | 4,369,875 | | 0 | 1,226,877 | 165,782 | 100.00 |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
===== 6.1-revert minus 6.1-base (state-by-state deltas of the deltas) =====
+---------+-----------+----------+----------+---------------+----------+----------+
| state | Δshare_pp | Δusage | Δtime_s | Δavg_resid_us | Δabove | Δbelow |
+---------+-----------+----------+----------+---------------+----------+----------+
| POLL | -0.00 | -5,353 | -0.130 | 1.2 | +0 | -4,991 |
| C1_ACPI | -7.59 | -541,131 | -214.680 | -33.6 | +293 | -130,113 |
| C2_ACPI | -3.16 | -16,096 | -67.840 | -74.2 | +55,352 | -26 |
| C3_ACPI | +10.76 | +413,405 | 547.210 | -509.8 | +360,887 | +0 |
| TOTAL | +0.00 | -149,175 | 264.560 | | +416,532 | -135,130 |
+---------+-----------+----------+----------+---------------+----------+----------+
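And for the turbostat runs Rafael asked for, roughly something like this
should do on each kernel (a sketch; the workload wrapper and the output file
names are placeholders, --quiet/--out as in recent turbostat versions):

# as root, on 6.1-base:
turbostat --quiet --out turbostat.6.1-base.txt ./run_workload.sh
# reboot into 6.1-revert, then:
turbostat --quiet --out turbostat.6.1-revert.txt ./run_workload.sh

Then just attach both output files.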