Message-ID: <001801dc4041$607c19f0$21744dd0$@telus.net>
Date: Sat, 18 Oct 2025 08:10:44 -0700
From: "Doug Smythies" <dsmythies@...us.net>
To: "'Rafael J. Wysocki'" <rafael@...nel.org>,
"'Christian Loehle'" <christian.loehle@....com>,
"'Sergey Senozhatsky'" <senozhatsky@...omium.org>
Cc: "'Linux PM'" <linux-pm@...r.kernel.org>,
"'LKML'" <linux-kernel@...r.kernel.org>,
"'Artem Bityutskiy'" <artem.bityutskiy@...ux.intel.com>,
"'Tomasz Figa'" <tfiga@...omium.org>,
"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [PATCH v1] cpuidle: governors: menu: Predict longer idle time when in doubt
Hi all,
I have been following and testing these menu.c changes over the last few months,
but never reported back on this email list because:
1.) I never found anything significant to report.
2.) I always seemed to be a week or more behind the conversations.
On 2025.10.18 04:47 Rafael wrote:
> On Fri, Oct 17, 2025 at 8:37 PM Christian Loehle wrote:
>> On 10/17/25 10:39, Rafael J. Wysocki wrote:
>>> On Fri, Oct 17, 2025 at 10:22 AM Christian Loehle wrote:
>>>> On 10/16/25 17:25, Rafael J. Wysocki wrote:
>>>>> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>>>>
>>>>> It is reported that commit 85975daeaa4d ("cpuidle: menu: Avoid discarding
>>>>> useful information") led to a performance regression on Intel Jasper Lake
>>>>> systems because it reduced the time spent by CPUs in idle state C7 which
>>>>> is correlated to the maximum frequency the CPUs can get to because of an
>>>>> average running power limit [1].
I would like to understand Sergey's benchmark test better, and even try
to repeat the results on my test system. I would also like to try to
separate the variables in an attempt to isolate potential contributors.
To eliminate the PL1 effect, limit the CPU frequency to 2300 MHz and repeat
the test. To eliminate potential CPU frequency scaling contributions, use the
performance CPU frequency scaling governor. Both changes at once would
be an acceptable first step.
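The two settings suggested above can be applied through the standard cpufreq sysfs files. Below is a minimal sketch, assuming the usual per-CPU `scaling_governor` and `scaling_max_freq` (in kHz) files under `/sys/devices/system/cpu/cpuN/cpufreq/`. It needs root. `cap_cpufreq` is a hypothetical helper, and `sysfs_root` is a parameter only so that it can be dry-run against a scratch directory. `cpupower frequency-set` can do the same from the shell.

```python
import glob
import os


def cap_cpufreq(max_mhz=2300, governor="performance",
                sysfs_root="/sys/devices/system/cpu"):
    """Set scaling_governor and scaling_max_freq for every CPU's cpufreq dir.

    Hypothetical helper. Returns the list of cpufreq directories written.
    """
    written = []
    # Per-CPU directories such as .../cpu0/cpufreq; "cpuidle" etc. do not match.
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq")
    for cpufreq_dir in sorted(glob.glob(pattern)):
        # Select the governor first ...
        with open(os.path.join(cpufreq_dir, "scaling_governor"), "w") as f:
            f.write(governor)
        # ... then cap the frequency; scaling_max_freq is expressed in kHz.
        with open(os.path.join(cpufreq_dir, "scaling_max_freq"), "w") as f:
            f.write(str(max_mhz * 1000))
        written.append(cpufreq_dir)
    return written
```

The write to `scaling_max_freq` is rejected if it falls below the policy's current `scaling_min_freq`, so a very high minimum would have to be lowered first.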
Sergey: Would you be willing to do that test?
Sergey: Could you provide more details about your test?
From the turbostat data of the other day, it seems that the system was
only power throttled for about 25 seconds for each test. What we don't know
is how long the test took overall or the magnitude of any contributions from
the power limit throttling.
Extracted PL1 area from the turbostat data:
cpu0: PKG Limit #1: ENabled (6.000000 Watts, 28.000000 sec, clamp ENabled)

             MHz         Power (W)
Sample  revert    base  revert    base
    17    1150    3039    2.15    9.49  <<< base over power limit
    18    2898    3086    6.26    8.63  <<< revert over power limit
    19    2968    3017    8.67    9.07
    20    3054    2642    8.22    7.47
    21    2910    2510       9    5.57  <<< base throttled
    22    2950    2438    8.62    5.74
    23    2300    2571    5.61    5.67  <<< revert throttled
    24    2423    2667    5.81    6.01
    25    2560    1827    5.65    2.81  <<< base not throttled
    26    2478     829     5.5    1.84
    27    1552     992    2.36    1.86  <<< revert not throttled
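The table can be read mechanically. The sketch below uses a hypothetical helper, `first_run_over`, to list for each kernel the first run of consecutive samples whose package power exceeds the 6 W PL1 value. It only reads the numbers in the table and does not model how RAPL enforces the 28 second window. For both kernels, the sample right after the run is the one annotated as throttled.

```python
PL1_WATTS = 6.0  # "PKG Limit #1" from the turbostat header above

# (sample, revert MHz, base MHz, revert W, base W), copied from the table above
SAMPLES = [
    (17, 1150, 3039, 2.15, 9.49),
    (18, 2898, 3086, 6.26, 8.63),
    (19, 2968, 3017, 8.67, 9.07),
    (20, 3054, 2642, 8.22, 7.47),
    (21, 2910, 2510, 9.00, 5.57),
    (22, 2950, 2438, 8.62, 5.74),
    (23, 2300, 2571, 5.61, 5.67),
    (24, 2423, 2667, 5.81, 6.01),
    (25, 2560, 1827, 5.65, 2.81),
    (26, 2478, 829, 5.50, 1.84),
    (27, 1552, 992, 2.36, 1.86),
]

REVERT_W, BASE_W = 3, 4  # column indices of the two power columns


def first_run_over(samples, column, limit=PL1_WATTS):
    """Return the sample numbers of the first consecutive run above `limit`."""
    run = []
    for row in samples:
        if row[column] > limit:
            run.append(row[0])
        elif run:
            break  # the first burst above the limit has ended
    return run


print("revert:", first_run_over(SAMPLES, REVERT_W))  # [18, 19, 20, 21, 22]
print("base:  ", first_run_over(SAMPLES, BASE_W))    # [17, 18, 19, 20]
```

Only the initial burst is considered. Base sample 24 (6.01 W) is nominally above 6 W, but it falls inside the throttled phase and sits essentially at the limit.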
From my testing of kernels 6.17-rc1, rc2, and rc3 in August and September:
779b1a1cb13a cpuidle: governors: menu: Avoid selecting states with too much latency - v6.17-rc3
fa3fa55de0d6 cpuidle: governors: menu: Avoid using invalid recent intervals data - v6.17-rc2
baseline reference: v6.17-rc1
d4a7882f93bf cpuidle: menu: Optimize bucket assignment when next_timer_ns equals KTIME_MAX - v6.16-rc1
there was a region of about 11% regression in the 2-core ping-pong sweep test.
After modifying the PL1 limits on my test system so that it would engage for the entire test,
I re-ran the test with kernel 6.18-rc1 (ref) and also with this patch (rjw).
The results were identical for each test.
Some supporting graphs are attached.
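The message does not say how the PL1 limits were changed. One common route on Intel systems is the powercap `intel-rapl` sysfs interface, where constraint 0 of the package zone is the long-term (PL1) limit. The sketch below assumes that interface and the zone path `/sys/class/powercap/intel-rapl:0`, and it needs root. `set_pl1` is a hypothetical helper, and the zone path is a parameter only so that the function can be exercised against a scratch directory. The values actually used for the test above are not stated, so none are suggested here, and the hardware may round or reject some settings.

```python
import os


def set_pl1(watts, window_s, zone="/sys/class/powercap/intel-rapl:0"):
    """Write the PL1 power limit and time window for a powercap RAPL zone.

    Hypothetical helper. The powercap files take microwatts and microseconds.
    Returns the (power_uw, window_us) pair that was written.
    """
    # Sanity check: constraint 0 should be the long-term (PL1) constraint.
    name_path = os.path.join(zone, "constraint_0_name")
    if os.path.exists(name_path):
        with open(name_path) as f:
            name = f.read().strip()
        if name != "long_term":
            raise RuntimeError(f"constraint_0 is {name!r}, not long_term")
    power_uw = int(round(watts * 1_000_000))
    window_us = int(round(window_s * 1_000_000))
    with open(os.path.join(zone, "constraint_0_power_limit_uw"), "w") as f:
        f.write(str(power_uw))
    with open(os.path.join(zone, "constraint_0_time_window_us"), "w") as f:
        f.write(str(window_us))
    return power_uw, window_us
```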
>>>>> [snip]
>>>> [snip]
>>>> Anyway, the patch makes sense, let me run some tests and get back.
>>>
>>> Thanks!
>>
>> Unfortunately this patch regresses my tests about as much as a revert of
>> 85975daeaa4d would.
>>(menu-1 is $SUBJECT, menu-m current mainline, menu-r mainline with
>> 85975daeaa4d reverted):
To better understand the magnitudes involved, here are
Christian's test results, averaged and restated:
Device      Kernel   Average   % regression vs menu-m
mmcblk1     menu-1    1502.3   36.3%
            menu-m    2356.7
            menu-r    1483.3   37.1%
mmcblk2     menu-1    3389.0   41.1%
            menu-m    5754.7
            menu-r    3438.3   40.3%
nvme0n1     menu-1    5812.0   47.4%
            menu-m   11059.0
            menu-r    5386.7   51.3%
sda         menu-1     934.7   42.6%
            menu-m    1629.3
            menu-r     907.3   44.3%
nullb0      menu-1  101466.0    0.1%
            menu-m  101559.7
            menu-r  101708.0   -0.1%
mtdblock3   menu-1     158.7   29.2%
            menu-m     224.0
            menu-r     142.7   36.3%
So, except for nullb0, the regressions are pretty significant.
Sergey's results, by contrast, go the other way by similar magnitudes
(6.1-revert is the reference):
6.1-base:          84.5   +42.0%
6.1-base-fixup:    76.5   +28.5%
6.1-revert:        59.5
backport:          78.5   +31.9%
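The two sets of percentages use opposite reference points. Christian's are relative to mainline (menu-m), while Sergey's are relative to the revert. The sketch below uses a hypothetical `pct_change` helper to reproduce a few entries from both tables. It assumes, as the "% regression" column implies, that higher is better in Christian's test and lower is better in Sergey's.

```python
def pct_change(value, reference):
    """Percent change of `value` relative to `reference`."""
    return (value - reference) / reference * 100.0


# Christian's table: menu-m is the reference, and a lower average is
# reported as a positive regression, so the sign of the change is flipped.
christian = {
    "mmcblk1 menu-1": (1502.3, 2356.7),
    "nvme0n1 menu-r": (5386.7, 11059.0),
    "nullb0 menu-r": (101708.0, 101559.7),
}
for name, (avg, ref) in christian.items():
    print(f"{name}: {-pct_change(avg, ref):+.1f}% regression")

# Sergey's numbers: 6.1-revert is the reference, and a higher value is
# reported as a positive percentage.
sergey_ref = 59.5
for name, value in {"6.1-base": 84.5, "backport": 78.5}.items():
    print(f"{name}: {pct_change(value, sergey_ref):+.1f}%")
```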
> Well, this means that in the majority of cases the maximum sample idle
> time is so large that UINT_MAX may as well be returned instead.
>
> The possible correlation between idle power and the max OPP a CPU can
> get to has not been taken into account in cpuidle directly so far, but
> it clearly isn't true that using shallow idle states more often always
> improves performance. It may hurt performance too.
>
> Actually, this possible correlation appears to have a broader impact,
> as it may affect CAS and EAS at least in principle, so it may be
> advisable to allocate some time for discussing it during upcoming
> conferences.
My suggestion is to get a better understanding of Sergey's benchmark
test as potential input to those discussions.
> At this point I'm inclined to revert commit 85975daeaa4d because
> anything else would be clearly artificial and likely ineffective at
> least in some cases.
>
> The systems that enjoyed better performance after that commit can
> switch over to teo and continue to enjoy it and everybody else still
> using menu should be able to continue to do so.
... Doug
Attachment: loop-times.png (image/png, 41753 bytes)
Attachment: loop-times-relative.png (image/png, 52369 bytes)
Attachment: loop-times-pl.png (image/png, 37632 bytes)
Attachment: loop-times-pl-relative.png (image/png, 49173 bytes)