linux-kernel - RE: [PATCH v1] cpuidle: governors: menu: Predict longer idle time when in doubt

Message-ID: <001801dc4041$607c19f0$21744dd0$@telus.net>
Date: Sat, 18 Oct 2025 08:10:44 -0700
From: "Doug Smythies" <dsmythies@...us.net>
To: "'Rafael J. Wysocki'" <rafael@...nel.org>,
	"'Christian Loehle'" <christian.loehle@....com>,
	"'Sergey Senozhatsky'" <senozhatsky@...omium.org>
Cc: "'Linux PM'" <linux-pm@...r.kernel.org>,
	"'LKML'" <linux-kernel@...r.kernel.org>,
	"'Artem Bityutskiy'" <artem.bityutskiy@...ux.intel.com>,
	"'Tomasz Figa'" <tfiga@...omium.org>,
	"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [PATCH v1] cpuidle: governors: menu: Predict longer idle time when in doubt

Hi all,

I have been following and testing these menu.c changes over the last few months,
but never reported back on this email list because:
1.) I never found anything significant to report.
2.) I always seemed to be a week or more behind the conversations.

On 2025.10.18 04:47 Rafael wrote:
> On Fri, Oct 17, 2025 at 8:37 PM Christian Loehle wrote:
>> On 10/17/25 10:39, Rafael J. Wysocki wrote:
>>> On Fri, Oct 17, 2025 at 10:22 AM Christian Loehle wrote:
>>>> On 10/16/25 17:25, Rafael J. Wysocki wrote:
>>>>> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>>>>
>>>>> It is reported that commit 85975daeaa4d ("cpuidle: menu: Avoid discarding
>>>>> useful information") led to a performance regression on Intel Jasper Lake
>>>>> systems because it reduced the time spent by CPUs in idle state C7 which
>>>>> is correlated to the maximum frequency the CPUs can get to because of an
>>>>> average running power limit [1].

I would like to understand Sergey's benchmark test better, and even try
to repeat the results on my test system. I would also like to try to 
separate the variables in an attempt to isolate potential contributors.

To eliminate the PL1 effect, limit the CPU frequency to 2300 MHz and repeat
the test. To eliminate potential CPU frequency scaling contributions, use the
performance CPU frequency scaling governor. Both changes at once would
be an acceptable first step.
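
For reference, a minimal sketch of one way to apply both settings before a
run. It assumes the standard cpufreq sysfs policy directories and root
privileges. The function name pin_cpufreq and its sysfs_root parameter are
made up for illustration, and with some drivers other knobs may be more
appropriate:

```python
import glob
import os


def pin_cpufreq(max_khz=2_300_000, governor="performance", sysfs_root="/sys"):
    """Set the governor and cap scaling_max_freq for every cpufreq policy.

    sysfs_root is a hypothetical parameter so the function can be tried
    against a scratch directory instead of the real /sys.
    """
    pattern = os.path.join(sysfs_root, "devices/system/cpu/cpufreq/policy*")
    policies = sorted(glob.glob(pattern))
    for policy in policies:
        # Select the scaling governor first, then apply the frequency cap.
        with open(os.path.join(policy, "scaling_governor"), "w") as f:
            f.write(governor)
        # scaling_max_freq is expressed in kHz, so 2300 MHz is 2300000.
        with open(os.path.join(policy, "scaling_max_freq"), "w") as f:
            f.write(str(max_khz))
    return policies


# Example (as root): pin_cpufreq() returns the policy directories it touched.
```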

Sergey: Would you be willing to do that test?
Sergey: Could you provide more details about your test?

From the turbostat data of the other day, it seems that the system was
only power throttled for about 25 seconds for each test. What we don't know
is how long the test took overall or the magnitude of any contributions from
the power limit throttling.

Extracted PL1 area from the turbostat data:

cpu0: PKG Limit #1: ENabled (6.000000 Watts, 28.000000 sec, clamp ENabled)

        MHz             Power (W)
Sample  revert  base    revert  base
17      1150    3039    2.15    9.49    <<< base over power limit
18      2898    3086    6.26    8.63    <<< revert over power limit
19      2968    3017    8.67    9.07
20      3054    2642    8.22    7.47
21      2910    2510    9       5.57    <<< base throttled
22      2950    2438    8.62    5.74
23      2300    2571    5.61    5.67    <<< revert throttled
24      2423    2667    5.81    6.01
25      2560    1827    5.65    2.81    <<< base not throttled
26      2478    829     5.5     1.84
27      1552    992     2.36    1.86    <<< revert not throttled
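
As a rough cross-check of the table above, the sketch below lists the samples
whose power reading exceeds the 6 W PL1 value. It is only a per-sample
comparison: PL1 is enforced as a running average over the 28 second window, so
this does not model when throttling actually starts or stops. The helper name
over_limit is hypothetical.

```python
PL1_WATTS = 6.0  # from the "PKG Limit #1" line above

# (sample, revert power, base power), copied from the table above
SAMPLES = [
    (17, 2.15, 9.49), (18, 6.26, 8.63), (19, 8.67, 9.07), (20, 8.22, 7.47),
    (21, 9.0, 5.57), (22, 8.62, 5.74), (23, 5.61, 5.67), (24, 5.81, 6.01),
    (25, 5.65, 2.81), (26, 5.5, 1.84), (27, 2.36, 1.86),
]


def over_limit(samples, column, limit=PL1_WATTS):
    """Return the sample numbers whose power in the given column exceeds limit."""
    return [row[0] for row in samples if row[column] > limit]


revert_over = over_limit(SAMPLES, 1)
base_over = over_limit(SAMPLES, 2)
print("revert over PL1:", revert_over)
print("base over PL1:  ", base_over)
```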

From my testing of kernels 6.17-rc1, rc2, rc3 in August and September:

779b1a1cb13a cpuidle: governors: menu: Avoid selecting states with too much latency - v6.17-rc3
fa3fa55de0d6 cpuidle: governors: menu: Avoid using invalid recent intervals data - v6.17-rc2
baseline reference: v6.17-rc1
d4a7882f93bf cpuidle: menu: Optimize bucket assignment when next_timer_ns equals KTIME_MAX - v6.16-rc1

there was an area of about 11% regression in the 2 core ping-pong sweep test.
After modifying the PL1 limits on my test system so that it would engage for the entire test,
I re-ran the test with kernel 6.18-rc1 (ref) and also with this patch (rjw).
The results were identical for each test.
Some supporting graphs are attached.

>>>>> [snip]
>>>> [snip]
>>>> Anyway, the patch makes sense, let me run some tests and get back.
>>>
>>> Thanks!
>>
>> Unfortunately this patch regresses my tests about as much as a revert of
>> 85975daeaa4d would.
>> (menu-1 is $SUBJECT, menu-m current mainline, menu-r mainline with
>> 85975daeaa4d reverted):

So that I could better understand the magnitudes of things,
I averaged and restated Christian's test results:

Averages:
Device      Kernel    Average     % regression (vs. menu-m)
mmcblk1     menu-1    1502.3      36.3%
            menu-m    2356.7
            menu-r    1483.3      37.1%

mmcblk2     menu-1    3389.0      41.1%
            menu-m    5754.7
            menu-r    3438.3      40.3%

nvme0n1     menu-1    5812.0      47.4%
            menu-m    11059.0
            menu-r    5386.7      51.3%

sda         menu-1    934.7       42.6%
            menu-m    1629.3
            menu-r    907.3       44.3%

nullb0      menu-1    101466.0    0.1%
            menu-m    101559.7
            menu-r    101708.0    -0.1%

mtdblock3   menu-1    158.7       29.2%
            menu-m    224.0
            menu-r    142.7       36.3%

So, except for nullb0, the regressions are pretty significant.
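
For anyone wanting to reproduce the percentages, they are the drop of each
average relative to the menu-m (current mainline) average. A minimal sketch,
with the averages copied from the table above and the function name
regression_pct chosen only for illustration:

```python
# Averages per device, copied from the table above.
AVERAGES = {
    "mmcblk1": {"menu-1": 1502.3, "menu-m": 2356.7, "menu-r": 1483.3},
    "mmcblk2": {"menu-1": 3389.0, "menu-m": 5754.7, "menu-r": 3438.3},
    "nvme0n1": {"menu-1": 5812.0, "menu-m": 11059.0, "menu-r": 5386.7},
    "sda": {"menu-1": 934.7, "menu-m": 1629.3, "menu-r": 907.3},
    "nullb0": {"menu-1": 101466.0, "menu-m": 101559.7, "menu-r": 101708.0},
    "mtdblock3": {"menu-1": 158.7, "menu-m": 224.0, "menu-r": 142.7},
}


def regression_pct(value, reference):
    """Percent by which value falls below reference (negative means higher)."""
    return (reference - value) / reference * 100.0


for device, avg in AVERAGES.items():
    for kernel in ("menu-1", "menu-r"):
        pct = regression_pct(avg[kernel], avg["menu-m"])
        print(f"{device:10s} {kernel}  {pct:5.1f}%")
```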

Whereas Sergey's results are the other way around by similar magnitudes.

6.1-base:	84.5	+42.0%
6.1-base-fixup:	76.5	+28.5%
6.1-revert:	59.5	
backport:	78.5	+31.9%

> Well, this means that in the majority of cases the maximum sample idle
> time is so large that UINT_MAX may as well be returned instead.
>
> The possible correlation between idle power and the max OPP a CPU can
> get to has not been taken into account in cpuidle directly so far, but
> it clearly isn't true that using shallow idle states more often always
> improves performance.  It may hurt performance too.
>
> Actually, this possible correlation appears to have a broader impact,
> as it may affect CAS and EAS at least in principle, so it may be
> advisable to allocate some time for discussing it during upcoming
> conferences.

My suggestion is to better understand Sergey's benchmark test
for potential inputs to those discussions.

> At this point I'm inclined to revert commit 85975daeaa4d because
> anything else would be clearly artificial and likely ineffective at
> least in some cases.
>
> The systems that enjoyed better performance after that commit can
> switch over to teo and continue to enjoy it and everybody else still
> using menu should be able to continue to do so.

... Doug


Download attachment "loop-times.png" of type "image/png" (41753 bytes)

Download attachment "loop-times-relative.png" of type "image/png" (52369 bytes)

Download attachment "loop-times-pl.png" of type "image/png" (37632 bytes)

Download attachment "loop-times-pl-relative.png" of type "image/png" (49173 bytes)