linux-kernel - Re: [PATCH 1/6] cpuidle: teo: Increase util-threshold

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <569b7923-d7ba-49a9-a88d-906f44903d20@arm.com>
Date: Mon, 10 Jun 2024 10:11:01 +0100
From: Christian Loehle <christian.loehle@....com>
To: Qais Yousef <qyousef@...alina.io>
Cc: linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
 rafael@...nel.org, vincent.guittot@...aro.org, peterz@...radead.org,
 daniel.lezcano@...aro.org, anna-maria@...utronix.de,
 kajetan.puchalski@....com, lukasz.luba@....com, dietmar.eggemann@....com
Subject: Re: [PATCH 1/6] cpuidle: teo: Increase util-threshold

On 6/9/24 23:47, Qais Yousef wrote:
> On 06/06/24 10:00, Christian Loehle wrote:
>> Increase the util-threshold by a lot as it was low enough for some
>> minor load to always be active, especially on smaller CPUs.
>>
>> For small cap CPUs (Pixel6) the util threshold is as low as 1.
>> For CPUs of capacity <64 it is 0. So ensure it is at a minimum, too.
>>
>> Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness")
>> Reported-by: Qais Yousef <qyousef@...alina.io>
>> Reported-by: Vincent Guittot <vincent.guittot@...aro.org>
>> Suggested-by: Kajetan Puchalski <kajetan.puchalski@....com>
>> Signed-off-by: Christian Loehle <christian.loehle@....com>
>> ---
>>  drivers/cpuidle/governors/teo.c | 11 +++++------
>>  1 file changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
>> index 7244f71c59c5..45f43e2ee02d 100644
>> --- a/drivers/cpuidle/governors/teo.c
>> +++ b/drivers/cpuidle/governors/teo.c
>> @@ -146,13 +146,11 @@
>>   * The number of bits to shift the CPU's capacity by in order to determine
>>   * the utilized threshold.
>>   *
>> - * 6 was chosen based on testing as the number that achieved the best balance
>> - * of power and performance on average.
>> - *
>>   * The resulting threshold is high enough to not be triggered by background
>> - * noise and low enough to react quickly when activity starts to ramp up.
>> + * noise.
>>   */
>> -#define UTIL_THRESHOLD_SHIFT 6
>> +#define UTIL_THRESHOLD_SHIFT 2
>> +#define UTIL_THRESHOLD_MIN 50
>>  
>>  /*
>>   * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
>> @@ -671,7 +669,8 @@ static int teo_enable_device(struct cpuidle_driver *drv,
>>  	int i;
>>  
>>  	memset(cpu_data, 0, sizeof(*cpu_data));
>> -	cpu_data->util_threshold = max_capacity >> UTIL_THRESHOLD_SHIFT;
>> +	cpu_data->util_threshold = max(UTIL_THRESHOLD_MIN,
>> +				max_capacity >> UTIL_THRESHOLD_SHIFT);
> 
> Thanks for trying to fix this. But I am afraid this is not a solution. There's
> no magic number that can truly work here - we tried. As I tried to explain
> before, a higher util value doesn't mean long idle time is unlikely. And
> blocked load can cause problems where a decay can take too long.
> 
> We are following up with the suggestions I have thrown back then and we'll
> share results if anything actually works.

Namely watching increase / decrease of utilization?
I think you would have to watch at least a couple of values before entering such
a logic and at that point the intercepts logic will handle it anyway.
Furthermore IMO we should be wary about introducing any state in teo that persists
across calls if not absolutely necessary (like intercept-detection) as it really
makes teo much less predictable.

> 
> For now, I think a revert is more appropriate. There was some perf benefit, but
> the power regressions were bad and there's no threshold value that actually
> works. The thresholding concept itself is incorrect and flawed - it seemed the
> correct thing back then, yes. But in a hindsight now it doesn't work.

Unfortunate :/
OK. I'll do some more testing with that, too. From what I can see a revert wouldn't
have terrible fallout with the series altogether, so I might just change this for
v2 and drop 2/6.

Kind Regards,
Christian