Date:   Tue, 5 Nov 2019 15:26:06 -0500
From:   Thara Gopinath <thara.gopinath@...aro.org>
To:     Ionela Voinescu <ionela.voinescu@....com>
Cc:     mingo@...hat.com, peterz@...radead.org, vincent.guittot@...aro.org,
        rui.zhang@...el.com, edubezval@...il.com, qperret@...gle.com,
        linux-kernel@...r.kernel.org, amit.kachhap@...il.com,
        javi.merino@...nel.org, daniel.lezcano@...aro.org
Subject: Re: [Patch v4 6/6] sched: thermal: Enable tuning of decay period

On 11/04/2019 11:12 AM, Ionela Voinescu wrote:
> Hi Thara,
> 
> On Tuesday 22 Oct 2019 at 16:34:25 (-0400), Thara Gopinath wrote:
>> Thermal pressure follows pelt signals which means the
>> decay period for thermal pressure is the default pelt
>> decay period. Depending on soc characteristics and thermal
>> activity, it might be beneficial to decay thermal pressure
>> slower, but still in-tune with the pelt signals.
> 
> I wonder if it can be beneficial to decay thermal pressure faster as
> well.
> 
> This implementation makes 32 (LOAD_AVG_PERIOD) the minimum half-life
> of the thermal pressure samples. This results in more than 100ms for a
> sample to decay significantly and therefore let's say it can take more
> than 100ms for capacity to return to (close to) max when the CPU is no
> longer capped. This value seems high to me considering that a minimum
> value should result in close to 'instantaneous' behaviour, when there
> are thermal capping mechanisms that can react in ~20ms (hikey960 has a
> polling delay of 25ms, if I'm remembering correctly).
> 
> I agree 32ms seems like a good default but given that you've made this
> configurable to give users options, I'm wondering if it would be
> better to cover a wider range.
> 
>> One way to achieve this is to provide a command line parameter
>> to set the decay coefficient to an integer between 0 and 10.
>>
>> Signed-off-by: Thara Gopinath <thara.gopinath@...aro.org>
>> ---
>> v3->v4:
>> 	- Removed the sysctl setting to tune decay period and instead
>> 	  introduced a command line parameter to control it. The rationale
>> 	  is that changing the decay period of a PELT signal at runtime can
>> 	  result in a skewed average value for at least some cycles.
>>
>>  Documentation/admin-guide/kernel-parameters.txt |  5 +++++
>>  kernel/sched/thermal.c                          | 25 ++++++++++++++++++++++++-
>>  2 files changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index a84a83f..61d7baa 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -4273,6 +4273,11 @@
>>  			incurs a small amount of overhead in the scheduler
>>  			but is useful for debugging and performance tuning.
>>  
>> +	sched_thermal_decay_coeff=
>> +			[KNL, SMP] Set decay coefficient for thermal pressure signal.
>> +			Format: integer between 0 and 10
>> +			Default is 0.
>> +
>>  	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
>>  			xtime_lock contention on larger systems, and/or RCU lock
>>  			contention on all systems with CONFIG_MAXSMP set.
>> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
>> index 0c84960..0da31e1 100644
>> --- a/kernel/sched/thermal.c
>> +++ b/kernel/sched/thermal.c
>> @@ -10,6 +10,28 @@
>>  #include "pelt.h"
>>  #include "thermal.h"
>>  
>> +/**
>> + * By default the decay is the default pelt decay period.
>> + * The decay coefficient can change its decay period in
>> + * multiples of 32.
> 
> This description has to be corrected as well, as per Peter's comment.
> 
> Also, it might be good not to use the value 32 directly but to mention
> that the decay period is a shift of LOAD_AVG_PERIOD. If that changes,
> the translation from decay shift to decay period below will change as
> well.

Hi Ionela,

I sent out the v5 without fixing this. Even if there are no other
comments on v5 I will send out a v6 fixing this.
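
For reference, a minimal sketch of the translation discussed above,
deriving the decay period from LOAD_AVG_PERIOD rather than hard-coding
32. The helper name thermal_decay_period_ms() and the clamping of
out-of-range values are assumptions made for illustration only; they
are not taken from the patch.

```c
/* PELT half-life in ms, as named in this thread. */
#define LOAD_AVG_PERIOD 32

/* Range documented for sched_thermal_decay_coeff= in the patch. */
#define THERMAL_DECAY_COEFF_MIN 0
#define THERMAL_DECAY_COEFF_MAX 10

/*
 * Hypothetical helper (not from the patch): treat the coefficient as a
 * left shift applied to LOAD_AVG_PERIOD. Clamping is an assumption; the
 * real code may handle out-of-range values differently.
 */
static unsigned int thermal_decay_period_ms(int coeff)
{
	if (coeff < THERMAL_DECAY_COEFF_MIN)
		coeff = THERMAL_DECAY_COEFF_MIN;
	if (coeff > THERMAL_DECAY_COEFF_MAX)
		coeff = THERMAL_DECAY_COEFF_MAX;

	return (unsigned int)LOAD_AVG_PERIOD << coeff;
}
```

With this shape, a change to LOAD_AVG_PERIOD carries through to every
coefficient without touching the comment or the translation.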

Regarding a faster decay, we need a strong case for it.
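
To put numbers on the decay times discussed above, here is a rough
sketch of geometric PELT-style decay with a 32ms half-life. The
function remaining_fraction() is hypothetical, and it assumes that a
decay shift simply stretches each decay step to (1 << shift) ms; that
is an illustration, not the patch's implementation.

```c
/* Per-step decay factor y, chosen so that y^32 is approximately 0.5. */
#define PELT_Y 0.97857206

/*
 * Hypothetical helper: fraction of a sample's contribution that is
 * left after elapsed_ms. Assumes one decay step per (1 << shift) ms,
 * so shift == 0 gives the default 32ms half-life.
 */
static double remaining_fraction(unsigned int elapsed_ms, unsigned int shift)
{
	unsigned int steps = elapsed_ms >> shift;
	double frac = 1.0;
	unsigned int i;

	/* Multiply by y once per elapsed decay step. */
	for (i = 0; i < steps; i++)
		frac *= PELT_Y;

	return frac;
}
```

Under these assumptions, remaining_fraction(32, 0) is about one half
and remaining_fraction(100, 0) is roughly 11%, which matches the
"more than 100ms to decay significantly" estimate quoted above.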



-- 
Warm Regards
Thara
