Message-ID: <0f82011994be68502fd9833e499749866539c3df.camel@mediatek.com>
Date:   Tue, 20 Sep 2022 22:07:59 +0800
From:   Jian-Min Liu <jian-min.liu@...iatek.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Vincent Donnefort <vdonnefort@...gle.com>
CC:     Quentin Perret <qperret@...gle.com>,
        Patrick Bellasi <patrick.bellasi@...bug.net>,
        Abhijeet Dharmapurikar <adharmap@...cinc.com>,
        Qais Yousef <qais.yousef@....com>,
        <linux-kernel@...r.kernel.org>,
        Jonathan JMChen <jonathan.jmchen@...iatek.com>
Subject: Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime


Here is some updated test data from an Android phone which supports
that switching the PELT halflife (HL) at runtime is helpful
functionality.

We switch the PELT HL at runtime depending on the scenario, e.g. pelt8
while playing a game, pelt32 while recording camera video. Supporting
runtime switching of the PELT HL gives flexibility for different
workloads.
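
For reference, the per-scenario switch on our side is just a write to
the multiplier knob when the use case changes. A minimal userspace
sketch (the sysctl path/name below is an assumption about how the RFC
exposes the knob, not something taken from the patch):

  #include <stdio.h>

  /* hypothetical knob; adjust to wherever the patch exposes it */
  #define PELT_KNOB "/proc/sys/kernel/sched_pelt_multiplier"

  /* multiplier 1 -> 32ms, 2 -> 16ms, 4 -> 8ms effective PELT HL */
  static int set_pelt_multiplier(int mult)
  {
          FILE *f = fopen(PELT_KNOB, "w");

          if (!f)
                  return -1;
          fprintf(f, "%d\n", mult);
          return fclose(f);
  }

  int main(void)
  {
          /* e.g. game session starts -> pelt8; camera video -> pelt32 */
          return set_pelt_multiplier(4) ? 1 : 0;
  }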

The table below shows the performance & power data points:

-----------------------------------------------------------------------
|                      |                PELT halflife                 |
|                      |----------------------------------------------|
|                      |       32      |       16      |       8      |
|                      |----------------------------------------------|
|                      | avg  min  avg | avg  min  avg | avg  min  avg|
| Scenarios            | fps  fps  pwr | fps  fps  pwr | fps  fps  pwr|
|---------------------------------------------------------------------|
| HOK game 60fps       | 100  100  100 | 105 *134* 102 | 104 *152* 106|
| HOK game 90fps       | 100  100  100 | 101 *114* 101 | 103 *129* 105|
| HOK game 120fps      | 100  100  100 | 102 *124* 102 | 105 *134* 105|
| FHD video rec. 60fps | 100  100  100 | n/a  n/a  n/a | 100  100  103|
| Camera snapshot      | 100  100  100 | n/a  n/a  n/a | 100  100  102|
-----------------------------------------------------------------------

HOK ... Honor of Kings, a video game
FHD ... Full High Definition
fps ... frames per second
pwr ... power consumption

Table values are in %, normalized to the 32ms PELT halflife columns
(= 100).


On Mon, 2022-08-29 at 07:54 +0200, Dietmar Eggemann wrote:
> Many of the Android devices still prefer to run PELT with a shorter
> halflife than the hardcoded value of 32ms in mainline.
> 
> The Android folks claim better response time of display pipeline tasks
> (higher min and avg fps for 60, 90 or 120Hz refresh rate). Some of the
> benchmarks like PCmark web-browsing show higher scores when running
> with 16ms or 8ms PELT halflife. The gain in response time and
> performance is considered to outweigh the increase of energy
> consumption in these cases.
> 
> The original idea of introducing a PELT halflife compile time option
> for 32, 16, 8ms from Patrick Bellasi in 2018
> 
> https://lkml.kernel.org/r/20180409165134.707-1-patrick.bellasi@arm.com
>  
> wasn't integrated into mainline mainly because of breaking the PELT
> stability requirement (see (1) below).
> 
> We have been experimenting with a new idea from Morten Rasmussen to
> instead introduce an additional clock between task and pelt clock. This
> way the effect of a shorter PELT halflife of 8ms or 16ms can be
> achieved by left-shifting the elapsed time. This is similar to the use
> of time shifting of the pelt clock to achieve scale invariance in PELT.
> The implementation is from Vincent Donnefort with some minor
> modifications to align with current tip sched/core.
> 
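
As a quick sanity check of the shifted-clock idea: left-shifting the
elapsed time by n is mathematically the same as dividing the halflife
by 2^n, since 0.5^((t << n)/32) == 0.5^(t/(32 >> n)). A toy userspace
model (illustrative only, not the kernel code):

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
          double t = 24.0;        /* elapsed time in ms */
          int shift = 2;          /* 32ms >> 2 = 8ms effective halflife */

          double native  = pow(0.5, t / 8.0);
          double shifted = pow(0.5, (t * (1 << shift)) / 32.0);

          printf("decay, real 8ms halflife    : %f\n", native);
          printf("decay, left-shifted 32ms HL : %f\n", shifted);
          return 0;               /* both print 0.125000 */
  }

Build with: gcc pelt_shift_demo.c -lm (file name arbitrary).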
> ---
> 
> Known potential issues:
> 
> (1) PELT stability requirement
> 
> The PELT halflife has to be larger than or equal to the scheduling
> period.
> 
> The sched_period (sysctl_sched_latency) of a typical mobile device
> with 8 CPUs and the default logarithmic tuning is 24ms, so only the
> 32ms PELT halflife meets this. A shorter halflife of 16ms or even 8ms
> would break it.
> 
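
(For reference, the 24ms follows from the kernel's default logarithmic
scaling of the base latency: sched_period = 6ms * (1 + ilog2(nr_cpus)).
A tiny standalone check of just that arithmetic:)

  #include <stdio.h>

  /* integer log2 for x > 0, like the kernel's ilog2() */
  static unsigned int ilog2u(unsigned int x)
  {
          unsigned int r = 0;

          while (x >>= 1)
                  r++;
          return r;
  }

  int main(void)
  {
          unsigned int cpus = 8;
          unsigned int base_ms = 6;               /* default base latency */
          unsigned int factor = 1 + ilog2u(cpus); /* 1 + 3 = 4 */

          printf("sched_period = %u ms\n", base_ms * factor); /* 24 ms */
          return 0;
  }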
> It looks like this problem might not exist anymore because of the
> PELT rewrite in 2015, i.e. commit 9d89c257dfb9 ("sched/fair: Rewrite
> runnable load and utilization average tracking"). Since then sched
> entities (tasks & task groups) and cfs_rq's are maintained
> independently rather than each entity update maintaining the cfs_rq
> at the same time.
> 
> This seems to mitigate the issue that the cfs_rq signal is not correct
> when not all runnable entities are able to do a self update during a
> PELT halflife window.
> 
> That said, I'm not entirely sure whether the entity-cfs_rq
> synchronization is the only issue behind this PELT stability
> requirement.
> 
> 
> (2) PELT utilization versus util_est (estimated utilization)
> 
> The PELT signal of a periodic task oscillates with higher peak
> amplitude when using a smaller halflife. For a typical periodic task
> of the display pipeline with a runtime/period of 8ms/16ms the peak
> amplitude is at ~40 for 32ms, at ~80 for 16ms and at ~160 for 8ms.
> Util_est stores the util_avg peak as util_est.enqueued per task.
> 
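
The oscillation can be modeled in closed form: at steady state the
signal rises for the runtime r and decays for p - r, with per-ms decay
y = 0.5^(1/halflife). A rough continuous-time model (illustrative; the
kernel's integer math differs, and this toy model does not reproduce
the measured ~40/~80/~160 values exactly, though the per-halflife
doubling shows up the same way):

  #include <math.h>
  #include <stdio.h>

  /* steady-state peak/trough of a periodic task's PELT-like signal:
   * peak = 1024 * (1 - y^r) / (1 - y^p), trough = peak * y^(p - r) */
  static void model(double hl, double r, double p)
  {
          double y = pow(0.5, 1.0 / hl);
          double peak = 1024.0 * (1.0 - pow(y, r)) / (1.0 - pow(y, p));
          double trough = peak * pow(y, p - r);

          printf("hl=%2.0fms: peak ~%3.0f trough ~%3.0f swing ~%3.0f\n",
                 hl, peak, trough, peak - trough);
  }

  int main(void)
  {
          model(32.0, 8.0, 16.0);  /* the swing roughly doubles ... */
          model(16.0, 8.0, 16.0);  /* ... every time the halflife ... */
          model( 8.0, 8.0, 16.0);  /* ... is halved */
          return 0;
  }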
> With an additional exponentially weighted moving average (ewma) to
> smooth task utilization decreases, the util_est values of the runnable
> tasks are aggregated on the root cfs_rq. CPU and task utilization for
> CPU frequency selection and task placement is the max value out of
> util_est and util_avg. I.e. because of how util_est is implemented,
> higher CPU Operating Performance Points and more capable CPUs are
> already chosen when using a smaller PELT halflife.
> 
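
I.e. roughly (simplified sketch of that max aggregation, not the actual
kernel helpers):

  /* util_est is the max of the last enqueued peak and the ewma */
  static unsigned long task_util_est(unsigned long enqueued,
                                     unsigned long ewma)
  {
          return enqueued > ewma ? enqueued : ewma;
  }

  /* value used for frequency selection and task placement */
  static unsigned long effective_util(unsigned long util_avg,
                                      unsigned long enqueued,
                                      unsigned long ewma)
  {
          unsigned long est = task_util_est(enqueued, ewma);

          return util_avg > est ? util_avg : est;
  }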
> 
> (3) Wrong PELT history when switching PELT multiplier
> 
> The PELT history becomes stale the moment the PELT multiplier is
> changed at runtime. So all decisions based on PELT are skewed for the
> time interval it takes to produce LOAD_AVG_MAX (the sum of the
> infinite geometric series), which is ~345ms for halflife=32ms (smaller
> for 8ms or 16ms).
> 
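
The ~345ms figure follows from the geometric series: the sum saturates
at 1024 / (1 - y) with y = 0.5^(1/32), and after 345ms the accumulated
history already covers 1 - 0.5^(345/32), i.e. >99.9% of that limit.
Quick check (continuous model; the kernel's LOAD_AVG_MAX of 47742 is
slightly lower due to integer truncation):

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
          double y = pow(0.5, 1.0 / 32.0);  /* per-ms decay, 32ms HL */
          double limit = 1024.0 / (1.0 - y);
          double covered = 1.0 - pow(0.5, 345.0 / 32.0);

          printf("series limit ~%.0f\n", limit);          /* ~47788 */
          printf("covered after 345ms: %.4f\n", covered); /* 0.9994 */
          return 0;
  }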
> Rate limiting the PELT multiplier change to this value does not solve
> the issue here. So the user would have to live with possibly incorrect
> decisions during these PELT multiplier transition times.
> 
> ---
> 
> It looks like individual task boosting, e.g. via uclamp_min, possibly
> abstracted by middleware frameworks like the Android Dynamic
> Performance Framework (ADPF), would be the way to go here, but until
> this is fully available and adopted some Android folks will still
> prefer the overall system boosting they achieve by running with a
> shorter PELT halflife.
> 
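
For completeness, per-task boosting via uclamp_min is already reachable
from userspace through sched_setattr(2); a minimal example (assumes
uapi headers from Linux >= 5.3, where the uclamp fields were added):

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/sched.h>        /* SCHED_FLAG_UTIL_CLAMP_MIN */
  #include <linux/sched/types.h>  /* struct sched_attr */

  int main(void)
  {
          struct sched_attr attr = {
                  .size           = sizeof(attr),
                  .sched_policy   = 0,    /* SCHED_NORMAL */
                  .sched_flags    = SCHED_FLAG_UTIL_CLAMP_MIN,
                  .sched_util_min = 512,  /* >= ~50% of CPU capacity */
          };

          /* pid 0 means the calling task */
          if (syscall(SYS_sched_setattr, 0, &attr, 0))
                  perror("sched_setattr");
          return 0;
  }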
> Vincent Donnefort (1):
>   sched/pelt: Introduce PELT multiplier
> 
>  kernel/sched/core.c  |  2 +-
>  kernel/sched/pelt.c  | 60 ++++++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/pelt.h  | 42 ++++++++++++++++++++++++++++---
>  kernel/sched/sched.h |  1 +
>  4 files changed, 100 insertions(+), 5 deletions(-)
> 
