Message-ID: <Y4iuFVby+prcBSVw@e126311.manchester.arm.com>
Date: Thu, 1 Dec 2022 13:37:25 +0000
From: Kajetan Puchalski <kajetan.puchalski@....com>
To: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Jian-Min Liu <jian-min.liu@...iatek.com>,
Ingo Molnar <mingo@...nel.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Morten Rasmussen <morten.rasmussen@....com>,
Vincent Donnefort <vdonnefort@...gle.com>,
Quentin Perret <qperret@...gle.com>,
Patrick Bellasi <patrick.bellasi@...bug.net>,
Abhijeet Dharmapurikar <adharmap@...cinc.com>,
Qais Yousef <qais.yousef@....com>,
linux-kernel@...r.kernel.org,
Jonathan JMChen <jonathan.jmchen@...iatek.com>
Subject: Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime
On Wed, Nov 30, 2022 at 07:14:51PM +0100, Dietmar Eggemann wrote:
> By `runtime of the activation` you refer to `curr->sum_exec_runtime -
> time(a)` ? And the latter we don't have?
>
> And `runtime = curr->se.sum_exec_runtime - curr->se.prev_sum_exec_run`
> is only covering the time since we got onto the cpu, right?
>
> With a missing `runtime >>= 10` (from __update_load_sum()) and using
> `runtime = curr->se.sum_exec_runtime - curr->se.prev_sum_exec_runtime`
> for a 1 task-workload (so no preemption) with factor 2 or 4 I get at
> least close to the original rq->cfs.avg.util_avg and util_est.enqueued
> signals (cells (5)-(8) in the notebook below).
> https://nbviewer.org/github/deggeman/lisa/blob/ipynbs/ipynb/scratchpad/UTIL_EST_FASTER.ipynb?flush_cache=true
>
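
For reference, the two changes quoted above boil down to roughly the
following. This is only a standalone sketch, not the actual patch: the
struct and function names are made up for illustration, and only the two
field names and the shift by 10 (nanoseconds to units of 1024 ns, roughly
1 us) come from the discussion.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two sched_entity fields named in the quote. */
struct se_sketch {
	uint64_t sum_exec_runtime;      /* total runtime so far, in ns */
	uint64_t prev_sum_exec_runtime; /* snapshot from when the task got onto the CPU, in ns */
};

/*
 * Runtime since the task last got onto the CPU, scaled from ns to
 * units of 1024 ns. The function name is an assumption for this sketch.
 */
uint64_t runtime_since_on_cpu(const struct se_sketch *se)
{
	uint64_t runtime = se->sum_exec_runtime - se->prev_sum_exec_runtime;

	runtime >>= 10; /* the shift that was previously missing */
	return runtime;
}
```

As a quick check, a task that has been on the CPU for 4,000,000 ns gives
4000000 >> 10 = 3906 in these units.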

With those two changes applied as described above, the comparative
results are as follows:

Max frame durations [ms] (worst case scenario)

+--------------------------------+-----------+------------+
| kernel                         | iteration | value      |
+--------------------------------+-----------+------------+
| baseline_60hz                  | 10        | 149.935514 |
| pelt_rampup_runtime_shift_60hz | 10        | 108.126862 |
+--------------------------------+-----------+------------+

Power usage [mW]

+--------------+--------------------------------+-------+-----------+
| chan_name    | kernel                         | value | perc_diff |
+--------------+--------------------------------+-------+-----------+
| total_power  | baseline_60hz                  | 141.6 | 0.0%      |
| total_power  | pelt_rampup_runtime_shift_60hz | 168.0 | 18.61%    |
+--------------+--------------------------------+-------+-----------+

Mean frame duration [ms] (average case)

+---------------+--------------------------------+-------+-----------+
| variable      | kernel                         | value | perc_diff |
+---------------+--------------------------------+-------+-----------+
| mean_duration | baseline_60hz                  | 16.7  | 0.0%      |
| mean_duration | pelt_rampup_runtime_shift_60hz | 13.6  | -18.9%    |
+---------------+--------------------------------+-------+-----------+

Jank percentage

+-----------+--------------------------------+-------+-----------+
| variable  | kernel                         | value | perc_diff |
+-----------+--------------------------------+-------+-----------+
| jank_perc | baseline_60hz                  | 4.0   | 0.0%      |
| jank_perc | pelt_rampup_runtime_shift_60hz | 1.5   | -64.04%   |
+-----------+--------------------------------+-------+-----------+

So it's a middle ground of sorts: instead of a 90% increase in power
usage it's 'just' 19%. At the same time, though, the fastest PELT
multiplier (pelt_4) was getting better max frame durations (85ms vs
108ms) for about half the power increase (9.6% vs 18.6%).
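
In case it helps when comparing the numbers, perc_diff in the tables is
the relative change against the baseline row. A minimal sketch of that
arithmetic follows; the function name is made up here, and the exact
tooling that produced the tables is not shown in this thread.

```c
/*
 * Relative change of `value` against `baseline`, in percent.
 * Hypothetical helper for illustration only.
 */
double perc_diff(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}
```

Feeding it the rounded power values from the table (141.6 and 168.0)
gives roughly 18.64 rather than the 18.61% shown, presumably because the
table percentages were computed before the values were rounded for
display.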