linux-kernel - Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime

Message-ID: <Y4iuFVby+prcBSVw@e126311.manchester.arm.com>
Date:   Thu, 1 Dec 2022 13:37:25 +0000
From:   Kajetan Puchalski <kajetan.puchalski@....com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Jian-Min Liu <jian-min.liu@...iatek.com>,
        Ingo Molnar <mingo@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Vincent Donnefort <vdonnefort@...gle.com>,
        Quentin Perret <qperret@...gle.com>,
        Patrick Bellasi <patrick.bellasi@...bug.net>,
        Abhijeet Dharmapurikar <adharmap@...cinc.com>,
        Qais Yousef <qais.yousef@....com>,
        linux-kernel@...r.kernel.org,
        Jonathan JMChen <jonathan.jmchen@...iatek.com>
Subject: Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime

On Wed, Nov 30, 2022 at 07:14:51PM +0100, Dietmar Eggemann wrote:

> By `runtime of the activation` you refer to `curr->sum_exec_runtime -
> time(a)` ? And the latter we don't have?
> 
> And `runtime = curr->se.sum_exec_runtime - curr->se.prev_sum_exec_runtime`
> is only covering the time since we got onto the cpu, right?
> 
> With a missing `runtime >>= 10` (from __update_load_sum()) and using
> `runtime = curr->se.sum_exec_runtime - curr->se.prev_sum_exec_runtime`
> for a 1 task-workload (so no preemption) with factor 2 or 4 I get at
> least close to the original rq->cfs.avg.util_avg and util_est.enqueued
> signals (cells (5)-(8) in the notebook below).
>
> https://nbviewer.org/github/deggeman/lisa/blob/ipynbs/ipynb/scratchpad/UTIL_EST_FASTER.ipynb?flush_cache=true
> 
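
To make the quoted calculation concrete, here is a minimal sketch of a runtime-based utilization estimate. It is not the code from the patch. It uses the closed-form PELT ramp for a task that runs continuously, with the standard 1024 us period and a half-life of 32 periods. The function name, the constants' names and the choice to apply the factor before the shift are assumptions made for illustration.

```python
# Hypothetical sketch, not the kernel implementation.
PELT_PERIOD_US = 1024          # length of one PELT period in microseconds
PELT_HALFLIFE_PERIODS = 32     # y^32 = 0.5 in stock PELT
SCHED_CAPACITY_SCALE = 1024    # utilization of a fully busy CPU


def approx_util_from_runtime(runtime_ns: int, factor: int = 1) -> float:
    """Estimate utilization after running continuously for runtime_ns.

    `factor` speeds up the ramp by scaling the runtime (2 or 4 in the
    quoted text). Applying it before the shift is an assumption.
    """
    # Mirrors `runtime >>= 10`: nanoseconds to (approximately) microseconds.
    runtime_us = (runtime_ns * factor) >> 10
    periods = runtime_us / PELT_PERIOD_US
    # Geometric ramp: the remaining gap to full scale halves every half-life.
    decay = 0.5 ** (periods / PELT_HALFLIFE_PERIODS)
    return SCHED_CAPACITY_SCALE * (1.0 - decay)


if __name__ == "__main__":
    one_halflife_ns = 32 * 1024 * 1024  # 32 periods of 1024 us, in ns
    for factor in (1, 2, 4):
        util = approx_util_from_runtime(one_halflife_ns, factor)
        print(f"factor={factor}: util ~ {util:.0f}")
```

Without the shift the runtime would be treated as 1024 times larger than intended, so the estimate would saturate at full scale almost immediately.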

With those two changes as described above the comparative results are as
follows:

Max frame durations [ms] (worst case scenario)

+--------------------------------+-----------+------------+
|            kernel              | iteration |   value    |
+--------------------------------+-----------+------------+
|         baseline_60hz          |    10     | 149.935514 |
| pelt_rampup_runtime_shift_60hz |    10     | 108.126862 |
+--------------------------------+-----------+------------+

Power usage [mW]

+--------------+--------------------------------+-------+-----------+
|  chan_name   |             kernel             | value | perc_diff |
+--------------+--------------------------------+-------+-----------+
| total_power  |         baseline_60hz          | 141.6 |   0.0%    |
| total_power  | pelt_rampup_runtime_shift_60hz | 168.0 |  18.61%   |
+--------------+--------------------------------+-------+-----------+

Mean frame duration [ms] (average case)

+---------------+--------------------------------+-------+-----------+
|   variable    |             kernel             | value | perc_diff |
+---------------+--------------------------------+-------+-----------+
| mean_duration |         baseline_60hz          | 16.7  |   0.0%    |
| mean_duration | pelt_rampup_runtime_shift_60hz | 13.6  |  -18.9%   |
+---------------+--------------------------------+-------+-----------+

Jank percentage

+-----------+--------------------------------+-------+-----------+
| variable  |             kernel             | value | perc_diff |
+-----------+--------------------------------+-------+-----------+
| jank_perc |         baseline_60hz          |  4.0  |   0.0%    |
| jank_perc | pelt_rampup_runtime_shift_60hz |  1.5  |  -64.04%  |
+-----------+--------------------------------+-------+-----------+

So it's a middle ground of sorts: instead of a 90% increase in power
usage it's 'just' 19%. At the same time, though, the fastest PELT
multiplier (pelt_4) got better max frame durations (85ms vs 108ms) for
about half the power increase (9.6% vs 18.6%).