linux-kernel - Re: [PATCH v8 00/23] Introduce runtime modifiable Energy Model

Message-ID: <CAJZ5v0hxubw0VvzTikEwMeS0JQEx=YTqdbhOLhu+Q_n6u4i5gQ@mail.gmail.com>
Date: Thu, 8 Feb 2024 15:01:58 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Lukasz Luba <lukasz.luba@....com>
Cc: linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org, rafael@...nel.org, 
	dietmar.eggemann@....com, rui.zhang@...el.com, amit.kucheria@...durent.com, 
	amit.kachhap@...il.com, daniel.lezcano@...aro.org, viresh.kumar@...aro.org, 
	len.brown@...el.com, pavel@....cz, mhiramat@...nel.org, qyousef@...alina.io, 
	wvw@...gle.com, xuewen.yan94@...il.com
Subject: Re: [PATCH v8 00/23] Introduce runtime modifiable Energy Model

On Thu, Feb 8, 2024 at 12:56 PM Lukasz Luba <lukasz.luba@....com> wrote:
>
> Hi all,
>
> This patch set adds a new feature that allows modifying the Energy Model (EM)
> power values at runtime. This makes it possible to better reflect the power
> model of recent SoCs and silicon. Different characteristics of the power usage
> can be leveraged, and thus better task placement decisions made in EAS.
>
> It also optimizes the EAS code hot path by removing two divisions and one
> multiplication from em_cpu_energy(). Speed-up results:
> em_cpu_energy() should run 1.43x faster on the Big CPU and 1.69x faster on
> the Little CPU (mainline board RockPi 4B).
>
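The hot-path saving described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the frequency/power pairs, the `SCALE_CPU` value and all function names are hypothetical, and exact fractions are used so the two forms can be compared without rounding noise. It shows the general idea of folding per-call divisions into a per-state cost computed when the table is built.

```python
from fractions import Fraction

SCALE_CPU = 1024  # assumed capacity of the CPU at its highest frequency

# Hypothetical (frequency_kHz, power_uW) pairs, not taken from any real SoC.
RAW_STATES = [(600_000, 100_000), (1_200_000, 300_000), (1_800_000, 700_000)]


def energy_with_divisions(freq, power, max_freq, sum_util):
    """Reference form: the divisions are carried out on every call."""
    cost = Fraction(power * max_freq, freq)
    return cost * sum_util / SCALE_CPU


def build_table(raw_states):
    """Table-build step: fold every division into a per-state cost."""
    max_freq = raw_states[-1][0]
    table = []
    for freq, power in raw_states:
        # Capacity delivered at this frequency, scaled to SCALE_CPU.
        performance = Fraction(SCALE_CPU * freq, max_freq)
        table.append({
            "frequency": freq,
            "performance": performance,
            "cost": Fraction(power) / performance,
        })
    return table


def energy_hot_path(state, sum_util):
    """Hot path: a single multiplication and no division."""
    return state["cost"] * sum_util
```

Both forms give the same estimate for a given state and utilization; the difference is only where the division work is done.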
> This patch set is part of the feature set known as the Dynamic Energy Model.
> It was presented and discussed recently at OSPM2023 [3].
>
>
> The concepts:
> 1. The CPU power usage can vary due to the workload that it's running or due
> to the temperature of the SoC. The same workload can use more power when the
> temperature of the silicon has increased (e.g. due to a hot GPU or ISP).
> In such a situation the EM can be adjusted to reflect the increased
> power usage. That power increase is due to static power
> (often simply called leakage). The CPUs in recent SoCs differ from one another.
> We have heterogeneous SoCs with 3 (or even 4) different microarchitectures.
> They are also built differently, with High Performance (HP) cells or
> Low Power (LP) cells, and they are affected by a temperature increase
> differently: HP cells have higher leakage. The SW model can leverage that
> knowledge.
>
> 2. It is also possible to change the EM to better reflect the currently
> running workload. Usually the EM is derived from average power values
> taken from experiments with a benchmark (e.g. Dhrystone). A model derived
> from such a scenario might not properly represent the workloads usually
> running on the device. Runtime modification of the EM therefore allows
> switching to a different model when there is a need.
>
> 3. The EM can be adjusted after boot, when all the modules are loaded and
> more information about the SoC is available, e.g. chip binning. This helps
> to better reflect the silicon characteristics. The EM modification API now
> allows this. It wasn't possible in the past, and the EM had to be
> 'set in stone'.
>
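The update pattern behind the three concepts above (readers keep using a consistent table while a replacement is prepared and then published) can be modelled in user space as follows. This is a toy sketch, not the kernel interface: `PerfDomainModel`, its methods and the 20% power increase are all hypothetical, and plain Python reference assignment stands in for the RCU-based publication mentioned in the changelog.

```python
class PerfDomainModel:
    """Toy stand-in for a performance domain holding one published table."""

    def __init__(self, table):
        # Immutable snapshot of (frequency_kHz, power_mW) pairs.
        self._table = tuple(table)

    def read_table(self):
        # Readers take a reference to whichever table is currently published.
        return self._table

    def update(self, scale_power):
        # Build a complete replacement first, then publish it in one step,
        # so a reader never observes a half-updated table.
        new_table = tuple(
            (freq, scale_power(power)) for freq, power in self._table
        )
        self._table = new_table


# Usage: a reader that grabbed the old table keeps a consistent view.
pd = PerfDomainModel([(600_000, 100), (1_200_000, 300)])
old_view = pd.read_table()
pd.update(lambda power: power * 12 // 10)  # arbitrary +20% for illustration
new_view = pd.read_table()
```

The point of the design is that the old table stays valid for anyone still holding it, while new readers pick up the updated values.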
> An example of such a runtime modification after boot can be found in a
> follow-up patch set. It adds the OPP API and uses it in the Exynos5 SoC driver
> after the voltage values have been adjusted and the power changes [5].
>
> A more detailed explanation and background can be found in the presentations
> given at LPC2022 [1][2] or in the documentation patches.
>
> Some test results:
> The EM can be updated to better fit the workload type. In the case below the EM
> has been updated for the Jankbench test on Pixel6 (running v5.18 w/ mainline backports
> for the scheduler bits). Jankbench was run 10 times for each of the two
> configurations to get more reliable data.
>
> 1. Janky frames percentage
> +--------+-----------------+---------------------+-------+-----------+
> | metric |    variable     |       kernel        | value | perc_diff |
> +--------+-----------------+---------------------+-------+-----------+
> | gmean  | jank_percentage | EM_default          |  2.0  |   0.0%    |
> | gmean  | jank_percentage | EM_modified_runtime |  1.3  |  -35.33%  |
> +--------+-----------------+---------------------+-------+-----------+
>
> 2. Avg frame render time duration
> +--------+---------------------+---------------------+-------+-----------+
> | metric |      variable       |       kernel        | value | perc_diff |
> +--------+---------------------+---------------------+-------+-----------+
> | gmean  | mean_frame_duration | EM_default          | 10.5  |   0.0%    |
> | gmean  | mean_frame_duration | EM_modified_runtime |  9.6  |  -8.52%   |
> +--------+---------------------+---------------------+-------+-----------+
>
> 3. Max frame render time duration
> +--------+--------------------+---------------------+-------+-----------+
> | metric |      variable      |       kernel        | value | perc_diff |
> +--------+--------------------+---------------------+-------+-----------+
> | gmean  | max_frame_duration | EM_default          | 251.6 |   0.0%    |
> | gmean  | max_frame_duration | EM_modified_runtime | 115.5 |  -54.09%  |
> +--------+--------------------+---------------------+-------+-----------+
>
> 4. OS overutilized state percentage (when EAS is not working)
> +--------------+---------------------+------+------------+------------+
> |    metric    |       wa_path       | time | total_time | percentage |
> +--------------+---------------------+------+------------+------------+
> | overutilized | EM_default          | 1.65 |   253.38   |    0.65    |
> | overutilized | EM_modified_runtime | 1.4  |   277.5    |    0.51    |
> +--------------+---------------------+------+------------+------------+
>
> 5. All CPUs (Little+Mid+Big) power values in mW
> +------------+--------+---------------------+-------+-----------+
> |  channel   | metric |       kernel        | value | perc_diff |
> +------------+--------+---------------------+-------+-----------+
> |    CPU     | gmean  | EM_default          | 142.1 |   0.0%    |
> |    CPU     | gmean  | EM_modified_runtime | 131.8 |  -7.27%   |
> +------------+--------+---------------------+-------+-----------+
>
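For reference, the perc_diff and percentage columns in the tables above appear to follow the usual relative-change and ratio formulas. The helper names below are hypothetical, and because the tables print rounded values, recomputing from them can differ from the listed figures in the last digits.

```python
def perc_diff(baseline, modified):
    """Relative change of `modified` against `baseline`, in percent."""
    return (modified - baseline) / baseline * 100.0


def overutilized_percentage(time_overutilized, total_time):
    """Share of the total run time spent in the overutilized state."""
    return time_overutilized / total_time * 100.0


# Values copied from tables 3 and 4 above.
max_frame_change = round(perc_diff(251.6, 115.5), 2)
default_overutilized = round(overutilized_percentage(1.65, 253.38), 2)
```

With the rounded inputs from table 3 this reproduces the listed -54.09% for the max frame duration; for the other rows the recomputed values are close to, but not always identical with, the published ones.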
> The time cost to update the EM decreased in this v5 vs v4:
> big: 5us vs 2us -> 2.6x faster
> mid: 9us vs 3us -> 3x faster
> little: 16us vs 16us -> no change
>
> We still have to update the inefficiency information in the cpufreq
> framework, so a bit of overhead remains.
>
> This series is based on the linux next tree, tag 'v6.8-rc3'.
>
> Changelog:
> v8:
> - addressed cosmetic comments (Hongyan, Dietmar)
> - collected all reviewed-by and tested-by tags (Hongyan, Dietmar)
> - re-based on top of v6.8-rc3 (Rafael)
> v7 [6]:
> - dropped em_table_get/put() (Rafael)
> - renamed memory function to em_table_alloc/free() (Rafael)
> - use explicit rcu_read_lock/unlock() instead of wrappers and aligned
>   frameworks & drivers using EM (Rafael)
> - adjusted documentation to the new functions
> - fixed doxygen comments (Rafael)
> - renamed 'refcount' to 'kref' (Rafael)
> - changed patch headers according to comments (Rafael)
> - rebased on 'next-20240112' to get Ingo's revert affecting energy_model.h
> v6 can be found here [4]
>
> Regards,
> Lukasz Luba
>
> [1] https://lpc.events/event/16/contributions/1341/attachments/955/1873/Dynamic_Energy_Model_to_handle_leakage_power.pdf
> [2] https://lpc.events/event/16/contributions/1194/attachments/1114/2139/LPC2022_Energy_model_accuracy.pdf
> [3] https://www.youtube.com/watch?v=2C-5uikSbtM&list=PL0fKordpLTjKsBOUcZqnzlHShri4YBL1H
> [4] https://lore.kernel.org/lkml/20240104171553.2080674-1-lukasz.luba@armcom/
> [5] https://lore.kernel.org/lkml/20231220110339.1065505-1-lukasz.luba@armcom/
> [6] https://lore.kernel.org/lkml/20240117095714.1524808-1-lukasz.luba@armcom/
>
> Lukasz Luba (23):
>   PM: EM: Add missing newline for the message log
>   PM: EM: Extend em_cpufreq_update_efficiencies() argument list
>   PM: EM: Find first CPU active while updating OPP efficiency
>   PM: EM: Refactor em_pd_get_efficient_state() to be more flexible
>   PM: EM: Introduce em_compute_costs()
>   PM: EM: Check if the get_cost() callback is present in
>     em_compute_costs()
>   PM: EM: Split the allocation and initialization of the EM table
>   PM: EM: Introduce runtime modifiable table
>   PM: EM: Use runtime modified EM for CPUs energy estimation in EAS
>   PM: EM: Add functions for memory allocations for new EM tables
>   PM: EM: Introduce em_dev_update_perf_domain() for EM updates
>   PM: EM: Add em_perf_state_from_pd() to get performance states table
>   PM: EM: Add performance field to struct em_perf_state and optimize
>   PM: EM: Support late CPUs booting and capacity adjustment
>   PM: EM: Optimize em_cpu_energy() and remove division
>   powercap/dtpm_cpu: Use new Energy Model interface to get table
>   powercap/dtpm_devfreq: Use new Energy Model interface to get table
>   drivers/thermal/cpufreq_cooling: Use new Energy Model interface
>   drivers/thermal/devfreq_cooling: Use new Energy Model interface
>   PM: EM: Change debugfs configuration to use runtime EM table data
>   PM: EM: Remove old table
>   PM: EM: Add em_dev_compute_costs()
>   Documentation: EM: Update with runtime modification design
>
>  Documentation/power/energy-model.rst | 183 ++++++++++-
>  drivers/powercap/dtpm_cpu.c          |  39 ++-
>  drivers/powercap/dtpm_devfreq.c      |  34 +-
>  drivers/thermal/cpufreq_cooling.c    |  45 ++-
>  drivers/thermal/devfreq_cooling.c    |  49 ++-
>  include/linux/energy_model.h         | 166 ++++++----
>  kernel/power/energy_model.c          | 474 +++++++++++++++++++++++----
>  7 files changed, 821 insertions(+), 169 deletions(-)
>
> --

All applied as 6.9 material, thanks!