[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2b0953b5-4978-446a-b686-5b8d1541a265@arm.com>
Date: Mon, 18 Nov 2024 17:34:01 +0100
From: Pierre Gondois <pierre.gondois@....com>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>,
Linux PM <linux-pm@...r.kernel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Lukasz Luba <lukasz.luba@....com>,
Peter Zijlstra <peterz@...radead.org>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Len Brown <len.brown@...el.com>, Dietmar Eggemann
<dietmar.eggemann@....com>, Morten Rasmussen <morten.rasmussen@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
Christian Loehle <Christian.Loehle@....com>
Subject: Re: [RFC][PATCH v0.1 6/6] cpufreq: intel_pstate: Add basic EAS
support on hybrid platforms
On 11/8/24 17:46, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>
> Modify intel_pstate to register stub EM perf domains for CPUs on
> hybrid platforms via em_dev_register_perf_domain() and to use
> em_dev_expand_perf_domain() introduced previously for adding new
> CPUs to existing EM perf domains when those CPUs become online for
> the first time after driver initialization.
>
> This change is targeting platforms (for example, Lunar Lake) where
> "small" CPUs (E-cores) are always more energy-efficient than the "big"
> or "performance" CPUs (P-cores) when run at the same HWP performance
> level, so it is sufficient to tell the EAS that E-cores are always
> preferred (so long as there is enough spare capacity on one of them
> to run the given task).
>
> Accordingly, the perf domains are registered per CPU type (that is,
> all P-cores belong to one perf domain and all E-cores belong to another
> perf domain) and they are registered only if asymmetric CPU capacity is
> enabled. Each perf domain has a one-element states table and that
> element only contains the relative cost value (the other fields in
> it are not initialized, so they are all equal to zero), and the cost
> value for the E-core perf domain is lower.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> ---
> drivers/cpufreq/intel_pstate.c | 110 ++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 104 insertions(+), 6 deletions(-)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -8,6 +8,7 @@
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/energy_model.h>
> #include <linux/kernel.h>
> #include <linux/kernel_stat.h>
> #include <linux/module.h>
> @@ -938,6 +939,12 @@ static struct freq_attr *hwp_cpufreq_att
> NULL,
> };
>
> +enum hybrid_cpu_type {
> + HYBRID_PCORE = 0,
> + HYBRID_ECORE,
> + HYBRID_NR_TYPES
> +};
> +
> static struct cpudata *hybrid_max_perf_cpu __read_mostly;
> /*
> * Protects hybrid_max_perf_cpu, the capacity_perf fields in struct cpudata,
> @@ -945,6 +952,86 @@ static struct cpudata *hybrid_max_perf_c
> */
> static DEFINE_MUTEX(hybrid_capacity_lock);
>
> +#ifdef CONFIG_ENERGY_MODEL
> +struct hybrid_em_perf_domain {
> + cpumask_t cpumask;
> + struct device *dev;
> + struct em_data_callback cb;
> +};
> +
> +static int hybrid_pcore_cost(struct device *dev, unsigned long freq,
> + unsigned long *cost)
> +{
> + /*
> + * The number used here needs to be higher than the analogous
> + * one in hybrid_ecore_cost() below. The units and the actual
> + * values don't matter.
> + */
> + *cost = 2;
> + return 0;
> +}
> +
> +static int hybrid_ecore_cost(struct device *dev, unsigned long freq,
> + unsigned long *cost)
> +{
> + *cost = 1;
> + return 0;
> +}
The artificial EM was introduced for CPPC based platforms since these platforms
only provide an 'efficiency class' entry to describe the relative efficiency
of CPUs. The case seems similar to yours.
'Fake' OPPs were created to have an incentive for EAS to balance the load on
the CPUs in one perf. domain. Indeed, in feec(), during the energy
computation of a pd, if the cost is independent from the max_util value,
then one CPU in the pd could end up having a high util, and another CPU a
NULL util.
For CPPC platforms, this was problematic as a lower OPP could have been
selected for the same load, so energy was lost for no reason.
In your case it seems that the OPP selection is done independently on each
CPU. However I assume it is still more energy efficient to have 2 CPUs
loaded at 50% than one CPU loaded at 100% and an idle CPU.
Also as Dietmar suggested, maybe it would make sense to have some
way to prefer an CPU with a "energy saving" HFI configuration than
a similar CPU with a "performance" HFI configuration.
Also, out of curiosity, do you have energy numbers to share ?
Regards,
Pierre
Powered by blists - more mailing lists