Message-ID: <CAJZ5v0hH424_4N1TZVVgKCegUsAisjdAXr7KekafJteSSLEnHA@mail.gmail.com>
Date: Tue, 19 Nov 2024 18:20:07 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Pierre Gondois <pierre.gondois@....com>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, Linux PM <linux-pm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, Lukasz Luba <lukasz.luba@....com>,
Peter Zijlstra <peterz@...radead.org>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>, Len Brown <len.brown@...el.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Morten Rasmussen <morten.rasmussen@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
Christian Loehle <Christian.Loehle@....com>
Subject: Re: [RFC][PATCH v0.1 6/6] cpufreq: intel_pstate: Add basic EAS support on hybrid platforms

On Mon, Nov 18, 2024 at 5:34 PM Pierre Gondois <pierre.gondois@....com> wrote:
>
> On 11/8/24 17:46, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > Modify intel_pstate to register stub EM perf domains for CPUs on
> > hybrid platforms via em_dev_register_perf_domain() and to use
> > em_dev_expand_perf_domain() introduced previously for adding new
> > CPUs to existing EM perf domains when those CPUs become online for
> > the first time after driver initialization.
> >
> > This change is targeting platforms (for example, Lunar Lake) where
> > "small" CPUs (E-cores) are always more energy-efficient than the "big"
> > or "performance" CPUs (P-cores) when run at the same HWP performance
> > level, so it is sufficient to tell the EAS that E-cores are always
> > preferred (so long as there is enough spare capacity on one of them
> > to run the given task).
> >
> > Accordingly, the perf domains are registered per CPU type (that is,
> > all P-cores belong to one perf domain and all E-cores belong to another
> > perf domain) and they are registered only if asymmetric CPU capacity is
> > enabled. Each perf domain has a one-element states table and that
> > element only contains the relative cost value (the other fields in
> > it are not initialized, so they are all equal to zero), and the cost
> > value for the E-core perf domain is lower.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > ---
> > drivers/cpufreq/intel_pstate.c | 110 ++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 104 insertions(+), 6 deletions(-)
> >
> > Index: linux-pm/drivers/cpufreq/intel_pstate.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> > +++ linux-pm/drivers/cpufreq/intel_pstate.c
> > @@ -8,6 +8,7 @@
> >
> > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include <linux/energy_model.h>
> > #include <linux/kernel.h>
> > #include <linux/kernel_stat.h>
> > #include <linux/module.h>
> > @@ -938,6 +939,12 @@ static struct freq_attr *hwp_cpufreq_att
> > NULL,
> > };
> >
> > +enum hybrid_cpu_type {
> > +	HYBRID_PCORE = 0,
> > +	HYBRID_ECORE,
> > +	HYBRID_NR_TYPES
> > +};
> > +
> > static struct cpudata *hybrid_max_perf_cpu __read_mostly;
> > /*
> > * Protects hybrid_max_perf_cpu, the capacity_perf fields in struct cpudata,
> > @@ -945,6 +952,86 @@ static struct cpudata *hybrid_max_perf_c
> > */
> > static DEFINE_MUTEX(hybrid_capacity_lock);
> >
> > +#ifdef CONFIG_ENERGY_MODEL
> > +struct hybrid_em_perf_domain {
> > +	cpumask_t cpumask;
> > +	struct device *dev;
> > +	struct em_data_callback cb;
> > +};
> > +
> > +static int hybrid_pcore_cost(struct device *dev, unsigned long freq,
> > +			     unsigned long *cost)
> > +{
> > +	/*
> > +	 * The number used here needs to be higher than the analogous
> > +	 * one in hybrid_ecore_cost() below. The units and the actual
> > +	 * values don't matter.
> > +	 */
> > +	*cost = 2;
> > +	return 0;
> > +}
> > +
> > +static int hybrid_ecore_cost(struct device *dev, unsigned long freq,
> > +			     unsigned long *cost)
> > +{
> > +	*cost = 1;
> > +	return 0;
> > +}
>
> The artificial EM was introduced for CPPC based platforms since these platforms
> only provide an 'efficiency class' entry to describe the relative efficiency
> of CPUs. The case seems similar to yours.

It is, but I don't particularly like the CPPC driver's approach to this.

> 'Fake' OPPs were created to have an incentive for EAS to balance the load on
> the CPUs in one perf. domain. Indeed, in feec(), during the energy
> computation of a pd, if the cost is independent from the max_util value,
> then one CPU in the pd could end up having a high util, and another CPU a
> NULL util.

Isn't this a consequence of disabling load balancing by EAS?
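
To make the quoted concern concrete, here is a minimal sketch of a per-domain energy estimate, assuming a simplified model in which the estimate is the cost of the state selected by the highest per-CPU utilization, multiplied by the summed utilization. The names (sketch_perf_state, sketch_pd_energy) and all numbers are hypothetical; this is not the kernel's EM code, only an illustration of why a one-element table cannot tell a balanced placement from a packed one.

```c
#include <stddef.h>

/* Hypothetical, simplified perf state (not the kernel's struct em_perf_state). */
struct sketch_perf_state {
	unsigned long capacity;	/* highest utilization this state can serve */
	unsigned long cost;	/* relative cost per unit of utilization */
};

/*
 * Assumed model: pick the first state whose capacity covers the highest
 * per-CPU utilization in the domain (falling back to the last state), then
 * scale its cost by the summed utilization. Requires nr_states >= 1.
 */
unsigned long sketch_pd_energy(const struct sketch_perf_state *table,
			       size_t nr_states,
			       const unsigned long *cpu_util,
			       size_t nr_cpus)
{
	unsigned long max_util = 0, sum_util = 0;
	size_t i, s;

	for (i = 0; i < nr_cpus; i++) {
		sum_util += cpu_util[i];
		if (cpu_util[i] > max_util)
			max_util = cpu_util[i];
	}

	/* Stop at the first state that fits; s ends at the last one otherwise. */
	for (s = 0; s + 1 < nr_states; s++) {
		if (table[s].capacity >= max_util)
			break;
	}

	return table[s].cost * sum_util;
}
```

With a one-element table such as { { 1024, 2 } }, the utilization vectors { 512, 512 } and { 1024, 0 } produce the same estimate, because only the sum matters. With a two-element table such as { { 512, 1 }, { 1024, 3 } }, the packed vector selects the costlier state and so comes out higher, which is the incentive the 'fake' OPPs were meant to provide.
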
> For CPPC platforms, this was problematic as a lower OPP could have been
> selected for the same load, so energy was lost for no reason.
>
> In your case it seems that the OPP selection is done independently on each
> CPU. However I assume it is still more energy efficient to have 2 CPUs
> loaded at 50% than one CPU loaded at 100% and an idle CPU.

Maybe.
It really depends on the cost of the idle state etc.

> Also, as Dietmar suggested, maybe it would make sense to have some
> way to prefer a CPU with an "energy saving" HFI configuration over
> a similar CPU with a "performance" HFI configuration.

As it happens, E-cores have higher energy-efficiency scores in HFI AFAICS.

> Also, out of curiosity, do you have energy numbers to share?

Not yet, but there will be some going forward.
Thanks!
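
For reference, the outcome the patch changelog aims for (E-cores preferred as long as one of them has enough spare capacity for the task) can be sketched as below. This is only an illustration under stated assumptions: the types and helpers (sketch_cpu, sketch_type_cost, sketch_pick_cpu) are hypothetical, the costs 1 and 2 mirror hybrid_ecore_cost() and hybrid_pcore_cost() from the quoted patch, and the real scheduler path compares energy estimates rather than running a loop like this one.

```c
#include <stddef.h>

/* Hypothetical CPU types for this sketch only. */
enum sketch_cpu_type { SKETCH_PCORE, SKETCH_ECORE };

struct sketch_cpu {
	enum sketch_cpu_type type;
	unsigned long capacity;	/* total capacity of the CPU */
	unsigned long util;	/* current utilization */
};

/* Relative costs as in the quoted patch: E-core 1, P-core 2. */
static unsigned long sketch_type_cost(enum sketch_cpu_type type)
{
	return type == SKETCH_ECORE ? 1 : 2;
}

/*
 * Return the index of the cheapest CPU type with enough spare capacity
 * for task_util, or -1 if the task fits nowhere. Ties keep the first match.
 */
int sketch_pick_cpu(const struct sketch_cpu *cpus, size_t nr_cpus,
		    unsigned long task_util)
{
	int best = -1;
	size_t i;

	for (i = 0; i < nr_cpus; i++) {
		/* Skip CPUs without enough spare capacity. */
		if (cpus[i].util + task_util > cpus[i].capacity)
			continue;

		if (best < 0 ||
		    sketch_type_cost(cpus[i].type) <
		    sketch_type_cost(cpus[best].type))
			best = (int)i;
	}

	return best;
}
```

Given one P-core (capacity 1024, util 100) and two E-cores (capacity 512, util 400 and 100), a task with utilization 200 lands on the second E-core, while a task with utilization 500 no longer fits on either E-core and falls back to the P-core.
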