Message-ID: <CAKfTPtAdo7OADEFuMeg1PpO=rk=bXmiw1Avj7frsoNWZuceewA@mail.gmail.com>
Date: Wed, 11 Dec 2024 14:25:14 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Christian Loehle <christian.loehle@....com>, "Rafael J. Wysocki" <rjw@...ysocki.net>,
Linux PM <linux-pm@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
Lukasz Luba <lukasz.luba@....com>, Peter Zijlstra <peterz@...radead.org>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Morten Rasmussen <morten.rasmussen@....com>,
Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
Pierre Gondois <pierre.gondois@....com>
Subject: Re: [RFC][PATCH v021 4/9] sched/topology: Adjust cpufreq checks for EAS
On Wed, 11 Dec 2024 at 12:29, Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Wed, Dec 11, 2024 at 11:33 AM Christian Loehle
> <christian.loehle@....com> wrote:
> >
> > On 11/29/24 16:00, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > >
> > > Make it possible to use EAS with cpufreq drivers that implement the
> > > :setpolicy() callback instead of using generic cpufreq governors.
> > >
> > > This is going to be necessary for using EAS with intel_pstate in its
> > > default configuration.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > > ---
> > >
> > > This is the minimum of what's needed, but I'd really prefer to move
> > > the cpufreq vs EAS checks into cpufreq because messing around cpufreq
> > > internals in topology.c feels like a butcher shop kind of exercise.
> >
> > Makes sense, something like cpufreq_eas_capable().
> >
> > >
> > > Besides, as I said before, I remain unconvinced about the usefulness
> > > of these checks at all. Yes, one is supposed to get the best results
> > > from EAS when running schedutil, but what if they just want to try
> > > something else with EAS? What if they can get better results with
> > > that other thing, surprisingly enough?
> >
> > How do you imagine this to work then?
> > I assume we don't make any 'resulting-OPP-guesses' like
> > sugov_effective_cpu_perf() for any of the setpolicy governors.
> > Neither for dbs and I guess userspace.
> > What about standard powersave and performance?
> > Do we just have a cpufreq callback to ask which OPP to use for
> > the energy calculation? Assume lowest/highest?
> > (I don't think there is hardware where lowest/highest makes a
> > difference, so maybe not bothering with the complexity could
> > be an option, too.)
>
> In the "setpolicy" case there is no way to reliably predict the OPP
> that is going to be used, so why bother?
>
> In the other cases, and if the OPPs are actually known, EAS may still
> make assumptions regarding which of them will be used that will match
> the schedutil selection rules, but if the cpufreq governor happens to
> choose a different OPP, this is not the end of the world.
Should we add a new cpufreq governor callback that returns the governor's
best-guess estimate of the compute capacity it will select? Something like
cpufreq_effective_cpu_perf(cpu, actual, min, max)
EAS needs to estimate what the next OPP will be; schedutil uses
sugov_effective_cpu_perf(), and other governors could provide their own.
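
To make the idea concrete, here is a rough userspace sketch of what such a
per-governor hook could look like. None of this is an existing cpufreq
interface: struct gov_sketch, the *_like_perf() helpers and
cpufreq_effective_cpu_perf_sketch() are hypothetical names used only for
illustration. The schedutil-like variant is modelled on what
sugov_effective_cpu_perf() does (add DVFS headroom, cap at max, honour min);
the 25% headroom and the "assume max" fallback are assumptions.

```c
#include <stddef.h>

/* Hypothetical per-governor hook; not an existing cpufreq interface. */
struct gov_sketch {
	const char *name;
	/* Estimate the performance level the governor will end up selecting. */
	unsigned long (*effective_cpu_perf)(int cpu, unsigned long actual,
					    unsigned long min,
					    unsigned long max);
};

/* Modelled on schedutil: add headroom, cap at max, honour min. */
static unsigned long schedutil_like_perf(int cpu, unsigned long actual,
					 unsigned long min, unsigned long max)
{
	(void)cpu;
	actual += actual >> 2;		/* assumed 25% DVFS headroom */
	if (actual < max)
		max = actual;		/* no need to target max performance */
	return min > max ? min : max;	/* never go below min */
}

/* Assumption: a "performance"-style governor always runs at max. */
static unsigned long performance_like_perf(int cpu, unsigned long actual,
					   unsigned long min, unsigned long max)
{
	(void)cpu; (void)actual; (void)min;
	return max;
}

/* Assumption: a "powersave"-style governor always runs at min. */
static unsigned long powersave_like_perf(int cpu, unsigned long actual,
					 unsigned long min, unsigned long max)
{
	(void)cpu; (void)actual; (void)max;
	return min;
}

static const struct gov_sketch schedutil_like = {
	"schedutil-like", schedutil_like_perf
};
static const struct gov_sketch performance_like = {
	"performance-like", performance_like_perf
};
static const struct gov_sketch powersave_like = {
	"powersave-like", powersave_like_perf
};

/*
 * What the energy estimation would call instead of using the schedutil
 * helper directly. If the governor gives no estimate, assume max; this
 * fallback is a design choice made for the sketch only.
 */
static unsigned long
cpufreq_effective_cpu_perf_sketch(const struct gov_sketch *gov, int cpu,
				  unsigned long actual, unsigned long min,
				  unsigned long max)
{
	if (gov && gov->effective_cpu_perf)
		return gov->effective_cpu_perf(cpu, actual, min, max);
	return max;
}
```

With something along these lines the caller would not need to know which
governor is in use, and a governor that cannot predict its selection could
simply leave the hook unset.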
>
> Yes, you could have been more energy-efficient had you chosen to use
> schedutil, but you chose otherwise and that's what you get.
Calling sugov_effective_cpu_perf() for a governor other than schedutil
doesn't make sense. And do we handle the case where
CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not selected?
>
> > >
> > > ---
> > > kernel/sched/topology.c | 10 +++++++---
> > > 1 file changed, 7 insertions(+), 3 deletions(-)
> > >
> > > Index: linux-pm/kernel/sched/topology.c
> > > ===================================================================
> > > --- linux-pm.orig/kernel/sched/topology.c
> > > +++ linux-pm/kernel/sched/topology.c
> > > @@ -217,6 +217,7 @@ static bool sched_is_eas_possible(const
> > > bool any_asym_capacity = false;
> > > struct cpufreq_policy *policy;
> > > struct cpufreq_governor *gov;
> > > + bool cpufreq_ok;
> > > int i;
> > >
> > > /* EAS is enabled for asymmetric CPU capacity topologies. */
> > > @@ -251,7 +252,7 @@ static bool sched_is_eas_possible(const
> > > return false;
> > > }
> > >
> > > - /* Do not attempt EAS if schedutil is not being used. */
> > > + /* Do not attempt EAS if cpufreq is not configured adequately */
> > > for_each_cpu(i, cpu_mask) {
> > > policy = cpufreq_cpu_get(i);
> > > if (!policy) {
> > > @@ -261,11 +262,14 @@ static bool sched_is_eas_possible(const
> > > }
> > > return false;
> > > }
> > > + /* Require schedutil or a "setpolicy" driver */
> > > gov = policy->governor;
> > > + cpufreq_ok = gov == &schedutil_gov ||
> > > + (!gov && policy->policy != CPUFREQ_POLICY_UNKNOWN);
> > > cpufreq_cpu_put(policy);
> > > - if (gov != &schedutil_gov) {
> > > + if (!cpufreq_ok) {
> > > if (sched_debug()) {
> > > - pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
> > > + pr_info("rd %*pbl: Checking EAS, unsuitable cpufreq governor\n",
> > > cpumask_pr_args(cpu_mask));
> > > }
> > > return false;
> >
> > The logic here looks fine to me FWIW.
> >
> >