linux-kernel - Re: [RFC][PATCH v021 4/9] sched/topology: Adjust cpufreq checks for EAS

Message-ID: <CAKfTPtAdo7OADEFuMeg1PpO=rk=bXmiw1Avj7frsoNWZuceewA@mail.gmail.com>
Date: Wed, 11 Dec 2024 14:25:14 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Christian Loehle <christian.loehle@....com>, "Rafael J. Wysocki" <rjw@...ysocki.net>, 
	Linux PM <linux-pm@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, 
	Lukasz Luba <lukasz.luba@....com>, Peter Zijlstra <peterz@...radead.org>, 
	Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Morten Rasmussen <morten.rasmussen@....com>, 
	Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>, 
	Pierre Gondois <pierre.gondois@....com>
Subject: Re: [RFC][PATCH v021 4/9] sched/topology: Adjust cpufreq checks for EAS

On Wed, 11 Dec 2024 at 12:29, Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Wed, Dec 11, 2024 at 11:33 AM Christian Loehle
> <christian.loehle@....com> wrote:
> >
> > On 11/29/24 16:00, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > >
> > > Make it possible to use EAS with cpufreq drivers that implement the
> > > :setpolicy() callback instead of using generic cpufreq governors.
> > >
> > > This is going to be necessary for using EAS with intel_pstate in its
> > > default configuration.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > > ---
> > >
> > > This is the minimum of what's needed, but I'd really prefer to move
> > > the cpufreq vs EAS checks into cpufreq because messing around cpufreq
> > > internals in topology.c feels like a butcher shop kind of exercise.
> >
> > Makes sense, something like cpufreq_eas_capable().
> >
> > >
> > > Besides, as I said before, I remain unconvinced about the usefulness
> > > of these checks at all.  Yes, one is supposed to get the best results
> > > from EAS when running schedutil, but what if they just want to try
> > > something else with EAS?  What if they can get better results with
> > > that other thing, surprisingly enough?
> >
> > How do you imagine this to work then?
> > I assume we don't make any 'resulting-OPP-guesses' like
> > sugov_effective_cpu_perf() for any of the setpolicy governors.
> > Neither for dbs and I guess userspace.
> > What about standard powersave and performance?
> > Do we just have a cpufreq callback to ask which OPP to use for
> > the energy calculation? Assume lowest/highest?
> > (I don't think there is hardware where lowest/highest makes a
> > difference, so maybe not bothering with the complexity could
> > be an option, too.)
>
> In the "setpolicy" case there is no way to reliably predict the OPP
> that is going to be used, so why bother?
>
> In the other cases, and if the OPPs are actually known, EAS may still
> make assumptions regarding which of them will be used that will match
> the schedutil selection rules, but if the cpufreq governor happens to
> choose a different OPP, this is not the end of the world.

Should we add a new cpufreq governor callback that returns the
governor's best-guess estimate of the compute capacity selection?
Something like cpufreq_effective_cpu_perf(cpu, actual, min, max).
EAS needs to estimate what the next OPP would be; schedutil uses
sugov_effective_cpu_perf(), and other governors could provide their own.
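
To make the idea concrete, here is a minimal userspace sketch, not kernel
code. Every name in it (struct governor_model, effective_perf,
effective_cpu_perf() and the three example governors) is hypothetical and
exists only for illustration. The schedutil-like estimate assumes roughly
25% headroom followed by a clamp to [min, max]; returning max for
"performance", min for "powersave", and max when a governor has no hook
are also assumptions, not established behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-governor hook; not an existing cpufreq interface. */
struct governor_model {
	const char *name;
	unsigned long (*effective_perf)(int cpu, unsigned long actual,
					unsigned long min, unsigned long max);
};

/* schedutil-like guess: add ~25% headroom, then clamp to [min, max]. */
unsigned long schedutil_like_perf(int cpu, unsigned long actual,
				  unsigned long min, unsigned long max)
{
	(void)cpu;
	actual += actual >> 2;
	if (actual < max)
		max = actual;
	return max > min ? max : min;
}

/* Assumed behaviour: a performance-like governor sits at the top. */
unsigned long performance_like_perf(int cpu, unsigned long actual,
				    unsigned long min, unsigned long max)
{
	(void)cpu; (void)actual; (void)min;
	return max;
}

/* Assumed behaviour: a powersave-like governor sits at the bottom. */
unsigned long powersave_like_perf(int cpu, unsigned long actual,
				  unsigned long min, unsigned long max)
{
	(void)cpu; (void)actual; (void)max;
	return min;
}

const struct governor_model schedutil_like = { "schedutil-like", schedutil_like_perf };
const struct governor_model performance_like = { "performance-like", performance_like_perf };
const struct governor_model powersave_like = { "powersave-like", powersave_like_perf };
const struct governor_model no_hook = { "no-hook", NULL };

/* What an EAS-side caller could use: ask the governor, else assume max. */
unsigned long effective_cpu_perf(const struct governor_model *gov, int cpu,
				 unsigned long actual, unsigned long min,
				 unsigned long max)
{
	assert(min <= max);
	if (gov && gov->effective_perf)
		return gov->effective_perf(cpu, actual, min, max);
	return max;
}
```

The point of the indirection is that the energy computation would no
longer need to know which governor is in use; it would only ask for an
estimate and accept that the estimate may be wrong.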

>
> Yes, you could have been more energy-efficient had you chosen to use
> schedutil, but you chose otherwise and that's what you get.

Calling sugov_effective_cpu_perf() for a governor other than schedutil
doesn't make sense. And do we handle the case where
CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not selected?

>
> > >
> > > ---
> > >  kernel/sched/topology.c |   10 +++++++---
> > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > >
> > > Index: linux-pm/kernel/sched/topology.c
> > > ===================================================================
> > > --- linux-pm.orig/kernel/sched/topology.c
> > > +++ linux-pm/kernel/sched/topology.c
> > > @@ -217,6 +217,7 @@ static bool sched_is_eas_possible(const
> > >       bool any_asym_capacity = false;
> > >       struct cpufreq_policy *policy;
> > >       struct cpufreq_governor *gov;
> > > +     bool cpufreq_ok;
> > >       int i;
> > >
> > >       /* EAS is enabled for asymmetric CPU capacity topologies. */
> > > @@ -251,7 +252,7 @@ static bool sched_is_eas_possible(const
> > >               return false;
> > >       }
> > >
> > > -     /* Do not attempt EAS if schedutil is not being used. */
> > > +     /* Do not attempt EAS if cpufreq is not configured adequately */
> > >       for_each_cpu(i, cpu_mask) {
> > >               policy = cpufreq_cpu_get(i);
> > >               if (!policy) {
> > > @@ -261,11 +262,14 @@ static bool sched_is_eas_possible(const
> > >                       }
> > >                       return false;
> > >               }
> > > +             /* Require schedutil or a "setpolicy" driver */
> > >               gov = policy->governor;
> > > +             cpufreq_ok = gov == &schedutil_gov ||
> > > +                             (!gov && policy->policy != CPUFREQ_POLICY_UNKNOWN);
> > >               cpufreq_cpu_put(policy);
> > > -             if (gov != &schedutil_gov) {
> > > +             if (!cpufreq_ok) {
> > >                       if (sched_debug()) {
> > > -                             pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
> > > +                             pr_info("rd %*pbl: Checking EAS, unsuitable cpufreq governor\n",
> > >                                       cpumask_pr_args(cpu_mask));
> > >                       }
> > >                       return false;
> >
> > The logic here looks fine to me FWIW.
> >
> >
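
For reference, the condition introduced by the quoted hunk can be modelled
outside the kernel as below. This is only a sketch: the types and names
(struct gov_stub, struct policy_state, cpufreq_ok_for_eas() and the
POLICY_* enumerators) are stand-ins invented here, and treating "unknown"
as the zero value is an assumption of the model. The predicate itself
mirrors the patch: accept schedutil, or accept a policy with no governor
whose policy field is set, which is the "setpolicy" driver case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the policy values; "unknown" as 0 is an assumption here. */
enum policy_model {
	POLICY_UNKNOWN = 0,
	POLICY_POWERSAVE,
	POLICY_PERFORMANCE,
};

/* Stand-in for a cpufreq governor object. */
struct gov_stub {
	const char *name;
};

/* Stand-in for the two policy fields the check looks at. */
struct policy_state {
	const struct gov_stub *governor;	/* NULL for "setpolicy" drivers */
	enum policy_model policy;
};

const struct gov_stub schedutil_model = { "schedutil" };
const struct gov_stub ondemand_model = { "ondemand" };

/* Same shape as the patch: schedutil, or no governor plus a known policy. */
bool cpufreq_ok_for_eas(const struct policy_state *p)
{
	assert(p != NULL);
	return p->governor == &schedutil_model ||
	       (!p->governor && p->policy != POLICY_UNKNOWN);
}
```

With this model, a generic governor other than schedutil is still
rejected, while a governor-less policy set to powersave or performance is
accepted.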