lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtCg9aYnWsReT=xtznwkMMMEepj6j9z4J6_ST5oUv69aUA@mail.gmail.com>
Date:   Mon, 16 Oct 2023 17:32:03 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Ionela Voinescu <ionela.voinescu@....com>
Cc:     Pierre Gondois <pierre.gondois@....com>, linux@...linux.org.uk,
        catalin.marinas@....com, will@...nel.org, paul.walmsley@...ive.com,
        palmer@...belt.com, aou@...s.berkeley.edu, sudeep.holla@....com,
        gregkh@...uxfoundation.org, rafael@...nel.org, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        viresh.kumar@...aro.org, lukasz.luba@....com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-riscv@...ts.infradead.org, linux-pm@...r.kernel.org,
        conor.dooley@...rochip.com, suagrfillet@...il.com,
        ajones@...tanamicro.com, lftan@...nel.org
Subject: Re: [PATCH v2 6/6] cpufreq/cppc: set the frequency used for capacity computation

Hi Ionela,

On Mon, 16 Oct 2023 at 14:13, Ionela Voinescu <ionela.voinescu@....com> wrote:
>
> Hi both,
>
> On Wednesday 11 Oct 2023 at 16:25:46 (+0200), Vincent Guittot wrote:
> > On Wed, 11 Oct 2023 at 12:27, Pierre Gondois <pierre.gondois@....com> wrote:
> > >
> > > Hello Vincent,
> > >
> > > On 10/9/23 12:36, Vincent Guittot wrote:
> > > > cppc cpufreq driver can register an artificial energy model. In such case,
> > > > it also have to register the frequency that is used to define the CPU
> > > > capacity
> > > >
> > > > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > > > ---
> > > >   drivers/cpufreq/cppc_cpufreq.c | 18 ++++++++++++++++++
> > > >   1 file changed, 18 insertions(+)
> > > >
> > > > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> > > > index fe08ca419b3d..24c6ba349f01 100644
> > > > --- a/drivers/cpufreq/cppc_cpufreq.c
> > > > +++ b/drivers/cpufreq/cppc_cpufreq.c
> > > > @@ -636,6 +636,21 @@ static int populate_efficiency_class(void)
> > > >       return 0;
> > > >   }
> > > >
> > > > +
> > > > +static void cppc_cpufreq_set_capacity_ref_freq(struct cpufreq_policy *policy)
> > > > +{
> > > > +     struct cppc_perf_caps *perf_caps;
> > > > +     struct cppc_cpudata *cpu_data;
> > > > +     unsigned int ref_freq;
> > > > +
> > > > +     cpu_data = policy->driver_data;
> > > > +     perf_caps = &cpu_data->perf_caps;
> > > > +
> > > > +     ref_freq = cppc_cpufreq_perf_to_khz(cpu_data, perf_caps->highest_perf);
> > > > +
> > > > +     per_cpu(capacity_ref_freq, policy->cpu) = ref_freq;
> > >
> > > 'capacity_ref_freq' seems to be updated only if CONFIG_ENERGY_MODEL is set. However in
> > > [1], get_capacity_ref_freq() relies on 'capacity_ref_freq'. The cpufreq_schedutil governor
> > > should have a valid 'capacity_ref_freq' value set if the CPPC cpufreq driver is used
> > > without energy model I believe.
> >
> > we can disable it by setting capacity_ref_freq to 0 so it will
> > fallback on cpuinfo like intel and amd which uses default
> > SCHED_CAPACITY_SCALE capacity
> >
> > Could you provide me with more details about your platform ? I still
> > try to understand how the cpu compute capacity is set up on your
> > system. How do you set per_cpu cpu_scale variable ? we should set the
> > ref freq at the same time
> >
>
> Yes, the best place to set it would be in:
> drivers/base/arch_topology.c: topology_init_cpu_capacity_cppc()

Thanks. I didn't notice it

>
> But:
>  - That function reuses topology_normalize_cpu_scale() and when called
>    it needs to have capacity_ref_freq = 1. So either capacity_ref_freq
>    needs to be set for each CPU after topology_normalize_cpu_scale() is
>    called or we should not call topology_normalize_cpu_scale() here and
>    just unpack a CPPC specific version of it in
>    topology_init_cpu_capacity_cppc(). The latter is probably better as
>    we avoid iterating through all CPUs a couple of times.
>
>  - When set, capacity_ref_freq needs to be a "frequency" (at least
>    in reference to the reference frequencies provided by CPPC). So
>    cppc_cpufreq_khz_to_perf() and cppc_cpufreq_perf_to_khz() would need
>    to move to drivers/acpi/cppc_acpi.c. They don't have any dependency
>    on cpufreq (policies) so that should be alright.
>
> topology_init_cpu_capacity_cppc() is a better place to set
> capacity_ref_freq because one can do it for each CPU, and it not only

I agree, topology_init_cpu_capacity_cppc() is the best place to set
capacity_ref_freq()

> caters for the EAS case but also for frequency invariance, when
> arch_set_freq_scale() is called, if no counters are supported.
>
> When counters are supported, there are still two loose threads:
>  - amu_fie_setup(): Vincent, would you mind completely removing
>    cpufreq_get_hw_max_freq() and reusing arch_scale_freq_ref() here?

I wonder if we can have a ordering dependency problem as both
init_cpu_capacity_notifier() and init_amu_fie_notifier() are
registered for the same CPUFREQ_POLICY_NOTIFIER event and I'm not sure
 it will happen in the right ordering

>
>  - It would be nice if cppc_scale_freq_workfn() would use
>    arch_scale_freq_ref() as well, for consistency. But it would need
>    to be converted back to performance before use, so that would mean
>    extra work on the tick, which is not ideal.

This once seems more complex as it implies other arch that are not
using arch_topology.c and would need more rework so I would prefer to
make it a separate patchset

Thanks
Vincent

>
> Basically it would be good if what gets used for capacity
> (arch_scale_freq_ref()) gets used for frequency invariance as well,
> in all locations.
>
> Thanks,
> Ionela.
>
> > >
> > > Also 'capacity_ref_freq' seems to be set only for 'policy->cpu'. I believe it should
> > > be set for the whole perf domain in case this 'policy->cpu' goes offline.
> > >
> > > Another thing, related my comment to [1] and to [2], for CPPC the max capacity matches
> > > the boosting frequency. We have:
> > >    'non-boosted max capacity' < 'boosted max capacity'.
> > > -
> > > If boosting is not enabled, the CPU utilization can still go above the 'non-boosted max
> > > capacity'. The overutilization of the system seems to be triggered by comparing the CPU
> > > util to the 'boosted max capacity'. So systems might not be detected as overutilized.
> >
> > As Peter mentioned, we have to decide what is the original compute
> > capacity of your CPUs which is usually the sustainable max compute
> > capacity, especially when using EAS and EM
> >
> > >
> > > For the EAS energy computation, em_cpu_energy() tries to predict the frequency that will
> > > be used. It is currently unknown to the function that the frequency request will be
> > > clamped by __resolve_freq():
> > > get_next_freq()
> > > \-cpufreq_driver_resolve_freq()
> > >    \-__resolve_freq()
> > > This means that the energy computation might use boosting frequencies, which are not
> > > available.
> > >
> > > Regards,
> > > Pierre
> > >
> > > [1]: [PATCH v2 4/6] cpufreq/schedutil: use a fixed reference frequency
> > > [2]: https://lore.kernel.org/lkml/20230905113308.GF28319@noisy.programming.kicks-ass.net/
> > >
> > > > +}
> > > > +
> > > >   static void cppc_cpufreq_register_em(struct cpufreq_policy *policy)
> > > >   {
> > > >       struct cppc_cpudata *cpu_data;
> > > > @@ -643,6 +658,9 @@ static void cppc_cpufreq_register_em(struct cpufreq_policy *policy)
> > > >               EM_ADV_DATA_CB(cppc_get_cpu_power, cppc_get_cpu_cost);
> > > >
> > > >       cpu_data = policy->driver_data;
> > > > +
> > > > +     cppc_cpufreq_set_capacity_ref_freq(policy);
> > > > +
> > > >       em_dev_register_perf_domain(get_cpu_device(policy->cpu),
> > > >                       get_perf_level_count(policy), &em_cb,
> > > >                       cpu_data->shared_cpu_map, 0);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ