Message-ID: <CAJZ5v0g1MYxABDEYJxjveWALV6yecuV1=Ly6REkR4eb1kS-cUA@mail.gmail.com>
Date:   Fri, 29 Sep 2023 14:27:03 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Lukasz Luba <lukasz.luba@....com>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        dietmar.eggemann@....com, rui.zhang@...el.com,
        amit.kucheria@...durent.com, amit.kachhap@...il.com,
        daniel.lezcano@...aro.org, viresh.kumar@...aro.org,
        len.brown@...el.com, pavel@....cz, mhiramat@...nel.org,
        qyousef@...alina.io, wvw@...gle.com
Subject: Re: [PATCH v4 09/18] PM: EM: Introduce runtime modifiable table

On Fri, Sep 29, 2023 at 11:15 AM Lukasz Luba <lukasz.luba@....com> wrote:
>
>
>
> On 9/26/23 20:12, Rafael J. Wysocki wrote:
> > On Mon, Sep 25, 2023 at 10:11 AM Lukasz Luba <lukasz.luba@....com> wrote:
> >>
> >> The new runtime table would be populated with new power data to better
> >> reflect the actual power. The power can vary over time e.g. due to the
> >> SoC temperature change. Higher temperature can increase power values.
> >> For longer running scenarios, such as game or camera, when also other
> >> devices are used (e.g. GPU, ISP) the CPU power can change. The new
> >> EM framework is able to address this issue and change the data
> >> at runtime safely.
> >>
> >> The runtime modifiable EM data is used by the Energy Aware Scheduler (EAS)
> >> for the task placement. All the other users (thermal, etc.) are still
> >> using the default (basic) EM. This fact drove the design of this feature.
> >>
> >> Signed-off-by: Lukasz Luba <lukasz.luba@....com>
> >> ---
> >>   include/linux/energy_model.h |  4 +++-
> >>   kernel/power/energy_model.c  | 12 +++++++++++-
> >>   2 files changed, 14 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> >> index 546dee90f716..740e7c25cfff 100644
> >> --- a/include/linux/energy_model.h
> >> +++ b/include/linux/energy_model.h
> >> @@ -39,7 +39,7 @@ struct em_perf_state {
> >>   /**
> >>    * struct em_perf_table - Performance states table
> >>    * @state:     List of performance states, in ascending order
> >> - * @rcu:       RCU used for safe access and destruction
> >> + * @rcu:       RCU used only for runtime modifiable table
> >
> > This still doesn't appear to be used anywhere, so why change it here?
>
> I will try to move this later in the series.
>
> >
> >>    */
> >>   struct em_perf_table {
> >>          struct em_perf_state *state;
> >> @@ -49,6 +49,7 @@ struct em_perf_table {
> >>   /**
> >>    * struct em_perf_domain - Performance domain
> >>    * @default_table:     Pointer to the default em_perf_table
> >> + * @runtime_table:     Pointer to the runtime modifiable em_perf_table
> >
> > "Pointer to em_perf_table that can be dynamically updated"
>
> OK
>
> >
> >>    * @nr_perf_states:    Number of performance states
> >>    * @flags:             See "em_perf_domain flags"
> >>    * @cpus:              Cpumask covering the CPUs of the domain. It's here
> >> @@ -64,6 +65,7 @@ struct em_perf_table {
> >>    */
> >>   struct em_perf_domain {
> >>          struct em_perf_table *default_table;
> >> +       struct em_perf_table __rcu *runtime_table;
> >>          int nr_perf_states;
> >>          unsigned long flags;
> >>          unsigned long cpus[];
> >> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> >> index 797141638b29..5b40db38b745 100644
> >> --- a/kernel/power/energy_model.c
> >> +++ b/kernel/power/energy_model.c
> >> @@ -251,6 +251,9 @@ static int em_create_pd(struct device *dev, int nr_states,
> >>                  return ret;
> >>          }
> >>
> >> +       /* Initialize runtime table as default table. */
> >
> > Redundant comment.
>
> I'll drop it.
>
> >
> >> +       rcu_assign_pointer(pd->runtime_table, default_table);
> >> +
> >>          if (_is_cpu_device(dev))
> >>                  for_each_cpu(cpu, cpus) {
> >>                          cpu_dev = get_cpu_device(cpu);
> >> @@ -448,6 +451,7 @@ EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
> >>    */
> >>   void em_dev_unregister_perf_domain(struct device *dev)
> >>   {
> >> +       struct em_perf_table __rcu *runtime_table;
> >>          struct em_perf_domain *pd;
> >>
> >>          if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
> >> @@ -457,18 +461,24 @@ void em_dev_unregister_perf_domain(struct device *dev)
> >>                  return;
> >>
> >>          pd = dev->em_pd;
> >> -
> >
> > Unrelated change.
>
> ACK
>
> >
> >>          /*
> >>           * The mutex separates all register/unregister requests and protects
> >>           * from potential clean-up/setup issues in the debugfs directories.
> >>           * The debugfs directory name is the same as device's name.
> >>           */
> >>          mutex_lock(&em_pd_mutex);
> >> +
> >
> > Same here.
>
> ACK
>
> >
> >>          em_debug_remove_pd(dev);
> >>
> >> +       runtime_table = pd->runtime_table;
> >> +
> >> +       rcu_assign_pointer(pd->runtime_table, NULL);
> >> +       synchronize_rcu();
> >
> > Is it really a good idea to call this under a mutex?
>
> This is the unregistration of the EM code path, so a thermal
> driver which gets some IRQs might not be aware that the EM
> is going to vanish. That's why those two code paths: update
> & unregister are protected with the same lock.
>
> This synchronize_rcu() won't be long,

Are you sure?  This potentially waits for all of the CPUs in the
system to go through a quiescent state which may take a while in
principle.

In any case, though, this effectively makes everyone waiting for the
mutex also wait for the grace period to elapse and they may not care
about it.
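
To illustrate the concern, here is a minimal userspace sketch (not kernel
code and not taken from the patch). It assumes the table pointer can be
unpublished under the lock and the grace-period wait done after the lock is
dropped. The names struct table, struct domain, pd_mutex,
fake_synchronize_rcu() and unregister_domain() are hypothetical stand-ins;
fake_synchronize_rcu() only records whether the mutex was held while it
"waited". Whether this ordering is actually safe for the EM update and
unregister paths is not settled in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for em_perf_table / em_perf_domain. */
struct table { int power; };
struct domain { _Atomic(struct table *) runtime_table; };

static pthread_mutex_t pd_mutex = PTHREAD_MUTEX_INITIALIZER;
static int grace_periods_waited;
static int waited_under_mutex;

/*
 * Stand-in for synchronize_rcu(): it does not wait for readers, it only
 * counts calls and records whether pd_mutex was held at the time.
 * pthread_mutex_trylock() fails with EBUSY if the mutex is already locked.
 */
static void fake_synchronize_rcu(void)
{
	grace_periods_waited++;
	if (pthread_mutex_trylock(&pd_mutex) == 0)
		pthread_mutex_unlock(&pd_mutex);	/* lock was free */
	else
		waited_under_mutex++;			/* lock was held */
}

static void unregister_domain(struct domain *pd)
{
	struct table *old;

	assert(pd != NULL);

	/* Hold the mutex only long enough to unpublish the pointer. */
	pthread_mutex_lock(&pd_mutex);
	old = atomic_exchange(&pd->runtime_table, (struct table *)NULL);
	pthread_mutex_unlock(&pd_mutex);

	/* Other waiters on pd_mutex no longer pay for the grace period. */
	fake_synchronize_rcu();
	free(old);
}
```

With this ordering, anyone blocked on the mutex is released before the
grace-period wait starts, at the cost of having to show separately that a
concurrent update cannot republish a table in between.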

> but makes sure that when we free(dev->em_pd) we don't leak runtime_table.
>
> >
> >> +
> >>          kfree(pd->default_table->state);
> >>          kfree(pd->default_table);
> >>          kfree(dev->em_pd);
> >> +
> >
> > Unrelated change.
>
> ACK
>
> >
> >>          dev->em_pd = NULL;
> >>          mutex_unlock(&em_pd_mutex);
> >>   }
> >> --
> >
> > So this really adds a pointer to a table that can be dynamically
> > updated to struct em_perf_domain without any users so far.  It is not
> > used anywhere as of this patch AFAICS, which is not what the changelog
> > is saying.
>
> Good catch. I will adjust the changelog in the header and say:
>
> 'Add infrastructure and mechanisms for the new runtime table.
> The runtime modifiable EM data is used by the Energy Aware Scheduler
> (EAS) for the task placement.

I would make it clearer that this is going to happen only after some
subsequent changes.

> All the other users (thermal, etc.) are
> still using the default (basic) EM. This fact drove the design of this
> feature.'
