linux-kernel - Re: [PATCH v7 02/37] soc/tegra: pmc: Implement attach

Message-ID: <CAPDyKFpJhX51rOnvbYTmj9Akd+xX+b7xcSWt87UDrvMEfYOZ7Q@mail.gmail.com>
Date:   Tue, 10 Aug 2021 12:51:55 +0200
From:   Ulf Hansson <ulf.hansson@...aro.org>
To:     Dmitry Osipenko <digetx@...il.com>
Cc:     Thierry Reding <thierry.reding@...il.com>,
        Jonathan Hunter <jonathanh@...dia.com>,
        Viresh Kumar <vireshk@...nel.org>,
        Stephen Boyd <sboyd@...nel.org>,
        Peter De Schrijver <pdeschrijver@...dia.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-tegra <linux-tegra@...r.kernel.org>,
        Linux PM <linux-pm@...r.kernel.org>
Subject: Re: [PATCH v7 02/37] soc/tegra: pmc: Implement attach_dev() of power
 domain drivers

On Tue, 10 Aug 2021 at 01:56, Dmitry Osipenko <digetx@...il.com> wrote:
>
> 09.08.2021 17:15, Ulf Hansson wrote:
> >> We did that in previous versions of this series, where drivers called
> >> the devm_tegra_core_dev_init_opp_table() helper during probe to
> >> initialize the performance state of the domain. Moving the OPP state
> >> initialization into a central place made drivers cleaner by removing
> >> the boilerplate code.
> > I am not against doing this in a central place, like $subject patch
> > suggests. As a matter of fact, it makes perfect sense to me.
> >
> > However, what concerns me is that it requires using genpd internal
> > data structures to do it. I think we should try to avoid that.
>
> Alright, what do you think about this:
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index a934c679e6ce..5faed62075e9 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -2669,12 +2669,37 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
>         dev->pm_domain->detach = genpd_dev_pm_detach;
>         dev->pm_domain->sync = genpd_dev_pm_sync;
>
> +       if (pd->default_performance_state) {
> +               unsigned int default_pstate;
> +
> +               ret = pd->default_performance_state(pd, dev);
> +               if (ret < 0) {
> +                       dev_err(dev, "failed to get default performance state for PM domain %s: %d\n",
> +                               pd->name, ret);
> +                       goto out;
> +               }

Adding a new callback seems reasonable to support this.

> +
> +               default_pstate = ret;
> +
> +               if (power_on) {
> +                       ret = dev_pm_genpd_set_performance_state(dev, default_pstate);

However, this is more questionable to me.

First, I don't think we should care about whether this is "power_on"
or not. At this point, performance states are treated as orthogonal to
idle states in genpd. We may decide to change that in some way, but
that deserves a different change.

Second, I don't think we should call
dev_pm_genpd_set_performance_state() from here. It's probably better
handled from the genpd callback itself, if/when needed.

That said, perhaps the new callback should just return a regular error
code and zero on success, rather than the current performance state.
See more below.
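
To make that shape concrete, here is a rough userspace sketch, not kernel
code. Every name in it (mock_device, mock_domain, the init_performance_state
callback, the one-level-per-100-MHz mapping) is a hypothetical stand-in
chosen for illustration. mock_set_performance_state() only plays the role of
dev_pm_genpd_set_performance_state(). The point is that the callback places
the vote itself and the attach path sees nothing but 0 or a negative errno.

```c
#include <errno.h>
#include <stdio.h>

/* Userspace stand-ins for kernel types; all names are hypothetical. */
struct mock_device {
	const char *name;
	unsigned int pstate_vote;	/* performance state this device votes for */
	long clk_rate_hz;		/* current clock rate, <= 0 if unknown */
};

struct mock_domain {
	const char *name;
	/* Returns 0 on success or a negative errno, never a state value. */
	int (*init_performance_state)(struct mock_domain *pd,
				      struct mock_device *dev);
};

/* Plays the role of dev_pm_genpd_set_performance_state(). */
static int mock_set_performance_state(struct mock_device *dev,
				      unsigned int state)
{
	dev->pstate_vote = state;
	return 0;
}

/* Provider callback: derive the state and place the vote itself. */
static int mock_init_performance_state(struct mock_domain *pd,
				       struct mock_device *dev)
{
	unsigned int state;

	(void)pd;
	if (dev->clk_rate_hz <= 0)
		return -EINVAL;

	/* Made-up mapping: one performance level per 100 MHz. */
	state = (unsigned int)(dev->clk_rate_hz / 100000000L) + 1;
	return mock_set_performance_state(dev, state);
}

/* Core attach path: only checks the error code. */
static int mock_attach_device(struct mock_domain *pd, struct mock_device *dev)
{
	int ret = 0;

	if (pd->init_performance_state)
		ret = pd->init_performance_state(pd, dev);
	if (ret)
		fprintf(stderr, "%s: failed to init performance state for %s: %d\n",
			dev->name, pd->name, ret);
	return ret;
}
```

With this split, the attach path needs neither a "power_on" check nor any
knowledge of per-device genpd data.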

> +                       if (ret) {
> +                               dev_err(dev, "failed to set default performance state %u for PM domain %s: %d\n",
> +                                       default_pstate, pd->name, ret);
> +                               goto out;
> +                       }
> +               } else {
> +                       dev_gpd_data(dev)->rpm_pstate = default_pstate;

No, this isn't the right thing to do.

It looks like you are trying to use the ->rpm_pstate for
synchronization with runtime PM for consumer drivers. This is fragile
as it depends on the runtime PM deployment in the consumer driver. I
think you should look at ->rpm_pstate as a variable solely for
managing save/restore of the performance state for the device, during
runtime suspend/resume in genpd.

Synchronization of a vote for a performance state for a device needs
to be managed by calling dev_pm_genpd_set_performance_state() - or by
calling an OPP function that calls it, like dev_pm_opp_set_rate(), for
example.
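
As a simplified model of the role described above for ->rpm_pstate, the
sketch below treats it purely as a save slot across runtime suspend/resume.
It is userspace C with hypothetical names (mock_gpd_data and the mock_*
helpers), and it is an assumption about the intended behaviour, not a copy of
the genpd implementation. The vote itself only ever changes through
mock_set_performance_state(), which plays the role of
dev_pm_genpd_set_performance_state().

```c
/* Hypothetical per-device data; the fields mirror the idea, not the kernel layout. */
struct mock_gpd_data {
	unsigned int performance_state;	/* vote that is currently active */
	unsigned int rpm_pstate;	/* vote saved while runtime-suspended */
};

/* The only way a consumer changes its vote. */
static void mock_set_performance_state(struct mock_gpd_data *d,
				       unsigned int state)
{
	d->performance_state = state;
}

/* Runtime suspend: save the active vote, then drop it. */
static void mock_runtime_suspend(struct mock_gpd_data *d)
{
	d->rpm_pstate = d->performance_state;
	d->performance_state = 0;
}

/* Runtime resume: restore the saved vote, if there was one. */
static void mock_runtime_resume(struct mock_gpd_data *d)
{
	if (d->rpm_pstate) {
		d->performance_state = d->rpm_pstate;
		d->rpm_pstate = 0;
	}
}
```

In this model, a value written straight into rpm_pstate at attach time only
becomes an active vote if the consumer driver later goes through a runtime
resume, which is the dependency on the consumer's runtime PM deployment
mentioned above.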

> +               }
> +       }
> +
>         if (power_on) {
>                 genpd_lock(pd);
>                 ret = genpd_power_on(pd, 0);
>                 genpd_unlock(pd);
>         }
>
> +out:
>         if (ret)
>                 genpd_remove_device(pd, dev);
>
> diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
> index 81d1f019fa0c..9efb55f52462 100644
> --- a/drivers/soc/tegra/pmc.c
> +++ b/drivers/soc/tegra/pmc.c
> @@ -518,15 +518,14 @@ static const char * const tegra_emc_compats[] = {
>   * We retrieve clock rate of the attached device and initialize domain's
>   * performance state in accordance to the clock rate.
>   */
> -static int tegra_pmc_pd_attach_dev(struct generic_pm_domain *genpd,
> -                                  struct device *dev)
> +static int tegra_pmc_genpd_default_perf_state(struct generic_pm_domain *genpd,
> +                                             struct device *dev)
>  {
> -       struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev);
>         struct opp_table *opp_table, *pd_opp_table;
>         struct generic_pm_domain *core_genpd;
>         struct dev_pm_opp *opp, *pd_opp;
> -       unsigned long rate, state;
>         struct gpd_link *link;
> +       unsigned long rate;
>         struct clk *clk;
>         u32 hw_version;
>         int ret;
> @@ -633,8 +632,7 @@ static int tegra_pmc_pd_attach_dev(struct generic_pm_domain *genpd,
>          * RPM-resume of the device.  This means that drivers don't need to
>          * explicitly initialize performance state.
>          */
> -       state = pm_genpd_opp_to_performance_state(&core_genpd->dev, pd_opp);
> -       gpd_data->rpm_pstate = state;
> +       ret = pm_genpd_opp_to_performance_state(&core_genpd->dev, pd_opp);

I don't see how this saves tegra_pmc_genpd_default_perf_state() from
having to walk &genpd->child_links.

That's still an issue, right?

>         dev_pm_opp_put(pd_opp);
>
>  put_pd_opp_table:
> @@ -1383,7 +1381,7 @@ static int tegra_powergate_add(struct tegra_pmc *pmc, struct device_node *np)
>
>         pg->id = id;
>         pg->genpd.name = np->name;
> -       pg->genpd.attach_dev = tegra_pmc_pd_attach_dev;
> +       pg->genpd.default_performance_state = tegra_pmc_genpd_default_perf_state;
>         pg->genpd.power_off = tegra_genpd_power_off;
>         pg->genpd.power_on = tegra_genpd_power_on;
>         pg->pmc = pmc;
> @@ -1500,7 +1498,7 @@ static int tegra_pmc_core_pd_add(struct tegra_pmc *pmc, struct device_node *np)
>                 return -ENOMEM;
>
>         genpd->name = np->name;
> -       genpd->attach_dev = tegra_pmc_pd_attach_dev;
> +       genpd->default_performance_state = tegra_pmc_genpd_default_perf_state;
>         genpd->set_performance_state = tegra_pmc_core_pd_set_performance_state;
>         genpd->opp_to_performance_state = tegra_pmc_core_pd_opp_to_performance_state;
>
> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> index 21a0577305ef..cd4867817ca5 100644
> --- a/include/linux/pm_domain.h
> +++ b/include/linux/pm_domain.h
> @@ -143,6 +143,8 @@ struct generic_pm_domain {
>                           struct device *dev);
>         void (*detach_dev)(struct generic_pm_domain *domain,
>                            struct device *dev);
> +       int (*default_performance_state)(struct generic_pm_domain *domain,
> +                                        struct device *dev);
>         unsigned int flags;             /* Bit field of configs for genpd */
>         struct genpd_power_state *states;
>         void (*free_states)(struct genpd_power_state *states,
>
> >> I can revert back to the previous variant, although this variant works
> >> well too.
> > I looked at that code and in that path we end up calling
> > dev_pm_opp_set_rate(), after it has initialized the opp table for the
> > device.
> >
> > Rather than doing the OF parsing above to find out the current state
> > for the device, why can't you just call dev_pm_opp_set_rate() to
> > initialize a proper vote instead?
> >
>
> For some devices the clock rate is preset by the bootloader, by the clk
> driver, or by assigned-clocks in the device tree. And then I don't see
> what the difference is compared to initialization for the current rate.
>
> For some devices, like the memory controller, we can't just change the
> clock rate because it's a complex procedure and some boards will use a
> fixed rate, but the power vote must still be initialized.

I am not saying you should change the clock rate. The current code
path that runs via devm_tegra_core_dev_init_opp_table() just calls
clk_get_rate() and then dev_pm_opp_set_rate() with the current rate to
vote for the corresponding OPP level. Right?

Isn't this exactly what you want? No?
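
Roughly, the flow I mean looks like the userspace sketch below. It is
illustrative only: mock_opp, mock_clk_dev and the mock_* functions are
hypothetical stand-ins, with mock_get_rate() playing the role of
clk_get_rate() and mock_opp_set_rate() a much simplified
dev_pm_opp_set_rate() that picks the lowest OPP at or above the requested
rate. The sketch assumes the preset rate matches an OPP entry, so the clock
itself stays where it is and only the vote gets initialized.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical OPP entry: a clock rate and the performance level it needs. */
struct mock_opp {
	unsigned long rate_hz;
	unsigned int level;
};

struct mock_clk_dev {
	unsigned long clk_rate_hz;	/* preset by bootloader, clk driver or DT */
	const struct mock_opp *opps;	/* sorted by ascending rate */
	size_t num_opps;
	unsigned int pstate_vote;	/* resulting performance-state vote */
};

/* Plays the role of clk_get_rate(). */
static unsigned long mock_get_rate(const struct mock_clk_dev *dev)
{
	return dev->clk_rate_hz;
}

/*
 * Simplified stand-in for dev_pm_opp_set_rate(): select the lowest OPP
 * whose rate is >= target_hz, then apply its rate and level.
 */
static int mock_opp_set_rate(struct mock_clk_dev *dev, unsigned long target_hz)
{
	size_t i;

	for (i = 0; i < dev->num_opps; i++) {
		if (dev->opps[i].rate_hz >= target_hz) {
			dev->clk_rate_hz = dev->opps[i].rate_hz;
			dev->pstate_vote = dev->opps[i].level;
			return 0;
		}
	}
	return -ERANGE;	/* no OPP can satisfy the requested rate */
}

/* Initialize the vote by re-requesting the rate the clock already has. */
static int mock_init_vote_from_current_rate(struct mock_clk_dev *dev)
{
	return mock_opp_set_rate(dev, mock_get_rate(dev));
}
```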

Kind regards
Uffe