linux-kernel - Re: [PATCH v13 06/35] clk: tegra: Support runtime PM and power domain

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPDyKFq+LS4Jr1GyC-a-tGWPzGH0JxfJ9wKY=uQEBGYm952azw@mail.gmail.com>
Date:   Fri, 1 Oct 2021 14:32:21 +0200
From:   Ulf Hansson <ulf.hansson@...aro.org>
To:     Dmitry Osipenko <digetx@...il.com>
Cc:     Thierry Reding <thierry.reding@...il.com>,
        Jonathan Hunter <jonathanh@...dia.com>,
        Viresh Kumar <vireshk@...nel.org>,
        Stephen Boyd <sboyd@...nel.org>,
        Peter De Schrijver <pdeschrijver@...dia.com>,
        Mikko Perttunen <mperttunen@...dia.com>,
        Peter Chen <peter.chen@...nel.org>,
        Lee Jones <lee.jones@...aro.org>,
        Uwe Kleine-König 
        <u.kleine-koenig@...gutronix.de>, Nishanth Menon <nm@...com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Michael Turquette <mturquette@...libre.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-tegra <linux-tegra@...r.kernel.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux USB List <linux-usb@...r.kernel.org>,
        linux-staging@...ts.linux.dev, linux-pwm@...r.kernel.org,
        linux-mmc <linux-mmc@...r.kernel.org>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        DTML <devicetree@...r.kernel.org>,
        linux-clk <linux-clk@...r.kernel.org>,
        Mark Brown <broonie@...nel.org>,
        Vignesh Raghavendra <vigneshr@...com>,
        Richard Weinberger <richard@....at>,
        Miquel Raynal <miquel.raynal@...tlin.com>,
        Lucas Stach <dev@...xeye.de>, Stefan Agner <stefan@...er.ch>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        David Heidelberg <david@...t.cz>
Subject: Re: [PATCH v13 06/35] clk: tegra: Support runtime PM and power domain

On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@...il.com> wrote:
>
> The Clock-and-Reset controller resides in a core power domain on NVIDIA
> Tegra SoCs.  In order to support voltage scaling of the core power domain,
> we hook up DVFS-capable clocks to the core GENPD for managing of the
> GENPD's performance state based on the clock changes.
>
> Some clocks don't have any specific physical hardware unit that backs
> them, like root PLLs and system clock and they have theirs own voltage
> requirements.  This patch adds new clk-device driver that backs the clocks
> and provides runtime PM functionality for them.  A virtual clk-device is
> created for each such DVFS-capable clock at the clock's registration time
> by the new tegra_clk_register() helper.  Driver changes clock's device
> GENPD performance state based on clk-rate notifications.
>
> In result we have this sequence of events:
>
>   1. Clock driver creates virtual device for selective clocks, enables
>      runtime PM for the created device and registers the clock.
>   2. Clk-device driver starts to listen to clock rate changes.
>   3. Something changes clk rate or enables/disables clk.
>   4. CCF core propagates the change through the clk tree.
>   5. Clk-device driver gets clock rate-change notification or GENPD core
>      handles prepare/unprepare of the clock.
>   6. Clk-device driver changes GENPD performance state on clock rate
>      change.
>   7. GENPD driver changes voltage regulator state change.
>   8. The regulator state is committed to hardware via I2C.
>
> We rely on fact that DVFS is not needed for Tegra I2C and that Tegra I2C
> driver already keeps clock always-prepared.  Hence I2C subsystem stays
> independent from the clk power management and there are no deadlock spots
> in the sequence.
>
> Currently all clocks are registered very early during kernel boot when the
> device driver core isn't available yet.  The clk-device can't be created
> at that time.  This patch splits the registration of the clocks in two
> phases:
>
>   1. Register all essential clocks which don't use RPM and are needed
>      during early boot.
>
>   2. Register at a later boot time the rest of clocks.
>
> This patch adds power management support for Tegra20 and Tegra30 clocks.
>
> Tested-by: Peter Geis <pgwipeout@...il.com> # Ouya T30
> Tested-by: Paul Fertser <fercerpav@...il.com> # PAZ00 T20
> Tested-by: Nicolas Chauvet <kwizart@...il.com> # PAZ00 T20 and TK1 T124
> Tested-by: Matt Merhar <mattmerhar@...tonmail.com> # Ouya T30
> Signed-off-by: Dmitry Osipenko <digetx@...il.com>
> ---
>  drivers/clk/tegra/Makefile      |   1 +
>  drivers/clk/tegra/clk-device.c  | 230 ++++++++++++++++++++++++++++++++
>  drivers/clk/tegra/clk-pll.c     |   2 +-
>  drivers/clk/tegra/clk-super.c   |   2 +-
>  drivers/clk/tegra/clk-tegra20.c |  77 ++++++++---
>  drivers/clk/tegra/clk-tegra30.c | 116 +++++++++++-----
>  drivers/clk/tegra/clk.c         |  75 ++++++++++-
>  drivers/clk/tegra/clk.h         |   2 +
>  8 files changed, 451 insertions(+), 54 deletions(-)
>  create mode 100644 drivers/clk/tegra/clk-device.c
>
> diff --git a/drivers/clk/tegra/Makefile b/drivers/clk/tegra/Makefile
> index 7b1816856eb5..a0715cdfc1a4 100644
> --- a/drivers/clk/tegra/Makefile
> +++ b/drivers/clk/tegra/Makefile
> @@ -1,6 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-y                                  += clk.o
>  obj-y                                  += clk-audio-sync.o
> +obj-y                                  += clk-device.o
>  obj-y                                  += clk-dfll.o
>  obj-y                                  += clk-divider.o
>  obj-y                                  += clk-periph.o
> diff --git a/drivers/clk/tegra/clk-device.c b/drivers/clk/tegra/clk-device.c
> new file mode 100644
> index 000000000000..830bc0ba25d3
> --- /dev/null
> +++ b/drivers/clk/tegra/clk-device.c
> @@ -0,0 +1,230 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/clk.h>
> +#include <linux/clk-provider.h>
> +#include <linux/mutex.h>
> +#include <linux/of_device.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_domain.h>
> +#include <linux/pm_opp.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/slab.h>
> +
> +#include <soc/tegra/common.h>
> +
> +#include "clk.h"
> +
> +/*
> + * This driver manages performance state of the core power domain for the
> + * independent PLLs and system clocks.  We created a virtual clock device
> + * for such clocks, see tegra_clk_dev_register().
> + */
> +
> +struct tegra_clk_device {
> +       struct notifier_block clk_nb;
> +       struct device *dev;
> +       struct clk_hw *hw;
> +       struct mutex lock;
> +};
> +
> +static int tegra_clock_set_pd_state(struct tegra_clk_device *clk_dev,
> +                                   unsigned long rate)
> +{
> +       struct device *dev = clk_dev->dev;
> +       struct dev_pm_opp *opp;
> +       unsigned int pstate;
> +
> +       opp = dev_pm_opp_find_freq_ceil(dev, &rate);
> +       if (opp == ERR_PTR(-ERANGE)) {
> +               dev_dbg(dev, "failed to find ceil OPP for %luHz\n", rate);
> +               opp = dev_pm_opp_find_freq_floor(dev, &rate);
> +       }
> +
> +       if (IS_ERR(opp)) {
> +               dev_err(dev, "failed to find OPP for %luHz: %pe\n", rate, opp);
> +               return PTR_ERR(opp);
> +       }
> +
> +       pstate = dev_pm_opp_get_required_pstate(opp, 0);
> +       dev_pm_opp_put(opp);
> +
> +       return dev_pm_genpd_set_performance_state(dev, pstate);

The above code certainly looks like it can be made generic through a
common opp helper. I know we have discussed this before, so I am not
saying you should change right now.

Let's instead see what I think (and Viresh), when I have reviewed the
entire series.

> +}
> +
> +static int tegra_clock_change_notify(struct notifier_block *nb,
> +                                    unsigned long msg, void *data)
> +{
> +       struct clk_notifier_data *cnd = data;
> +       struct tegra_clk_device *clk_dev;
> +       int err = 0;
> +
> +       clk_dev = container_of(nb, struct tegra_clk_device, clk_nb);
> +
> +       mutex_lock(&clk_dev->lock);
> +       switch (msg) {
> +       case PRE_RATE_CHANGE:
> +               if (cnd->new_rate > cnd->old_rate)
> +                       err = tegra_clock_set_pd_state(clk_dev, cnd->new_rate);
> +               break;
> +
> +       case ABORT_RATE_CHANGE:
> +               err = tegra_clock_set_pd_state(clk_dev, cnd->old_rate);
> +               break;
> +
> +       case POST_RATE_CHANGE:
> +               if (cnd->new_rate < cnd->old_rate)
> +                       err = tegra_clock_set_pd_state(clk_dev, cnd->new_rate);
> +               break;
> +
> +       default:
> +               break;
> +       }
> +       mutex_unlock(&clk_dev->lock);
> +
> +       return notifier_from_errno(err);
> +}
> +
> +static int tegra_clock_sync_pd_state(struct tegra_clk_device *clk_dev)
> +{
> +       unsigned long rate;
> +       int ret = 0;
> +
> +       mutex_lock(&clk_dev->lock);
> +
> +       if (!pm_runtime_status_suspended(clk_dev->dev)) {
> +               rate = clk_hw_get_rate(clk_dev->hw);
> +               ret = tegra_clock_set_pd_state(clk_dev, rate);

Don't we need to sync the performance state even when the device is
runtime suspended?

Perhaps the clock, via a child-clock for example, can get
prepared/enabled (hence its device gets runtime resumed) before there
is a clock rate update for it. Then there is no performance state set
for it, right? Or maybe that isn't a problem?

> +       }
> +
> +       mutex_unlock(&clk_dev->lock);
> +
> +       return ret;
> +}
> +
> +static int tegra_clock_probe(struct platform_device *pdev)
> +{
> +       struct tegra_core_opp_params opp_params = {};
> +       struct tegra_clk_device *clk_dev;
> +       struct device *dev = &pdev->dev;
> +       struct clk *clk;
> +       int err;
> +
> +       if (!dev->pm_domain)
> +               return -EINVAL;
> +
> +       clk_dev = devm_kzalloc(dev, sizeof(*clk_dev), GFP_KERNEL);
> +       if (!clk_dev)
> +               return -ENOMEM;
> +
> +       clk = devm_clk_get(dev, NULL);
> +       if (IS_ERR(clk))
> +               return PTR_ERR(clk);
> +
> +       clk_dev->dev = dev;
> +       clk_dev->hw = __clk_get_hw(clk);
> +       clk_dev->clk_nb.notifier_call = tegra_clock_change_notify;
> +       mutex_init(&clk_dev->lock);
> +
> +       platform_set_drvdata(pdev, clk_dev);
> +
> +       /*
> +        * Runtime PM was already enabled for this device by the parent clk
> +        * driver and power domain state should be synced under clk_dev lock,
> +        * hence we don't use the common OPP helper that initializes OPP
> +        * state. For some clocks common OPP helper may fail to find ceil
> +        * rate, it's handled by this driver.
> +        */
> +       err = devm_tegra_core_dev_init_opp_table(dev, &opp_params);
> +       if (err)
> +               return err;
> +
> +       err = clk_notifier_register(clk, &clk_dev->clk_nb);
> +       if (err) {
> +               dev_err(dev, "failed to register clk notifier: %d\n", err);
> +               return err;
> +       }
> +
> +       /*
> +        * The driver is attaching to a potentially active/resumed clock, hence
> +        * we need to sync the power domain performance state in a accordance to
> +        * the clock rate if clock is resumed.
> +        */
> +       err = tegra_clock_sync_pd_state(clk_dev);
> +       if (err)
> +               goto unreg_clk;
> +
> +       return 0;
> +
> +unreg_clk:
> +       clk_notifier_unregister(clk, &clk_dev->clk_nb);
> +
> +       return err;
> +}
> +
> +static __maybe_unused int tegra_clock_pm_suspend(struct device *dev)
> +{
> +       struct tegra_clk_device *clk_dev = dev_get_drvdata(dev);
> +
> +       /*
> +        * Power management of the clock is entangled with the Tegra PMC
> +        * GENPD because PMC driver enables/disables clocks for toggling
> +        * of the PD's on/off state.
> +        *
> +        * The PMC GENPD is resumed in NOIRQ phase, before RPM of the clocks
> +        * becomes available, hence PMC can't use clocks at the early resume
> +        * phase if RPM is involved. For example when 3d clock is enabled,
> +        * it may enable the parent PLL clock that needs to be RPM-resumed.
> +        *
> +        * Secondly, the PLL clocks may be enabled by the low level suspend
> +        * code, so we need to assume that PLL is in enabled state during
> +        * suspend.
> +        *
> +        * We will keep PLLs and system clock resumed during suspend time.
> +        * All PLLs on all SoCs are low power and system clock is always-on,
> +        * so practically not much is changed here.
> +        */
> +
> +       return clk_prepare(clk_dev->hw->clk);

I am trying to understand, more exactly, what you intend to achieve
with the clk_prepare() here. It looks a bit weird, to me. Can you try
to elaborate a bit more on the use case?

Is this rather about making sure that the clock's corresponding PM
domain stays powered on during system suspend? In that case, I think
there may be an alternative option....

> +}
> +
> +static __maybe_unused int tegra_clock_pm_resume(struct device *dev)
> +{
> +       struct tegra_clk_device *clk_dev = dev_get_drvdata(dev);
> +
> +       clk_unprepare(clk_dev->hw->clk);
> +
> +       return 0;
> +}
> +
> +static void tegra_clock_shutdown(struct platform_device *pdev)
> +{
> +       struct tegra_clk_device *clk_dev = platform_get_drvdata(pdev);
> +
> +       clk_prepare(clk_dev->hw->clk);
> +}
> +
> +static const struct dev_pm_ops tegra_clock_pm = {
> +       SET_SYSTEM_SLEEP_PM_OPS(tegra_clock_pm_suspend,
> +                               tegra_clock_pm_resume)
> +};

[...]

Kind regards
Uffe