linux-kernel - Re: [PATCH v8 01/34] opp: Add dev_pm_opp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPDyKFoQdn1rm91iFNJwZwpSYcKJBjDLqtJB4KZAkhgY1Grm-Q@mail.gmail.com>
Date:   Thu, 19 Aug 2021 15:07:09 +0200
From:   Ulf Hansson <ulf.hansson@...aro.org>
To:     Dmitry Osipenko <digetx@...il.com>,
        Viresh Kumar <viresh.kumar@...aro.org>
Cc:     Thierry Reding <thierry.reding@...il.com>,
        Jonathan Hunter <jonathanh@...dia.com>,
        Viresh Kumar <vireshk@...nel.org>,
        Stephen Boyd <sboyd@...nel.org>,
        Peter De Schrijver <pdeschrijver@...dia.com>,
        Mikko Perttunen <mperttunen@...dia.com>,
        Peter Chen <peter.chen@...nel.org>,
        Mark Brown <broonie@...nel.org>,
        Lee Jones <lee.jones@...aro.org>,
        Uwe Kleine-König 
        <u.kleine-koenig@...gutronix.de>, Nishanth Menon <nm@...com>,
        Vignesh Raghavendra <vigneshr@...com>,
        Richard Weinberger <richard@....at>,
        Miquel Raynal <miquel.raynal@...tlin.com>,
        Lucas Stach <dev@...xeye.de>, Stefan Agner <stefan@...er.ch>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Rob Herring <robh+dt@...nel.org>,
        Michael Turquette <mturquette@...libre.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-tegra <linux-tegra@...r.kernel.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux USB List <linux-usb@...r.kernel.org>,
        linux-staging@...ts.linux.dev, linux-spi@...r.kernel.org,
        linux-pwm@...r.kernel.org, linux-mtd@...ts.infradead.org,
        linux-mmc <linux-mmc@...r.kernel.org>,
        Linux Media Mailing List <linux-media@...r.kernel.org>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        DTML <devicetree@...r.kernel.org>,
        linux-clk <linux-clk@...r.kernel.org>
Subject: Re: [PATCH v8 01/34] opp: Add dev_pm_opp_sync() helper

On Wed, 18 Aug 2021 at 17:43, Dmitry Osipenko <digetx@...il.com> wrote:
>
> 18.08.2021 13:08, Ulf Hansson пишет:
> > On Wed, 18 Aug 2021 at 11:50, Viresh Kumar <viresh.kumar@...aro.org> wrote:
> >>
> >> On 18-08-21, 11:41, Ulf Hansson wrote:
> >>> On Wed, 18 Aug 2021 at 11:14, Viresh Kumar <viresh.kumar@...aro.org> wrote:
> >>>> What we need here is just configure. So something like this then:
> >>>>
> >>>> - genpd->get_performance_state()
> >>>>   -> dev_pm_opp_get_current_opp() //New API
> >>>>   -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);
> >>>>
> >>>> This can be done just once from probe() then.
> >>>
> >>> How would dev_pm_opp_get_current_opp() work? Do you have a suggestion?
> >>
> >> The opp core already has a way of finding current OPP, that's what
> >> Dmitry is trying to use here. It finds it using clk_get_rate(), if
> >> that is zero, it picks the lowest freq possible.
> >>
> >>> I am sure I understand the problem. When a device is getting probed,
> >>> it needs to consume power, how else can the corresponding driver
> >>> successfully probe it?
> >>
> >> Dmitry can answer that better, but a device doesn't necessarily need
> >> to consume energy in probe. It can consume bus clock, like APB we
> >> have, but the more energy consuming stuff can be left disabled until
> >> the time a user comes up. Probe will just end up registering the
> >> driver and initializing it.
> >
> > That's perfectly fine, as then it's likely that it won't vote for an
> > OPP, but can postpone that as well.
> >
> > Perhaps the problem is rather that the HW may already carry a non-zero
> > vote made from a bootloader. If the consumer driver tries to clear
> > that vote (calling dev_pm_opp_set_rate(dev, 0), for example), it would
> > still not lead to any updates of the performance state in genpd,
> > because genpd internally has initialized the performance-state to
> > zero.
>
> We don't need to discover internal SoC devices because we use
> device-tree on ARM. For most devices power isn't required at a probe
> time because probe function doesn't touch h/w at all, thus devices are
> left in suspended state after probe.
>
> We have three components comprising PM on Tegra:
>
> 1. Power gate
> 2. Clock state
> 3. Voltage state
>
> GENPD on/off represents the 'power gate'.
>
> Clock and reset are controlled by device drivers using clk and rst APIs.
>
> Voltage state is represented by GENPD's performance level.
>
> GENPD core assumes that at a first rpm-resume of a consumer device, its
> genpd_performance=0. Not true for Tegra because h/w of the device is
> preconfigured to a non-zero perf level initially, h/w may not support
> zero level at all.

I think you may be misunderstanding genpd's behaviour around this, but
let me elaborate.

In genpd_runtime_resume(), we try to restore the performance state for
the device that genpd_runtime_suspend() *may* have dropped earlier.
That means, if genpd_runtime_resume() is called prior
genpd_runtime_suspend() for the first time, it means that
genpd_runtime_resume() will *not* restore a performance state, but
instead just leave the performance state as is for the device (see
genpd_restore_performance_state()).

In other words, a consumer driver may use the following sequence to
set an initial performance state for the device during ->probe():

...
rate = clk_get_rate()
dev_pm_opp_set_rate(rate)

pm_runtime_enable()
pm_runtime_resume_and_get()
...

Note that, it's the consumer driver's responsibility to manage device
specific resources, in its ->runtime_suspend|resume() callbacks.
Typically that means dealing with clock gating/ungating, for example.

In the other scenario where a consumer driver prefers to *not* call
pm_runtime_resume_and_get() in its ->probe(), because it doesn't need
to power on the device to complete probing, then we don't want to vote
for an OPP at all - and we also want the performance state for the
device in genpd to be set to zero. Correct?

Is this the main problem you are trying to solve, because I think this
doesn't work out of the box as of today?

There is another concern though, but perhaps it's not a problem after
all. Viresh told us that dev_pm_opp_set_rate() may turn on resources
like clock/regulators. That could certainly be problematic, in
particular if the device and its genpd have OPP tables associated with
it and the consumer driver wants to follow the above sequence in
probe.

Viresh, can you please chime in here and elaborate on some of the
magic happening behind dev_pm_opp_set_rate() API - is there a problem
here or not?

>
> GENPD core assumes that consumer devices can work at any performance
> level. Not true for Tegra because voltage needs to be set in accordance
> to the clock rate before clock is enabled, otherwise h/w won't work
> properly, perhaps clock may be unstable or h/w won't be latching.

Correct. Genpd relies on the callers to use the OPP framework if there
are constraints like you describe above.

That said, it's not forbidden for a consumer driver to call
dev_pm_genpd_set_performance_state() directly, but then it better
knows exactly what it's doing.

>
> Performance level should be set to 0 while device is suspended.

Do you mean system suspend or runtime suspend? Or both?

> Performance level needs to be bumped on rpm-resume of a device in
> accordance to h/w state before hardware is enabled.

Assuming there was a performance state set for the device when
genpd_runtime_suspend() was called, genpd_runtime_resume() will
restore that state according to the sequence you described.

Kind regards
Uffe