[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YMuGCjg2H3eHVe8K@yoga>
Date: Thu, 17 Jun 2021 12:27:38 -0500
From: Bjorn Andersson <bjorn.andersson@...aro.org>
To: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
Cc: Mike Tipton <mdtipton@...eaurora.org>,
Ulf Hansson <ulf.hansson@...aro.org>,
Mark Brown <broonie@...nel.org>,
Stephen Boyd <sboyd@...nel.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Kevin Hilman <khilman@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Linux PM <linux-pm@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Rajendra Nayak <rnayak@...eaurora.org>,
Rob Herring <robh+dt@...nel.org>
Subject: Re: [PATCH 2/2] PM: domain: use per-genpd lockdep class
On Thu 17 Jun 11:19 CDT 2021, Dmitry Baryshkov wrote:
> On 17/06/2021 12:07, Ulf Hansson wrote:
> > + Rajendra
> >
> > On Tue, 15 Jun 2021 at 17:55, Bjorn Andersson
> > <bjorn.andersson@...aro.org> wrote:
> > >
> > > On Tue 15 Jun 05:17 CDT 2021, Ulf Hansson wrote:
> > >
> > > > + Mark
> > > >
> > > > On Fri, 11 Jun 2021 at 16:34, Dmitry Baryshkov
> > > > <dmitry.baryshkov@...aro.org> wrote:
> > > > >
> > > > > Added Stephen to Cc list
> > > > >
> > > > > On Fri, 11 Jun 2021 at 16:50, Ulf Hansson <ulf.hansson@...aro.org> wrote:
> > > > > >
> > > > > > On Fri, 11 Jun 2021 at 12:15, Dmitry Baryshkov
> > > > > > <dmitry.baryshkov@...aro.org> wrote:
> > > > > > >
> > > > > > > In case of nested genpds it is easy to get the following warning from
> > > > > > > lockdep, because all genpd's mutexes share same locking class. Use the
> > > > > > > per-genpd locking class to stop lockdep from warning about possible
> > > > > > > deadlocks. It is not possible to directly use genpd nested locking, as
> > > > > > > it is not the genpd code calling genpd. There are interim calls to
> > > > > > > regulator core.
> > > > > > >
> > > > > > > [ 3.030219] ============================================
> > > > > > > [ 3.030220] WARNING: possible recursive locking detected
> > > > > > > [ 3.030221] 5.13.0-rc3-00054-gf8f0a2f2b643-dirty #2480 Not tainted
> > > > > > > [ 3.030222] --------------------------------------------
> > > > > > > [ 3.030223] kworker/u16:0/7 is trying to acquire lock:
> > > > > > > [ 3.030224] ffffde0eabd29aa0 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x18/0x2c
> > > > > > > [ 3.030236]
> > > > > > > [ 3.030236] but task is already holding lock:
> > > > > > > [ 3.030236] ffffde0eabcfd6d0 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x18/0x2c
> > > > > > > [ 3.030240]
> > > > > > > [ 3.030240] other info that might help us debug this:
> > > > > > > [ 3.030240] Possible unsafe locking scenario:
> > > > > > > [ 3.030240]
> > > > > > > [ 3.030241] CPU0
> > > > > > > [ 3.030241] ----
> > > > > > > [ 3.030242] lock(&genpd->mlock);
> > > > > > > [ 3.030243] lock(&genpd->mlock);
> > > > > > > [ 3.030244]
> > > > > > > [ 3.030244] *** DEADLOCK ***
> > > > > > > [ 3.030244]
> > > > > > > [ 3.030244] May be due to missing lock nesting notation
> > > > > > > [ 3.030244]
> > > > > > > [ 3.030245] 6 locks held by kworker/u16:0/7:
> > > > > > > [ 3.030246] #0: ffff6cca00010938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1f0/0x730
> > > > > > > [ 3.030252] #1: ffff8000100c3db0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1f0/0x730
> > > > > > > [ 3.030255] #2: ffff6cca00ce3188 (&dev->mutex){....}-{3:3}, at: __device_attach+0x3c/0x184
> > > > > > > [ 3.030260] #3: ffffde0eabcfd6d0 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x18/0x2c
> > > > > > > [ 3.030264] #4: ffff8000100c3968 (regulator_ww_class_acquire){+.+.}-{0:0}, at: regulator_lock_dependent+0x6c/0x1b0
> > > > > > > [ 3.030270] #5: ffff6cca00a59158 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_lock_recursive+0x94/0x1d0
> > > > > > > [ 3.030273]
> > > > > > > [ 3.030273] stack backtrace:
> > > > > > > [ 3.030275] CPU: 6 PID: 7 Comm: kworker/u16:0 Not tainted 5.13.0-rc3-00054-gf8f0a2f2b643-dirty #2480
> > > > > > > [ 3.030276] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
> > > > > > > [ 3.030278] Workqueue: events_unbound deferred_probe_work_func
> > > > > > > [ 3.030280] Call trace:
> > > > > > > [ 3.030281] dump_backtrace+0x0/0x1a0
> > > > > > > [ 3.030284] show_stack+0x18/0x24
> > > > > > > [ 3.030286] dump_stack+0x108/0x188
> > > > > > > [ 3.030289] __lock_acquire+0xa20/0x1e0c
> > > > > > > [ 3.030292] lock_acquire.part.0+0xc8/0x320
> > > > > > > [ 3.030294] lock_acquire+0x68/0x84
> > > > > > > [ 3.030296] __mutex_lock+0xa0/0x4f0
> > > > > > > [ 3.030299] mutex_lock_nested+0x40/0x50
> > > > > > > [ 3.030301] genpd_lock_mtx+0x18/0x2c
> > > > > > > [ 3.030303] dev_pm_genpd_set_performance_state+0x94/0x1a0
> > > > > > > [ 3.030305] reg_domain_enable+0x28/0x4c
> > > > > > > [ 3.030308] _regulator_do_enable+0x420/0x6b0
> > > > > > > [ 3.030310] _regulator_enable+0x178/0x1f0
> > > > > > > [ 3.030312] regulator_enable+0x3c/0x80
> > > > > >
> > > > > > At a closer look, I am pretty sure that it's the wrong code design
> > > > > > that triggers this problem, rather than that we have a real problem in
> > > > > > genpd. To put it simply, the code in genpd isn't designed to work like
> > > > > > this. We will end up in circular looking paths, leading to deadlocks,
> > > > > > sooner or later if we allow the above code path.
> > > > > >
> > > > > > To fix it, the regulator here needs to be converted to a proper PM
> > > > > > domain. This PM domain should be assigned as the parent to the one
> > > > > > that is requested to be powered on.
> > > > >
> > > > > This more or less resembles original design, replaced per review
> > > > > request to use separate regulator
> > > > > (https://lore.kernel.org/linux-arm-msm/160269659638.884498.4031967462806977493@swboyd.mtv.corp.google.com/,
> > > > > https://lore.kernel.org/linux-arm-msm/20201023131925.334864-1-dmitry.baryshkov@linaro.org/).
> > > >
> > > > Thanks for the pointers. In hindsight, it looks like the
> > > > "regulator-fixed-domain" DT binding wasn't the right thing.
> > > >
> > > > Fortunately, it looks like the problem can be quite easily fixed, by
> > > > moving to a correct model of the domain hierarchy.
> > > >
> > >
> > > Can you give some pointers to how we actually fix this?
> > >
> > > The problem that lead us down this path is that drivers/clk/qcom/gdsc.c
> > > describes power domains, which are parented by domains provided by
> > > drivers/soc/qcom/rpmhpd.c.
> > >
> > > But I am unable to find a way for the gdsc driver to get hold of the
> > > struct generic_pm_domain of the resources exposed by the rpmhpd driver.
> >
> > You don't need a handle to the struct generic_pm_domain, to assign a
> > parent/child domain. Instead you can use of_genpd_add_subdomain(),
> > which takes two "struct of_phandle_args*" corresponding to the
> > parent/child device nodes of the genpd providers and then let genpd
> > internally do the look up.
> >
> > As an example, you may have a look at how the PM domain topology in
> > drivers/cpuidle/cpuidle-psci-domain.c are being created.
> >
> > >
> > >
> > > The second thing that Dmitry's regulator driver does is to cast the
> > > appropriate performance state vote on the rpmhpd resource, but I _think_
> > > we can do that using OPP tables in the gdsc client's node...
> >
> > Yes, it looks like using an OPP table and to specify a
> > "required-opps", at some device node is the right thing to do.
> >
> > In this case, I wonder if the "required-opps" belongs to the genpd
> > provider node of the new power-domain (as it seems like it only
> > supports one fixed performance state when it's powered on). On the
> > other hand, specifying this at the consumer node should work as well,
> > I think.
> >
> > Actually, this relates to the changes [1] that Rajendra is working on
> > with "assigned-performance-state" (that we agreed to move to
> > OPP/required-opps) for genpd.
>
> What about the following dts snippet?
> I do not want to add power-domains directly to the dispcc node (as it's not
> a device's power domain, just gdsc's parent power domain).
>
>
> dispcc: clock-controller@...0000 {
> compatible = "qcom,sm8250-dispcc";
> [....]
> #power-domain-cells = <1>;
>
> mmss_gdsc {
> power-domains = <&rpmhpd SM8250_MMCX>;
> required-opps = <&rpmhpd_opp_low_svs>;
> };
> };
According to the documentation dispcc actually sits in MMCX (I thought
it sat in CX...). So it seems appropriate to just specify that as the
one and only power-domain for &dispcc and use that as the parent for
MDSS_GDSC.
That said, I do think we have other GDSCs in the system where the
controller sits in one power-domain and the parent power-domain is a
different one. I presume the right path here is to list all the
power-domains in DT and then use some name based matching?
Regards,
Bjorn
Powered by blists - more mailing lists