[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAE-0n507hS0huoCVoarV65F5cGsxMuYCv-3C4s2e1m61cPMZcg@mail.gmail.com>
Date: Fri, 4 Nov 2022 13:51:23 -0700
From: Stephen Boyd <swboyd@...omium.org>
To: Bjorn Andersson <andersson@...nel.org>,
Douglas Anderson <dianders@...omium.org>
Cc: Taniya Das <quic_tdas@...cinc.com>,
Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
Judy Hsiao <judyhsiao@...omium.org>,
Srinivasa Rao Mandadapu <quic_srivasam@...cinc.com>,
Matthias Kaehlcke <mka@...omium.org>,
Andy Gross <agross@...nel.org>,
Konrad Dybcio <konrad.dybcio@...ainline.org>,
Michael Turquette <mturquette@...libre.com>,
Stephen Boyd <sboyd@...nel.org>,
Taniya Das <tdas@...eaurora.org>,
linux-arm-msm@...r.kernel.org, linux-clk@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] clk: qcom: lpass-sc7280: Fix pm_runtime usage
Quoting Douglas Anderson (2022-11-04 06:56:28)
> The pm_runtime usage in lpass-sc7280 was broken in quite a few
> ways. Specifically:
>
> 1. At the end of probe it called "put" twice. This is a no-no and will
> end us up with a negative usage count. Even worse than calling
> "put" twice, it never called "get" once. Thus after bootup it could
> be seen that the runtime usage of the devices managed by this
> driver was -2.
> 2. In some error cases it manually called pm_runtime_disable() even
> though it had previously used devm_add_action_or_reset() to set
> this up to be called automatically. This meant that in these error
> cases we'd double-call pm_runtime_disable().
> 3. It forgot to call undo pm_runtime_use_autosuspend(), which can
> sometimes have subtle problems (and the docs specifically mention
> that you need to undo this function).
>
> Overall the above seriously calls into question how this driver is
> working. It seems like a combination of "it doesn't", "by luck", and
> "because of the weirdness of runtime_pm". Specifically I put a
> printout to the serial console every time the runtime suspend/resume
> was called for the two devices created by this driver (I wrapped the
> pm_clk calls). When I had serial console enabled, I found that the
> calls got resumed at bootup (when the clk core probed and before our
> double-put) and then never touched again. That's no good.
> [ 0.829997] DOUG: my_pm_clk_resume, usage=1
> [ 0.835487] DOUG: my_pm_clk_resume, usage=1
>
> When I disabled serial console (speeding up boot), I got a different
> pattern, which I guess (?) is better:
> [ 0.089767] DOUG: my_pm_clk_resume, usage=1
> [ 0.090507] DOUG: my_pm_clk_resume, usage=1
> [ 0.151885] DOUG: my_pm_clk_suspend, usage=-2
> [ 0.151914] DOUG: my_pm_clk_suspend, usage=-2
> [ 1.825747] DOUG: my_pm_clk_resume, usage=-1
> [ 1.825774] DOUG: my_pm_clk_resume, usage=-1
> [ 1.888269] DOUG: my_pm_clk_suspend, usage=-2
> [ 1.888282] DOUG: my_pm_clk_suspend, usage=-2
>
> These different patterns have to do with the fact that the core PM
> Runtime code really isn't designed to be robust to negative usage
> counts and sometimes may happen to stumble upon a behavior that
> happens to "work". For instance, you can see that
> __pm_runtime_suspend() will treat any non-zero value (including
> negative numbers) as if the device is in use.
>
> In any case, let's fix the driver to be correct. We'll hold a
> pm_runtime reference for the whole probe and then drop it (once!) at
> the end. We'll get rid of manual pm_runtime_disable() calls in the
> error handling. We'll also switch to devm_pm_runtime_enable(), which
> magically handles undoing pm_runtime_use_autosuspend() as of commit
> b4060db9251f ("PM: runtime: Have devm_pm_runtime_enable() handle
> pm_runtime_dont_use_autosuspend()").
>
> While we're at this, let's also use devm_pm_clk_create() instead of
> rolling it ourselves.
>
> Note that the above changes make it obvious that
> lpassaudio_create_pm_clks() was doing more than just creating
> clocks. It was also setting up pm_runtime parameters. Let's rename it.
>
> All of these problems were found by code inspection. I started looking
> at this driver because it was involved in a deadlock that I reported a
> while ago [1]. Though I bisected the deadlock to commit 1b771839de05
> ("clk: qcom: gdsc: enable optional power domain support"), it was
> never really clear why that patch affected it other than a luck of
> timing changes. I'll also note that by fixing the timing (as done in
> this change) we also seem to aboid the deadlock, which is a nice
> benefit.
>
> Also note that some of the fixes here are much the same type of stuff
> that Dmitry did in commit 72cfc73f4663 ("clk: qcom: use
> devm_pm_runtime_enable and devm_pm_clk_create"), but I guess
> lpassaudiocc-sc7280.c didn't exist then.
>
> [1] https://lore.kernel.org/r/20220922154354.2486595-1-dianders@chromium.org
>
> Fixes: a9dd26639d05 ("clk: qcom: lpass: Add support for LPASS clock controller for SC7280")
> Signed-off-by: Douglas Anderson <dianders@...omium.org>
> ---
Reviewed-by: Stephen Boyd <swboyd@...omium.org>
Powered by blists - more mailing lists