lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPDyKFoJ3pLU-5_b5MSxMZd7B1cfOvmcdqR4FGkU2Wb7No0mcw@mail.gmail.com>
Date: Wed, 15 Jan 2025 16:29:15 +0100
From: Ulf Hansson <ulf.hansson@...aro.org>
To: Claudiu Beznea <claudiu.beznea@...on.dev>
Cc: Jonathan Cameron <jic23@...nel.org>, prabhakar.mahadev-lad.rj@...renesas.com, 
	lars@...afoo.de, linux-iio@...r.kernel.org, linux-renesas-soc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, 
	Claudiu Beznea <claudiu.beznea.uj@...renesas.com>, "Rafael J. Wysocki" <rafael@...nel.org>, 
	linux-pm@...r.kernel.org
Subject: Re: [PATCH 1/2] iio: adc: rzg2l_adc: Drop devm_pm_runtime_enable()

On Wed, 15 Jan 2025 at 14:37, Claudiu Beznea <claudiu.beznea@...on.dev> wrote:
>
> Hi, Jonathan,
>
> Thank you for your input!
>
> On 11.01.2025 15:14, Jonathan Cameron wrote:
> > On Mon, 6 Jan 2025 11:18:41 +0200
> > Claudiu Beznea <claudiu.beznea@...on.dev> wrote:
> >
> >> Hi, Jonathan,
> >>
> >>
> >> On 04.01.2025 15:52, Jonathan Cameron wrote:
> >>> On Fri,  3 Jan 2025 16:00:41 +0200
> >>> Claudiu <claudiu.beznea@...on.dev> wrote:
> >>>
> >>>> From: Claudiu Beznea <claudiu.beznea.uj@...renesas.com>
> >>> +CC Rafael and linux-pm
> >>>
> >>>>
> >>>> On all systems where the rzg2l_adc driver is used, the ADC clocks are part
> >>>> of a PM domain. The code that implements the PM domains support is in
> >>>> drivers/clk/renesas/rzg2l-cpg.c, the functions of interest for this commit
> >>>> being rzg2l_cpg_attach_dev() and rzg2l_cpg_deattach_dev(). The PM
> >>>> domains support is registered with GENPD_FLAG_PM_CLK which, according to
> >>>> the documentation, instructs genpd to use the PM clk framework while
> >>>> powering on/off attached devices.
> >>>>
> >>>> During probe, the ADC device is attached to the PM domain
> >>>> controlling the ADC clocks. Similarly, during removal, the ADC device is
> >>>> detached from the PM domain.
> >>>>
> >>>> The detachment call stack is as follows:
> >>>>
> >>>> device_driver_detach() ->
> >>>>   device_release_driver_internal() ->
> >>>>     __device_release_driver() ->
> >>>>       device_remove() ->
> >>>>         platform_remove() ->
> >>>>           dev_pm_domain_detach()
> >>>>
> >>>> During driver unbind, after the ADC device is detached from its PM domain,
> >>>> the device_unbind_cleanup() function is called, which subsequently invokes
> >>>> devres_release_all(). This function handles devres resource cleanup.
> >>>>
> >>>> If runtime PM is enabled via devm_pm_runtime_enable(), the cleanup process
> >>>> triggers the action or reset function for disabling runtime PM. This
> >>>> function is pm_runtime_disable_action(), which leads to the following call
> >>>> stack of interest when called:
> >>>>
> >>>> pm_runtime_disable_action() ->
> >>>>   pm_runtime_dont_use_autosuspend() ->
> >>>
> >>> So is the only real difference that in the code below you disable runtime pm
> >>> before autosuspend?
> >>
> >> No, the difference is that now, the driver specific runtime PM APIs are not
> >> called anymore (through the pointed call stack) after the ADC was removed
> >> from it's PM domain.
> >
> > By my reading they are only not called now because you turn autosuspend off
> > after disabling runtime PM.
>
> Sorry, I wanted to say that the runtime PM APIs are not called anymore from
> devm_action_release(), though this call stack:
>
> [   24.801195] Call trace:
> [   24.803633]  rzg2l_adc_pm_runtime_suspend+0x18/0x54 (P)
> [   24.808847]  pm_generic_runtime_suspend+0x2c/0x44 (L)
> [   24.813887]  pm_generic_runtime_suspend+0x2c/0x44
> [   24.818580]  __rpm_callback+0x48/0x198
> [   24.822319]  rpm_callback+0x68/0x74
> [   24.825798]  rpm_suspend+0x100/0x578
> [   24.829362]  rpm_idle+0xd0/0x17c
> [   24.832582]  update_autosuspend+0x30/0xc4
> [   24.836580]  pm_runtime_disable_action+0x40/0x64
> [   24.841184]  devm_action_release+0x14/0x20
> [   24.845274]  devres_release_all+0xa0/0x100
> [   24.849361]  device_unbind_cleanup+0x18/0x60
>
> This is because I dropped the devm_pm_runtime_enable() which registers the
> pm_runtime_disable_action(), which is called at the time the
> device_unbind_cleanup() is called, which is called when the ADC is not
> anymore part of its PM domain.
>
> If I change the order in remove function proposed in this patch, thus do:
>
> +static void rzg2l_adc_remove(struct platform_device *pdev)
> +{
> +       struct device *dev = &pdev->dev;
> +
> +       pm_runtime_dont_use_autosuspend(dev);
> +       pm_runtime_disable(dev);
>  }
>
> nothing changes with the behavior of this patch. There will be no issue if
> the device is runtime suspended/resumed through the
> pm_runtime_dont_use_autosuspend() because at the time the
> rzg2l_adc_remove() is called the ADC is still part of the PM domain.
>
>
>
> >
> >>
> >>
> >>>  Can you still do that with a devm callback just not
> >>> the standard one?
> >>
> >> No. It doesn't matter if we call the standard devm callback or driver
> >> specific one. As long as it is devm it will impact us as long as the driver
> >> specific runtime PM APIs are called through devres_release_all() after
> >> dev_pm_domain_detach(). And at that time the PM domain may be off along
> >> with its clocks disabled.
> >
> > As above, I think that this is only the case because of the reordering
> > of those two calls, not something more fundamental.
>
> I tried having a local devm function (the following diff applied with this
> patch reverted) identical with pm_runtime_disable_action():
>
> diff --git a/drivers/iio/adc/rzg2l_adc.c b/drivers/iio/adc/rzg2l_adc.c
> index 22a581c894f8..459cc9c67eec 100644
> --- a/drivers/iio/adc/rzg2l_adc.c
> +++ b/drivers/iio/adc/rzg2l_adc.c
> @@ -423,6 +423,12 @@ static int rzg2l_adc_hw_init(struct device *dev,
> struct rzg2l_adc *adc)
>         return ret;
>  }
>
> +static void rzg2l_pm_runtime_disable(void *data)
> +{
> +       pm_runtime_dont_use_autosuspend(data);
> +       pm_runtime_disable(data);
> +}
> +
>  static int rzg2l_adc_probe(struct platform_device *pdev)
>  {
>         struct device *dev = &pdev->dev;
> @@ -463,7 +469,9 @@ static int rzg2l_adc_probe(struct platform_device *pdev)
>
>         pm_runtime_set_autosuspend_delay(dev, 300);
>         pm_runtime_use_autosuspend(dev);
> -       ret = devm_pm_runtime_enable(dev);
> +       pm_runtime_enable(dev);
> +
> +       ret = devm_add_action_or_reset(dev, rzg2l_pm_runtime_disable, dev);
>         if (ret)
>                 return ret;
>
> With this the issue is still reproducible.
>
> However, changing the order of functions in rzg2l_pm_runtime_disable() and
> having it like:
>
> +static void rzg2l_pm_runtime_disable(void *data)
> +{
> +       pm_runtime_disable(data);
> +       pm_runtime_dont_use_autosuspend(data);
> +}
>
>
> leads to no failure when doing unbind/bind.
>
> However, I see the pm_runtime_disable() can still call rpm_resume() under
> certain conditions. It can still lead to failures if it is called after the
> device was remove from its PM domain.
>
> >
> > In driver remove flow, device_unbind_cleanup9() is called
> > just after device_remove() which is calling the dev->driver_remove()
> > callback. There are no runtime pm related calls in between that I can see.
>
> On my side the device_remove() is calling dev->bus->remove() which is
> platform_remove(), which calls the dev_pm_domain_detach(). The
> dev_pm_domain_detach() detaches the ADC from it's PM domain. Because of
> this, accessing now the ADC registers after a runtime resume leads to
> failures pointed in this patch (as of my investigation) (as the ADC is not
> anymore part of its PM domain and its PM domain is not started anymore
> though runtime PM APIs).
>
> A similar issue was found while I was adding thermal support for RZ/G3S,
> explained in
> https://lore.kernel.org/all/20250103163805.1775705-3-claudiu.beznea.uj@bp.renesas.com
>
>
> Jonathan, Rafael, Ulf, all,
>
> Do consider OK to change the order in pm_runtime_disable_action() to get
> rid of these issues, e.g.:
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 2ee45841486b..f27d311d2619 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -1547,8 +1547,8 @@ EXPORT_SYMBOL_GPL(pm_runtime_enable);
>
>  static void pm_runtime_disable_action(void *data)
>  {
> -       pm_runtime_dont_use_autosuspend(data);
>         pm_runtime_disable(data);
> +       pm_runtime_dont_use_autosuspend(data);
>  }
>
> though I see a rpm_resume() call is still possible though pm_runtime_disable().

I am still worried about keeping the device runtime enabled during a
window when we have turned off all resources for the device. Typically
we want to leave the device in a low power state after unbind.

That said, I would rather just drop the devm_pm_runtime_enable() API
altogether and convert all users of it into
pm_runtime_enable|disable(), similar to what your patch does.

>
> Thank you,
> Claudiu

[...]

Kind regards
Uffe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ