[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPDyKFoMqmCFBoO8FwQe2wHh2kqQi4jUZNFyiNckK7QhGVgmvg@mail.gmail.com>
Date: Fri, 9 May 2025 15:07:42 +0200
From: Ulf Hansson <ulf.hansson@...aro.org>
To: Claudiu Beznea <claudiu.beznea@...on.dev>
Cc: Jonathan Cameron <jic23@...nel.org>, "Rafael J. Wysocki" <rafael@...nel.org>,
Daniel Lezcano <daniel.lezcano@...aro.org>, dakr@...nel.org, linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org, linux-renesas-soc@...r.kernel.org,
geert@...ux-m68k.org, Claudiu Beznea <claudiu.beznea.uj@...renesas.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, linux-iio@...r.kernel.org,
Dmitry Torokhov <dmitry.torokhov@...il.com>
Subject: Re: [PATCH] driver core: platform: Use devres group to free driver
probe resources
On Fri, 9 May 2025 at 13:51, Claudiu Beznea <claudiu.beznea@...on.dev> wrote:
>
> Hi, Rafael, Ulf, PM list,
>
>
> On 09.04.2025 19:12, Claudiu Beznea wrote:
> > Hi, Rafael,
> >
> > On 30.03.2025 18:31, Jonathan Cameron wrote:
> >> On Thu, 27 Mar 2025 18:47:53 +0200
> >> Claudiu Beznea <claudiu.beznea@...on.dev> wrote:
> >>
> >>> Hi, Rafael,
> >>>
> >>> On 06.03.2025 08:11, Dmitry Torokhov wrote:
> >>>> On Wed, Mar 05, 2025 at 02:03:09PM +0000, Jonathan Cameron wrote:
> >>>>> On Wed, 19 Feb 2025 14:45:07 +0200
> >>>>> Claudiu Beznea <claudiu.beznea@...on.dev> wrote:
> >>>>>
> >>>>>> Hi, Daniel, Jonathan,
> >>>>>>
> >>>>>> On 15.02.2025 15:51, Claudiu Beznea wrote:
> >>>>>>> Hi, Greg,
> >>>>>>>
> >>>>>>> On 15.02.2025 15:25, Greg KH wrote:
> >>>>>>>> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:
> >>>>>>>>> From: Claudiu Beznea <claudiu.beznea.uj@...renesas.com>
> >>>>>>>>>
> >>>>>>>>> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
> >>>>>>>>> clocks are managed through PM domains. These PM domains, registered on
> >>>>>>>>> behalf of the clock controller driver, are configured with
> >>>>>>>>> GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
> >>>>>>>>> clocks are enabled/disabled using runtime PM APIs. The power domains may
> >>>>>>>>> also have power_on/power_off support implemented. After the device PM
> >>>>>>>>> domain is powered off any CPU accesses to these domains leads to system
> >>>>>>>>> aborts.
> >>>>>>>>>
> >>>>>>>>> During probe, devices are attached to the PM domain controlling their
> >>>>>>>>> clocks and power. Similarly, during removal, devices are detached from the
> >>>>>>>>> PM domain.
> >>>>>>>>>
> >>>>>>>>> The detachment call stack is as follows:
> >>>>>>>>>
> >>>>>>>>> device_driver_detach() ->
> >>>>>>>>> device_release_driver_internal() ->
> >>>>>>>>> __device_release_driver() ->
> >>>>>>>>> device_remove() ->
> >>>>>>>>> platform_remove() ->
> >>>>>>>>> dev_pm_domain_detach()
> >>>>>>>>>
> >>>>>>>>> During driver unbind, after the device is detached from its PM domain,
> >>>>>>>>> the device_unbind_cleanup() function is called, which subsequently invokes
> >>>>>>>>> devres_release_all(). This function handles devres resource cleanup.
> >>>>>>>>>
> >>>>>>>>> If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the
> >>>>>>>>> cleanup process triggers the action or reset function for disabling runtime
> >>>>>>>>> PM. This function is pm_runtime_disable_action(), which leads to the
> >>>>>>>>> following call stack of interest when called:
> >>>>>>>>>
> >>>>>>>>> pm_runtime_disable_action() ->
> >>>>>>>>> pm_runtime_dont_use_autosuspend() ->
> >>>>>>>>> __pm_runtime_use_autosuspend() ->
> >>>>>>>>> update_autosuspend() ->
> >>>>>>>>> rpm_idle()
> >>>>>>>>>
> >>>>>>>>> The rpm_idle() function attempts to resume the device at runtime. However,
> >>>>>>>>> at the point it is called, the device is no longer part of a PM domain
> >>>>>>>>> (which manages clocks and power states). If the driver implements its own
> >>>>>>>>> runtime PM APIs for specific functionalities - such as the rzg2l_adc
> >>>>>>>>> driver - while also relying on the power domain subsystem for power
> >>>>>>>>> management, rpm_idle() will invoke the driver's runtime PM API. However,
> >>>>>>>>> since the device is no longer part of a PM domain at this point, the PM
> >>>>>>>>> domain's runtime PM APIs will not be called. This leads to system aborts on
> >>>>>>>>> Renesas SoCs.
> >>>>>>>>>
> >>>>>>>>> Another identified case is when a subsystem performs various cleanups
> >>>>>>>>> using device_unbind_cleanup(), calling driver-specific APIs in the process.
> >>>>>>>>> A known example is the thermal subsystem, which may call driver-specific
> >>>>>>>>> APIs to disable the thermal device. The relevant call stack in this case
> >>>>>>>>> is:
> >>>>>>>>>
> >>>>>>>>> device_driver_detach() ->
> >>>>>>>>> device_release_driver_internal() ->
> >>>>>>>>> device_unbind_cleanup() ->
> >>>>>>>>> devres_release_all() ->
> >>>>>>>>> devm_thermal_of_zone_release() ->
> >>>>>>>>> thermal_zone_device_disable() ->
> >>>>>>>>> thermal_zone_device_set_mode() ->
> >>>>>>>>> struct thermal_zone_device_ops::change_mode()
> >>>>>>>>>
> >>>>>>>>> At the moment the driver-specific change_mode() API is called, the device
> >>>>>>>>> is no longer part of its PM domain. Accessing its registers without proper
> >>>>>>>>> power management leads to system aborts.
> >>>>>>>>>
> >>>>>>>>> Open a devres group before calling the driver probe, and close it
> >>>>>>>>> immediately after the driver remove function is called and before
> >>>>>>>>> dev_pm_domain_detach(). This ensures that driver-specific devm actions or
> >>>>>>>>> reset functions are executed immediately after the driver remove function
> >>>>>>>>> completes. Additionally, it prevents driver-specific runtime PM APIs from
> >>>>>>>>> being called when the device is no longer part of its power domain.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@...renesas.com>
> >>>>>>>>> ---
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>
> >>>>> Hi Claudiu, Greg,
> >>>>>
> >>>>> Sorry, I missed this thread whilst travelling and only saw it because
> >>>>> of reference from the in driver solution.
> >>>>>
> >>>>>>>>>
> >>>>>>>>> Although Ulf gave its green light for the approaches on both IIO [1],
> >>>>>>>>> [2] and thermal subsystems [3], Jonathan considered unacceptable the
> >>>>>>>>> approaches in [1], [2] as he considered it may lead to dificult to
> >>>>>>>>> maintain code and code opened to subtle bugs (due to the potential of
> >>>>>>>>> mixing devres and non-devres calls). He pointed out a similar approach
> >>>>>>>>> that was done for the I2C bus [4], [5].
> >>>>>>>>>
> >>>>>>>>> As the discussions in [1], [2] stopped w/o a clear conclusion, this
> >>>>>>>>> patch tries to revive it by proposing a similar approach that was done
> >>>>>>>>> for the I2C bus.
> >>>>>>>>>
> >>>>>>>>> Please let me know you input.
> >>>>>>>>
> >>>>>>>> I'm with Jonathan here, the devres stuff is getting crazy here and you
> >>>>>>>> have drivers mixing them and side affects happening and lots of
> >>>>>>>> confusion. Your change here is only going to make it even more
> >>>>>>>> confusing, and shouldn't actually solve it for other busses (i.e. what
> >>>>>>>> about iio devices NOT on the platform bus?)
> >>>>>
> >>>>> In some cases they are already carrying the support as per the link
> >>>>> above covering all i2c drivers. I'd like to see a generic solution and
> >>>>> I suspect pushing it to the device drivers rather than the bus code
> >>>>> will explode badly and leave us with subtle bugs where people don't
> >>>>> realise it is necessary.
> >>>>>
> >>>>> https://lore.kernel.org/all/20250224120608.1769039-1-claudiu.beznea.uj@bp.renesas.com/
> >>>>> is a lot nastier looking than what we have here. I'll review that in a minute
> >>>>> to show that it need not be that bad, but none the less not pleasant.
> >>>>>
> >>>>> +CC linux-iio to join up threads and Dmitry wrt to i2c case (and HID that does
> >>>>> similar)
> >>>>
> >>>> We should not expect individual drivers handle this, because this is a
> >>>> layering violation: they need to know implementation details of the bus
> >>>> code to know if the bus is using non-devres managed resources, and
> >>>> adjust their behavior. Moving this into driver core is also not
> >>>> feasible, as not all buses need it. So IMO this should belong to
> >>>> individual bus code.
> >>>>
> >>>> Instead of using devres group a bus may opt to use
> >>>> devm_add_action_or_reset() and other devm APIs to make sure bus'
> >>>> resource unwinding is carried in the correct order relative to freeing
> >>>> driver-owned resources.
> >>>
> >>> Can you please let us know your input on the approach proposed in this
> >>> patch? Or if you would prefer devm_add_action_or_reset() as suggested by
> >>> Dmitry? Or if you consider another approach would fit better?
> >>>
> >>> Currently there were issues identified with the rzg2l-adc driver (driver
> >>> based solution proposed in [1]) and with the rzg3s thermal driver (solved
> >>> by function rzg3s_thermal_probe() from [2]).
> >>>
> >>> As expressed previously by Jonathan and Dimitry this is a common problem
> >>> and as the issue is due to a call in the bus driver, would be better and
> >>> simpler to handle it in the bus driver. Otherwise, individual drivers would
> >>> have to be adjusted in a similar way.
> >>>
> >>
> >> Rafael,
> >>
> >> Greg suggested we ask for your input on the right option:
> >>
> >> https://lore.kernel.org/all/2025032703-genre-excitable-9473@gregkh/
> >> (that thread has the other option).
> >
> > Can you please let us know your opinion on this?
> Can you please let us know if you have any suggestions for this?
It's been a while since I looked at this. Although as I understand it,
the main issue comes from using devm_pm_runtime_enable().
As I have tried to argue before, I think devm_pm_runtime_enable()
should *not* be used. Not here, not at all. Runtime PM isn't like any
other resources that we fetch/release. Instead, it's a behaviour that
you turn on and off, which needs to be managed more carefully, rather
than relying on fetch/release ordering from devres.
That said, I would convert the driver to use pm_runtime_enable() and
pm_runtime_disable() instead.
Kind regards
Uffe
Powered by blists - more mailing lists