[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <95f5923f-7a8f-4947-b588-419525930bcb@tuxon.dev>
Date: Fri, 9 May 2025 14:51:38 +0300
From: Claudiu Beznea <claudiu.beznea@...on.dev>
To: Jonathan Cameron <jic23@...nel.org>, "Rafael J. Wysocki"
<rafael@...nel.org>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>, dakr@...nel.org,
ulf.hansson@...aro.org, linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org, linux-renesas-soc@...r.kernel.org,
geert@...ux-m68k.org, Claudiu Beznea <claudiu.beznea.uj@...renesas.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, linux-iio@...r.kernel.org,
Dmitry Torokhov <dmitry.torokhov@...il.com>
Subject: Re: [PATCH] driver core: platform: Use devres group to free driver
probe resources
Hi, Rafael, Ulf, PM list,
On 09.04.2025 19:12, Claudiu Beznea wrote:
> Hi, Rafael,
>
> On 30.03.2025 18:31, Jonathan Cameron wrote:
>> On Thu, 27 Mar 2025 18:47:53 +0200
>> Claudiu Beznea <claudiu.beznea@...on.dev> wrote:
>>
>>> Hi, Rafael,
>>>
>>> On 06.03.2025 08:11, Dmitry Torokhov wrote:
>>>> On Wed, Mar 05, 2025 at 02:03:09PM +0000, Jonathan Cameron wrote:
>>>>> On Wed, 19 Feb 2025 14:45:07 +0200
>>>>> Claudiu Beznea <claudiu.beznea@...on.dev> wrote:
>>>>>
>>>>>> Hi, Daniel, Jonathan,
>>>>>>
>>>>>> On 15.02.2025 15:51, Claudiu Beznea wrote:
>>>>>>> Hi, Greg,
>>>>>>>
>>>>>>> On 15.02.2025 15:25, Greg KH wrote:
>>>>>>>> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:
>>>>>>>>> From: Claudiu Beznea <claudiu.beznea.uj@...renesas.com>
>>>>>>>>>
>>>>>>>>> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
>>>>>>>>> clocks are managed through PM domains. These PM domains, registered on
>>>>>>>>> behalf of the clock controller driver, are configured with
>>>>>>>>> GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
>>>>>>>>> clocks are enabled/disabled using runtime PM APIs. The power domains may
>>>>>>>>> also have power_on/power_off support implemented. After the device PM
>>>>>>>>> domain is powered off any CPU accesses to these domains leads to system
>>>>>>>>> aborts.
>>>>>>>>>
>>>>>>>>> During probe, devices are attached to the PM domain controlling their
>>>>>>>>> clocks and power. Similarly, during removal, devices are detached from the
>>>>>>>>> PM domain.
>>>>>>>>>
>>>>>>>>> The detachment call stack is as follows:
>>>>>>>>>
>>>>>>>>> device_driver_detach() ->
>>>>>>>>> device_release_driver_internal() ->
>>>>>>>>> __device_release_driver() ->
>>>>>>>>> device_remove() ->
>>>>>>>>> platform_remove() ->
>>>>>>>>> dev_pm_domain_detach()
>>>>>>>>>
>>>>>>>>> During driver unbind, after the device is detached from its PM domain,
>>>>>>>>> the device_unbind_cleanup() function is called, which subsequently invokes
>>>>>>>>> devres_release_all(). This function handles devres resource cleanup.
>>>>>>>>>
>>>>>>>>> If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the
>>>>>>>>> cleanup process triggers the action or reset function for disabling runtime
>>>>>>>>> PM. This function is pm_runtime_disable_action(), which leads to the
>>>>>>>>> following call stack of interest when called:
>>>>>>>>>
>>>>>>>>> pm_runtime_disable_action() ->
>>>>>>>>> pm_runtime_dont_use_autosuspend() ->
>>>>>>>>> __pm_runtime_use_autosuspend() ->
>>>>>>>>> update_autosuspend() ->
>>>>>>>>> rpm_idle()
>>>>>>>>>
>>>>>>>>> The rpm_idle() function attempts to resume the device at runtime. However,
>>>>>>>>> at the point it is called, the device is no longer part of a PM domain
>>>>>>>>> (which manages clocks and power states). If the driver implements its own
>>>>>>>>> runtime PM APIs for specific functionalities - such as the rzg2l_adc
>>>>>>>>> driver - while also relying on the power domain subsystem for power
>>>>>>>>> management, rpm_idle() will invoke the driver's runtime PM API. However,
>>>>>>>>> since the device is no longer part of a PM domain at this point, the PM
>>>>>>>>> domain's runtime PM APIs will not be called. This leads to system aborts on
>>>>>>>>> Renesas SoCs.
>>>>>>>>>
>>>>>>>>> Another identified case is when a subsystem performs various cleanups
>>>>>>>>> using device_unbind_cleanup(), calling driver-specific APIs in the process.
>>>>>>>>> A known example is the thermal subsystem, which may call driver-specific
>>>>>>>>> APIs to disable the thermal device. The relevant call stack in this case
>>>>>>>>> is:
>>>>>>>>>
>>>>>>>>> device_driver_detach() ->
>>>>>>>>> device_release_driver_internal() ->
>>>>>>>>> device_unbind_cleanup() ->
>>>>>>>>> devres_release_all() ->
>>>>>>>>> devm_thermal_of_zone_release() ->
>>>>>>>>> thermal_zone_device_disable() ->
>>>>>>>>> thermal_zone_device_set_mode() ->
>>>>>>>>> struct thermal_zone_device_ops::change_mode()
>>>>>>>>>
>>>>>>>>> At the moment the driver-specific change_mode() API is called, the device
>>>>>>>>> is no longer part of its PM domain. Accessing its registers without proper
>>>>>>>>> power management leads to system aborts.
>>>>>>>>>
>>>>>>>>> Open a devres group before calling the driver probe, and close it
>>>>>>>>> immediately after the driver remove function is called and before
>>>>>>>>> dev_pm_domain_detach(). This ensures that driver-specific devm actions or
>>>>>>>>> reset functions are executed immediately after the driver remove function
>>>>>>>>> completes. Additionally, it prevents driver-specific runtime PM APIs from
>>>>>>>>> being called when the device is no longer part of its power domain.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@...renesas.com>
>>>>>>>>> ---
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>
>>>>> Hi Claudiu, Greg,
>>>>>
>>>>> Sorry, I missed this thread whilst travelling and only saw it because
>>>>> of reference from the in driver solution.
>>>>>
>>>>>>>>>
>>>>>>>>> Although Ulf gave its green light for the approaches on both IIO [1],
>>>>>>>>> [2] and thermal subsystems [3], Jonathan considered unacceptable the
>>>>>>>>> approaches in [1], [2] as he considered it may lead to dificult to
>>>>>>>>> maintain code and code opened to subtle bugs (due to the potential of
>>>>>>>>> mixing devres and non-devres calls). He pointed out a similar approach
>>>>>>>>> that was done for the I2C bus [4], [5].
>>>>>>>>>
>>>>>>>>> As the discussions in [1], [2] stopped w/o a clear conclusion, this
>>>>>>>>> patch tries to revive it by proposing a similar approach that was done
>>>>>>>>> for the I2C bus.
>>>>>>>>>
>>>>>>>>> Please let me know you input.
>>>>>>>>
>>>>>>>> I'm with Jonathan here, the devres stuff is getting crazy here and you
>>>>>>>> have drivers mixing them and side affects happening and lots of
>>>>>>>> confusion. Your change here is only going to make it even more
>>>>>>>> confusing, and shouldn't actually solve it for other busses (i.e. what
>>>>>>>> about iio devices NOT on the platform bus?)
>>>>>
>>>>> In some cases they are already carrying the support as per the link
>>>>> above covering all i2c drivers. I'd like to see a generic solution and
>>>>> I suspect pushing it to the device drivers rather than the bus code
>>>>> will explode badly and leave us with subtle bugs where people don't
>>>>> realise it is necessary.
>>>>>
>>>>> https://lore.kernel.org/all/20250224120608.1769039-1-claudiu.beznea.uj@bp.renesas.com/
>>>>> is a lot nastier looking than what we have here. I'll review that in a minute
>>>>> to show that it need not be that bad, but none the less not pleasant.
>>>>>
>>>>> +CC linux-iio to join up threads and Dmitry wrt to i2c case (and HID that does
>>>>> similar)
>>>>
>>>> We should not expect individual drivers handle this, because this is a
>>>> layering violation: they need to know implementation details of the bus
>>>> code to know if the bus is using non-devres managed resources, and
>>>> adjust their behavior. Moving this into driver core is also not
>>>> feasible, as not all buses need it. So IMO this should belong to
>>>> individual bus code.
>>>>
>>>> Instead of using devres group a bus may opt to use
>>>> devm_add_action_or_reset() and other devm APIs to make sure bus'
>>>> resource unwinding is carried in the correct order relative to freeing
>>>> driver-owned resources.
>>>
>>> Can you please let us know your input on the approach proposed in this
>>> patch? Or if you would prefer devm_add_action_or_reset() as suggested by
>>> Dmitry? Or if you consider another approach would fit better?
>>>
>>> Currently there were issues identified with the rzg2l-adc driver (driver
>>> based solution proposed in [1]) and with the rzg3s thermal driver (solved
>>> by function rzg3s_thermal_probe() from [2]).
>>>
>>> As expressed previously by Jonathan and Dimitry this is a common problem
>>> and as the issue is due to a call in the bus driver, would be better and
>>> simpler to handle it in the bus driver. Otherwise, individual drivers would
>>> have to be adjusted in a similar way.
>>>
>>
>> Rafael,
>>
>> Greg suggested we ask for your input on the right option:
>>
>> https://lore.kernel.org/all/2025032703-genre-excitable-9473@gregkh/
>> (that thread has the other option).
>
> Can you please let us know your opinion on this?
Can you please let us know if you have any suggestions for this?
Thank you,
Claudiu
Powered by blists - more mailing lists