lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5bca6dfd-fe03-4c44-acf4-a51673124338@tuxon.dev>
Date: Wed, 9 Apr 2025 19:12:56 +0300
From: Claudiu Beznea <claudiu.beznea@...on.dev>
To: Jonathan Cameron <jic23@...nel.org>, "Rafael J. Wysocki"
 <rafael@...nel.org>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>, dakr@...nel.org,
 ulf.hansson@...aro.org, linux-kernel@...r.kernel.org,
 linux-pm@...r.kernel.org, linux-renesas-soc@...r.kernel.org,
 geert@...ux-m68k.org, Claudiu Beznea <claudiu.beznea.uj@...renesas.com>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>, linux-iio@...r.kernel.org,
 Dmitry Torokhov <dmitry.torokhov@...il.com>
Subject: Re: [PATCH] driver core: platform: Use devres group to free driver
 probe resources

Hi, Rafael,

On 30.03.2025 18:31, Jonathan Cameron wrote:
> On Thu, 27 Mar 2025 18:47:53 +0200
> Claudiu Beznea <claudiu.beznea@...on.dev> wrote:
> 
>> Hi, Rafael,
>>
>> On 06.03.2025 08:11, Dmitry Torokhov wrote:
>>> On Wed, Mar 05, 2025 at 02:03:09PM +0000, Jonathan Cameron wrote:  
>>>> On Wed, 19 Feb 2025 14:45:07 +0200
>>>> Claudiu Beznea <claudiu.beznea@...on.dev> wrote:
>>>>  
>>>>> Hi, Daniel, Jonathan,
>>>>>
>>>>> On 15.02.2025 15:51, Claudiu Beznea wrote:  
>>>>>> Hi, Greg,
>>>>>>
>>>>>> On 15.02.2025 15:25, Greg KH wrote:    
>>>>>>> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:    
>>>>>>>> From: Claudiu Beznea <claudiu.beznea.uj@...renesas.com>
>>>>>>>>
>>>>>>>> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
>>>>>>>> clocks are managed through PM domains. These PM domains, registered on
>>>>>>>> behalf of the clock controller driver, are configured with
>>>>>>>> GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
>>>>>>>> clocks are enabled/disabled using runtime PM APIs. The power domains may
>>>>>>>> also have power_on/power_off support implemented. After the device PM
>>>>>>>> domain is powered off any CPU accesses to these domains leads to system
>>>>>>>> aborts.
>>>>>>>>
>>>>>>>> During probe, devices are attached to the PM domain controlling their
>>>>>>>> clocks and power. Similarly, during removal, devices are detached from the
>>>>>>>> PM domain.
>>>>>>>>
>>>>>>>> The detachment call stack is as follows:
>>>>>>>>
>>>>>>>> device_driver_detach() ->
>>>>>>>>   device_release_driver_internal() ->
>>>>>>>>     __device_release_driver() ->
>>>>>>>>       device_remove() ->
>>>>>>>>         platform_remove() ->
>>>>>>>> 	  dev_pm_domain_detach()
>>>>>>>>
>>>>>>>> During driver unbind, after the device is detached from its PM domain,
>>>>>>>> the device_unbind_cleanup() function is called, which subsequently invokes
>>>>>>>> devres_release_all(). This function handles devres resource cleanup.
>>>>>>>>
>>>>>>>> If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the
>>>>>>>> cleanup process triggers the action or reset function for disabling runtime
>>>>>>>> PM. This function is pm_runtime_disable_action(), which leads to the
>>>>>>>> following call stack of interest when called:
>>>>>>>>
>>>>>>>> pm_runtime_disable_action() ->
>>>>>>>>   pm_runtime_dont_use_autosuspend() ->
>>>>>>>>     __pm_runtime_use_autosuspend() ->
>>>>>>>>       update_autosuspend() ->
>>>>>>>>         rpm_idle()
>>>>>>>>
>>>>>>>> The rpm_idle() function attempts to resume the device at runtime. However,
>>>>>>>> at the point it is called, the device is no longer part of a PM domain
>>>>>>>> (which manages clocks and power states). If the driver implements its own
>>>>>>>> runtime PM APIs for specific functionalities - such as the rzg2l_adc
>>>>>>>> driver - while also relying on the power domain subsystem for power
>>>>>>>> management, rpm_idle() will invoke the driver's runtime PM API. However,
>>>>>>>> since the device is no longer part of a PM domain at this point, the PM
>>>>>>>> domain's runtime PM APIs will not be called. This leads to system aborts on
>>>>>>>> Renesas SoCs.
>>>>>>>>
>>>>>>>> Another identified case is when a subsystem performs various cleanups
>>>>>>>> using device_unbind_cleanup(), calling driver-specific APIs in the process.
>>>>>>>> A known example is the thermal subsystem, which may call driver-specific
>>>>>>>> APIs to disable the thermal device. The relevant call stack in this case
>>>>>>>> is:
>>>>>>>>
>>>>>>>> device_driver_detach() ->
>>>>>>>>   device_release_driver_internal() ->
>>>>>>>>     device_unbind_cleanup() ->
>>>>>>>>       devres_release_all() ->
>>>>>>>>         devm_thermal_of_zone_release() ->
>>>>>>>> 	  thermal_zone_device_disable() ->
>>>>>>>> 	    thermal_zone_device_set_mode() ->
>>>>>>>> 	      struct thermal_zone_device_ops::change_mode()
>>>>>>>>
>>>>>>>> At the moment the driver-specific change_mode() API is called, the device
>>>>>>>> is no longer part of its PM domain. Accessing its registers without proper
>>>>>>>> power management leads to system aborts.
>>>>>>>>
>>>>>>>> Open a devres group before calling the driver probe, and close it
>>>>>>>> immediately after the driver remove function is called and before
>>>>>>>> dev_pm_domain_detach(). This ensures that driver-specific devm actions or
>>>>>>>> reset functions are executed immediately after the driver remove function
>>>>>>>> completes. Additionally, it prevents driver-specific runtime PM APIs from
>>>>>>>> being called when the device is no longer part of its power domain.
>>>>>>>>
>>>>>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@...renesas.com>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Hi,  
>>>>
>>>> Hi Claudiu, Greg,
>>>>
>>>> Sorry, I missed this thread whilst travelling and only saw it because
>>>> of reference from the in driver solution.
>>>>  
>>>>>>>>
>>>>>>>> Although Ulf gave its green light for the approaches on both IIO [1],
>>>>>>>> [2] and thermal subsystems [3], Jonathan considered unacceptable the
>>>>>>>> approaches in [1], [2] as he considered it may lead to dificult to
>>>>>>>> maintain code and code opened to subtle bugs (due to the potential of
>>>>>>>> mixing devres and non-devres calls). He pointed out a similar approach
>>>>>>>> that was done for the I2C bus [4], [5].
>>>>>>>>
>>>>>>>> As the discussions in [1], [2] stopped w/o a clear conclusion, this
>>>>>>>> patch tries to revive it by proposing a similar approach that was done
>>>>>>>> for the I2C bus.
>>>>>>>>
>>>>>>>> Please let me know you input.    
>>>>>>>
>>>>>>> I'm with Jonathan here, the devres stuff is getting crazy here and you
>>>>>>> have drivers mixing them and side affects happening and lots of
>>>>>>> confusion.  Your change here is only going to make it even more
>>>>>>> confusing, and shouldn't actually solve it for other busses (i.e. what
>>>>>>> about iio devices NOT on the platform bus?)    
>>>>
>>>> In some cases they are already carrying the support as per the link
>>>> above covering all i2c drivers.  I'd like to see a generic solution and
>>>> I suspect pushing it to the device drivers rather than the bus code
>>>> will explode badly and leave us with subtle bugs where people don't
>>>> realise it is necessary. 
>>>>
>>>> https://lore.kernel.org/all/20250224120608.1769039-1-claudiu.beznea.uj@bp.renesas.com/
>>>> is a lot nastier looking than what we have here. I'll review that in a minute
>>>> to show that it need not be that bad, but none the less not pleasant.
>>>>
>>>> +CC linux-iio to join up threads and Dmitry wrt to i2c case (and HID that does
>>>> similar)  
>>>
>>> We should not expect individual drivers handle this, because this is a
>>> layering violation: they need to know implementation details of the bus
>>> code to know if the bus is using non-devres managed resources, and
>>> adjust their behavior. Moving this into driver core is also not
>>> feasible, as not all buses need it. So IMO this should belong to
>>> individual bus code.
>>>
>>> Instead of using devres group a bus may opt to use
>>> devm_add_action_or_reset() and other devm APIs to make sure bus'
>>> resource unwinding is carried in the correct order relative to freeing
>>> driver-owned resources.  
>>
>> Can you please let us know your input on the approach proposed in this
>> patch? Or if you would prefer devm_add_action_or_reset() as suggested by
>> Dmitry? Or if you consider another approach would fit better?
>>
>> Currently there were issues identified with the rzg2l-adc driver (driver
>> based solution proposed in [1]) and with the rzg3s thermal driver (solved
>> by function rzg3s_thermal_probe() from [2]).
>>
>> As expressed previously by Jonathan and Dimitry this is a common problem
>> and as the issue is due to a call in the bus driver, would be better and
>> simpler to handle it in the bus driver. Otherwise, individual drivers would
>> have to be adjusted in a similar way.
>>
> 
> Rafael,
> 
> Greg suggested we ask for your input on the right option:
> 
> https://lore.kernel.org/all/2025032703-genre-excitable-9473@gregkh/
> (that thread has the other option).

Can you please let us know your opinion on this?

Thank you,
Claudiu

> 
> Jonathan
> 
>> Thank you,
>> Claudiu
>>
>> [1]
>> https://lore.kernel.org/all/20250324122627.32336-2-claudiu.beznea.uj@bp.renesas.com/
>> [2]
>> https://lore.kernel.org/all/20250324135701.179827-3-claudiu.beznea.uj@bp.renesas.com/
>>
>>>   
>>>>  
>>>>>>
>>>>>> You're right, other busses will still have this problem.
>>>>>>     
>>>>>>>
>>>>>>> Why can't your individual driver handle this instead?    
>>>>
>>>> In my mind because it's the bus code that is doing the unexpected part by
>>>> making calls in the remove path that are effectively not in the same order
>>>> as probe because they occur between driver remove and related devres cleanup
>>>> for stuff registered in probe.
>>>>  
>>>>>>
>>>>>> Initially I tried it at the driver level by using non-devres PM runtime
>>>>>> enable API but wasn't considered OK by all parties.
>>>>>>
>>>>>> I haven't thought about having devres_open_group()/devres_close_group() in
>>>>>> the driver itself but it should work.    
>>>>>
>>>>> Are you OK with having the devres_open_group()/devres_close_group() in the
>>>>> currently known affected drivers (drivers/iio/adc/rzg2l_adc.c and the
>>>>> proposed drivers/thermal/renesas/rzg3s_thermal.c [1]) ?  
>>>>
>>>> I guess it may be the best of a bunch of not particularly nasty solutions...  
>>>
>>> We need to update _ALL_ platform drivers using devm then, and this is
>>> clearly not scalable.
>>>
>>> Thanks.
>>>   
>>
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ