lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250330163129.02f24afb@jic23-huawei>
Date: Sun, 30 Mar 2025 16:31:29 +0100
From: Jonathan Cameron <jic23@...nel.org>
To: Claudiu Beznea <claudiu.beznea@...on.dev>, "Rafael J. Wysocki"
 <rafael@...nel.org>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>, dakr@...nel.org,
 ulf.hansson@...aro.org, linux-kernel@...r.kernel.org,
 linux-pm@...r.kernel.org, linux-renesas-soc@...r.kernel.org,
 geert@...ux-m68k.org, Claudiu Beznea <claudiu.beznea.uj@...renesas.com>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>, linux-iio@...r.kernel.org,
 Dmitry Torokhov <dmitry.torokhov@...il.com>
Subject: Re: [PATCH] driver core: platform: Use devres group to free driver
 probe resources

On Thu, 27 Mar 2025 18:47:53 +0200
Claudiu Beznea <claudiu.beznea@...on.dev> wrote:

> Hi, Rafael,
> 
> On 06.03.2025 08:11, Dmitry Torokhov wrote:
> > On Wed, Mar 05, 2025 at 02:03:09PM +0000, Jonathan Cameron wrote:  
> >> On Wed, 19 Feb 2025 14:45:07 +0200
> >> Claudiu Beznea <claudiu.beznea@...on.dev> wrote:
> >>  
> >>> Hi, Daniel, Jonathan,
> >>>
> >>> On 15.02.2025 15:51, Claudiu Beznea wrote:  
> >>>> Hi, Greg,
> >>>>
> >>>> On 15.02.2025 15:25, Greg KH wrote:    
> >>>>> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:    
> >>>>>> From: Claudiu Beznea <claudiu.beznea.uj@...renesas.com>
> >>>>>>
> >>>>>> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
> >>>>>> clocks are managed through PM domains. These PM domains, registered on
> >>>>>> behalf of the clock controller driver, are configured with
> >>>>>> GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
> >>>>>> clocks are enabled/disabled using runtime PM APIs. The power domains may
> >>>>>> also have power_on/power_off support implemented. After the device PM
> >>>>>> domain is powered off any CPU accesses to these domains leads to system
> >>>>>> aborts.
> >>>>>>
> >>>>>> During probe, devices are attached to the PM domain controlling their
> >>>>>> clocks and power. Similarly, during removal, devices are detached from the
> >>>>>> PM domain.
> >>>>>>
> >>>>>> The detachment call stack is as follows:
> >>>>>>
> >>>>>> device_driver_detach() ->
> >>>>>>   device_release_driver_internal() ->
> >>>>>>     __device_release_driver() ->
> >>>>>>       device_remove() ->
> >>>>>>         platform_remove() ->
> >>>>>> 	  dev_pm_domain_detach()
> >>>>>>
> >>>>>> During driver unbind, after the device is detached from its PM domain,
> >>>>>> the device_unbind_cleanup() function is called, which subsequently invokes
> >>>>>> devres_release_all(). This function handles devres resource cleanup.
> >>>>>>
> >>>>>> If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the
> >>>>>> cleanup process triggers the action or reset function for disabling runtime
> >>>>>> PM. This function is pm_runtime_disable_action(), which leads to the
> >>>>>> following call stack of interest when called:
> >>>>>>
> >>>>>> pm_runtime_disable_action() ->
> >>>>>>   pm_runtime_dont_use_autosuspend() ->
> >>>>>>     __pm_runtime_use_autosuspend() ->
> >>>>>>       update_autosuspend() ->
> >>>>>>         rpm_idle()
> >>>>>>
> >>>>>> The rpm_idle() function attempts to resume the device at runtime. However,
> >>>>>> at the point it is called, the device is no longer part of a PM domain
> >>>>>> (which manages clocks and power states). If the driver implements its own
> >>>>>> runtime PM APIs for specific functionalities - such as the rzg2l_adc
> >>>>>> driver - while also relying on the power domain subsystem for power
> >>>>>> management, rpm_idle() will invoke the driver's runtime PM API. However,
> >>>>>> since the device is no longer part of a PM domain at this point, the PM
> >>>>>> domain's runtime PM APIs will not be called. This leads to system aborts on
> >>>>>> Renesas SoCs.
> >>>>>>
> >>>>>> Another identified case is when a subsystem performs various cleanups
> >>>>>> using device_unbind_cleanup(), calling driver-specific APIs in the process.
> >>>>>> A known example is the thermal subsystem, which may call driver-specific
> >>>>>> APIs to disable the thermal device. The relevant call stack in this case
> >>>>>> is:
> >>>>>>
> >>>>>> device_driver_detach() ->
> >>>>>>   device_release_driver_internal() ->
> >>>>>>     device_unbind_cleanup() ->
> >>>>>>       devres_release_all() ->
> >>>>>>         devm_thermal_of_zone_release() ->
> >>>>>> 	  thermal_zone_device_disable() ->
> >>>>>> 	    thermal_zone_device_set_mode() ->
> >>>>>> 	      struct thermal_zone_device_ops::change_mode()
> >>>>>>
> >>>>>> At the moment the driver-specific change_mode() API is called, the device
> >>>>>> is no longer part of its PM domain. Accessing its registers without proper
> >>>>>> power management leads to system aborts.
> >>>>>>
> >>>>>> Open a devres group before calling the driver probe, and close it
> >>>>>> immediately after the driver remove function is called and before
> >>>>>> dev_pm_domain_detach(). This ensures that driver-specific devm actions or
> >>>>>> reset functions are executed immediately after the driver remove function
> >>>>>> completes. Additionally, it prevents driver-specific runtime PM APIs from
> >>>>>> being called when the device is no longer part of its power domain.
> >>>>>>
> >>>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@...renesas.com>
> >>>>>> ---
> >>>>>>
> >>>>>> Hi,  
> >>
> >> Hi Claudiu, Greg,
> >>
> >> Sorry, I missed this thread whilst travelling and only saw it because
> >> of reference from the in driver solution.
> >>  
> >>>>>>
> >>>>>> Although Ulf gave its green light for the approaches on both IIO [1],
> >>>>>> [2] and thermal subsystems [3], Jonathan considered unacceptable the
> >>>>>> approaches in [1], [2] as he considered it may lead to dificult to
> >>>>>> maintain code and code opened to subtle bugs (due to the potential of
> >>>>>> mixing devres and non-devres calls). He pointed out a similar approach
> >>>>>> that was done for the I2C bus [4], [5].
> >>>>>>
> >>>>>> As the discussions in [1], [2] stopped w/o a clear conclusion, this
> >>>>>> patch tries to revive it by proposing a similar approach that was done
> >>>>>> for the I2C bus.
> >>>>>>
> >>>>>> Please let me know you input.    
> >>>>>
> >>>>> I'm with Jonathan here, the devres stuff is getting crazy here and you
> >>>>> have drivers mixing them and side affects happening and lots of
> >>>>> confusion.  Your change here is only going to make it even more
> >>>>> confusing, and shouldn't actually solve it for other busses (i.e. what
> >>>>> about iio devices NOT on the platform bus?)    
> >>
> >> In some cases they are already carrying the support as per the link
> >> above covering all i2c drivers.  I'd like to see a generic solution and
> >> I suspect pushing it to the device drivers rather than the bus code
> >> will explode badly and leave us with subtle bugs where people don't
> >> realise it is necessary. 
> >>
> >> https://lore.kernel.org/all/20250224120608.1769039-1-claudiu.beznea.uj@bp.renesas.com/
> >> is a lot nastier looking than what we have here. I'll review that in a minute
> >> to show that it need not be that bad, but none the less not pleasant.
> >>
> >> +CC linux-iio to join up threads and Dmitry wrt to i2c case (and HID that does
> >> similar)  
> > 
> > We should not expect individual drivers handle this, because this is a
> > layering violation: they need to know implementation details of the bus
> > code to know if the bus is using non-devres managed resources, and
> > adjust their behavior. Moving this into driver core is also not
> > feasible, as not all buses need it. So IMO this should belong to
> > individual bus code.
> > 
> > Instead of using devres group a bus may opt to use
> > devm_add_action_or_reset() and other devm APIs to make sure bus'
> > resource unwinding is carried in the correct order relative to freeing
> > driver-owned resources.  
> 
> Can you please let us know your input on the approach proposed in this
> patch? Or if you would prefer devm_add_action_or_reset() as suggested by
> Dmitry? Or if you consider another approach would fit better?
> 
> Currently there were issues identified with the rzg2l-adc driver (driver
> based solution proposed in [1]) and with the rzg3s thermal driver (solved
> by function rzg3s_thermal_probe() from [2]).
> 
> As expressed previously by Jonathan and Dimitry this is a common problem
> and as the issue is due to a call in the bus driver, would be better and
> simpler to handle it in the bus driver. Otherwise, individual drivers would
> have to be adjusted in a similar way.
> 

Rafael,

Greg suggested we ask for your input on the right option:

https://lore.kernel.org/all/2025032703-genre-excitable-9473@gregkh/
(that thread has the other option).

Jonathan

> Thank you,
> Claudiu
> 
> [1]
> https://lore.kernel.org/all/20250324122627.32336-2-claudiu.beznea.uj@bp.renesas.com/
> [2]
> https://lore.kernel.org/all/20250324135701.179827-3-claudiu.beznea.uj@bp.renesas.com/
> 
> >   
> >>  
> >>>>
> >>>> You're right, other busses will still have this problem.
> >>>>     
> >>>>>
> >>>>> Why can't your individual driver handle this instead?    
> >>
> >> In my mind because it's the bus code that is doing the unexpected part by
> >> making calls in the remove path that are effectively not in the same order
> >> as probe because they occur between driver remove and related devres cleanup
> >> for stuff registered in probe.
> >>  
> >>>>
> >>>> Initially I tried it at the driver level by using non-devres PM runtime
> >>>> enable API but wasn't considered OK by all parties.
> >>>>
> >>>> I haven't thought about having devres_open_group()/devres_close_group() in
> >>>> the driver itself but it should work.    
> >>>
> >>> Are you OK with having the devres_open_group()/devres_close_group() in the
> >>> currently known affected drivers (drivers/iio/adc/rzg2l_adc.c and the
> >>> proposed drivers/thermal/renesas/rzg3s_thermal.c [1]) ?  
> >>
> >> I guess it may be the best of a bunch of not particularly nasty solutions...  
> > 
> > We need to update _ALL_ platform drivers using devm then, and this is
> > clearly not scalable.
> > 
> > Thanks.
> >   
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ