lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPDyKFq91JnCFhEuitOJPZtq78-Y3CUy4p0bNE1wK+eYCste6g@mail.gmail.com>
Date: Wed, 12 Feb 2025 16:14:40 +0100
From: Ulf Hansson <ulf.hansson@...aro.org>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, Linux PM <linux-pm@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, Alan Stern <stern@...land.harvard.edu>, 
	Johan Hovold <johan@...nel.org>, Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>, 
	Jon Hunter <jonathanh@...dia.com>
Subject: Re: [PATCH v1 00/10] PM: Make the core and pm_runtime_force_suspend/resume()
 agree more

On Wed, 12 Feb 2025 at 12:33, Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@...nel.org> wrote:
> >
> > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@...aro.org> wrote:
> > >
> > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@...ysocki.net> wrote:
> > > >
> > > > Hi Everyone,
> > > >
> > > > This series is a result of the discussion on a recently reported issue
> > > > with device runtime PM status propagation during system resume and
> > > > the resulting patches:
> > > >
> > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/
> > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/
> > > >
> > > > Overall, due to restrictions related to pm_runtime_force_suspend() and
> > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE
> > > > setting propagation to the parent of the first device in a dependency
> > > > chain that turned out to have to be resumed during system resume even
> > > > though it was runtime-suspended before system suspend.
> > > >
> > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to
> > > > suspend devices that have never had runtime PM enabled if their runtime
> > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume()
> > > > will skip device whose runtime PM status is currently RPM_ACTIVE.
> > > >
> > > > The purpose of this series is to eliminate the above restrictions and
> > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree
> > > > more with what the core does.
> > >
> > > For my understanding, would you mind elaborating a bit more around the
> > > end-goal with this?
> >
> > The end goal is, of course, full integration of runtime PM with system
> > sleep for all devices.  Eventually.
> >
> > And it is necessary to make the core and
> > pm_runtime_force_suspend|resume() work together better for this
> > purpose.
> >
> > > Are you trying to make adaptations for
> > > pm_runtime_force_suspend|resume() and the PM core, such that drivers
> > > that uses pm_runtime_force_suspend|resume() should be able to cope
> > > with other drivers for child-devices that make use of
> > > DPM_FLAG_SMART_SUSPEND?
> >
> > Yes.
> >
> > This is a more general case, though, when a device that was
> > runtime-suspended before system suspend and is left in suspend during
> > it, needs to be resumed during the system resume that follows.
> >
> > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it
> > cannot happen otherwise, but I think that it is a generally valid
> > case.
> >
> > > If we can make this work, it would enable the propagation of
> > > RPM_ACTIVE in the PM core for more devices, but still not for all,
> > > right?
> >
> > It is all until there is a known case in which it isn't.  So either
> > you know a specific case in which it doesn't work, or I don't see a
> > reason for avoiding it.
> >
> > ATM the only specific case in which it doesn't work is when
> > pm_runtime_force_suspend|resume() are used.
> >
> > > The point is, the other bigger issue that I pointed out in our earlier
> > > discussions; all those devices where their drivers/buses don't cope
> > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent
> > > the PM core from *unconditionally* propagating the RPM_ACTIVE to
> > > parents. I guess this is the best we can do then?
> >
> > OK, what are they?
> >
> > You keep saying that they exist without giving any examples.
>
> To put it more bluntly, I'm not aware of any place other than
> pm_runtime_force_suspend|resume() that can be confused by changing the
> runtime PM status of a device with runtime PM disabled (either
> permanently, or transiently during a system suspend-resume cycle) to
> RPM_ACTIVE, so if there are any such places, I would appreciate
> letting me know what they are.

Well, sorry I thought you were aware. Anyway, I believe you need to do
your own investigation as it's simply too time consuming for me to
find them all. The problem is that it's not just a simple pattern to
search for, so we would need some clever scripting to move forward to
find them.

To start with, we should look for drivers that enable runtime PM, by
calling pm_runtime_enable().

Additionally, in their system suspend callback they should typically
*not* use pm_runtime_suspended(), pm_runtime_status_suspended() or
pm_runtime_active() as that is usually (but no always) indicating that
they got it right. Then there are those that don't have system
suspend/resume callbacks assigned at all (or uses some other subsystem
specific callback for this), but only uses runtime PM.

On top of that, drivers that makes use of
pm_runtime_force_suspend|resume() should be disregarded, which has
reached beyond 300 as this point.

Anyway, here is just one example that I found from a quick search.
drivers/i2c/busses/i2c-qcom-geni.c

>
> Note that rpm_active() could start producing confusing results if the
> runtime PM status of a device with runtime PM disabled is changed from
> RPM_ACTIVE to RPM_SUSPENDED because it will then start to return
> -EACCES instead of 1, but changing the status to RPM_ACTIVE will not
> confuse it the same way.

Trust me, it will cause problems.

Drivers may call pm_runtime_get_sync() to turn on the resources for
the device after the system has resumed, when runtime PM has been
re-enabled for the device by the PM core.

Those calls to pm_runtime_get_sync() will then not end up invoking any
if ->runtime_resume() callbacks for the device since its state is
already RPM_ACTIVE. Hence, the device will remain in a low power state
even if the driver believes it has been powered on. In many cases,
accessing the device (like reading/writing a register) will often just
just hang in these cases, but in worst cases we could end up getting
even more difficult bugs to debug.

Kind regards
Uffe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ