Message-ID: <CAJZ5v0gzFWW6roYTjUFeL2Tt8kKJ_g5Q=tp2=s87dy05x-Hvww@mail.gmail.com>
Date: Mon, 1 Sep 2025 22:40:25 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Alan Stern <stern@...land.harvard.edu>
Cc: Thinh Nguyen <Thinh.Nguyen@...opsys.com>, ryan zhou <ryanzhou54@...il.com>, 
	Roy Luo <royluo@...gle.com>, 
	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>, 
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH] drvier: usb: dwc3: Fix runtime PM trying to activate
 child device xxx.dwc3 but parent is not active

On Mon, Sep 1, 2025 at 9:41 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Fri, Aug 29, 2025 at 9:58 PM Alan Stern <stern@...land.harvard.edu> wrote:
> >
> > On Fri, Aug 29, 2025 at 09:23:12PM +0200, Rafael J. Wysocki wrote:
> > > On Fri, Aug 29, 2025 at 3:25 AM Alan Stern <stern@...land.harvard.edu> wrote:
> > > > It sounds like the real question is how we should deal with an
> > > > interrupted system suspend.  Suppose parent device A and child device B
> > > > are both in runtime suspend when a system sleep transition begins.  The
> > > > PM core invokes the ->suspend callback of B (and let's say the callback
> > > > doesn't need to do anything because B is already suspended with the
> > > > appropriate wakeup setting).
> > > >
> > > > But then before the PM core invokes the ->suspend callback of A, the
> > > > system sleep transition is cancelled.  So the PM core goes through the
> > > > device tree from parents to children, invoking the ->resume callback for
> > > > all the devices whose ->suspend callback was called earlier.  Thus, A's
> > > > ->resume is skipped because A's ->suspend wasn't called, but B's
> > > > ->resume callback _is_ invoked.  This callback fails, because it can't
> > > > resume B while A is still in runtime suspend.
> > > >
> > > > The same problem arises if A isn't a parent of B but there is a PM
> > > > dependency from B to A.
> > > >
> > > > It's been so long since I worked on the system suspend code that I don't
> > > > remember how we decided to handle this scenario.
> > >
> > > We actually have not made any specific decision in that respect.  That
> > > is, in the error path, the core will invoke the resume callbacks for
> > > devices whose suspend callbacks were invoked and it won't do anything
> > > beyond that because it has too little information on what would need
> > > to be done.
> > >
> > > Arguably, though, the failure case described above is not different
> > > from regular resume during which the driver of A decides to retain the
> > > device in runtime suspend.
> > >
> > > I'm not sure if the core can do anything about it.
> > >
> > > But at the time when the B's resume callback is invoked, runtime PM is
> > > enabled for A, so the driver of B may as well use runtime_resume() to
> > > resume the device if it wants to do so.  It may also decide to do
> > > nothing like in the suspend callback.
> >
> > Good point.  Since both devices were in runtime suspend before the sleep
> > transition started, there's no reason they can't remain in runtime
> > suspend after the sleep transition is cancelled.
> >
> > On the other hand, it seems clear that this scenario doesn't get very
> > much testing.
>
> No, it doesn't in general AFAICS.
>
> > I'm pretty sure the USB subsystem in general is
> > vulnerable to this problem; it doesn't consider suspended devices to be
> > in different states according to the reason for the suspend.  That is, a
> > USB device suspended for runtime PM is in the same state as a device
> > suspended for system PM (aside from minor details like wakeup settings).
> > Consequently the ->resume and ->runtime_resume callbacks do essentially
> > the same thing, both assuming the parent device is not suspended.  As we
> > have discussed, this assumption isn't always correct.
> >
> > I'm open to suggestions for how to handle this.  Should we keep track of
> > whether a device was in runtime suspend when a system suspend happens,
> > so that the ->resume callback can avoid doing anything?  Will that work
> > if the device was the source of a wakeup request?
>
> Generally speaking, for proper integration of system suspend with
> runtime suspend at all levels, it is necessary to track whether or not
> the given device has been suspended prior to system suspend.
>
> In fact, there are even ways to opt-in for assistance from the PM core
> and bus types in that respect to some extent.
>
> In the particular case at hand though, the PM core is not involved in
> making the decision whether or not to leave the devices in runtime
> suspend during system suspend and it all depends on the drivers of A
> and B.
>
> Note here that the problematic situation occurs when the suspend of B
> has run, but the suspend of A has not run yet and the transition is
> aborted between them, so the driver of A cannot do much to help.  The
> driver of B has a couple of options though.
>
> First off, it might decide to runtime-resume the device in its system
> suspend callback (as long as we are talking about the "suspend" phase
> and not any later phases of system suspend) before suspending it again
> which will also cause A to runtime-resume and aborting system suspend
> would not be problematic any more.  So that's one of the options, but
> it is kind of wasteful and time-consuming.
>
> Another option, which I mentioned before, might be to call
> runtime_resume() from the system resume callback of B (again, as long
> as we are talking about the "resume" phase, not any of the earlier
> phases of system resume).  This assumes that runtime PM is enabled at
> this point for both A and B and so it should work properly.
>
> Now, if the driver of B needs to do something special to the device in
> its system suspend callback, it may want (and likely should) disable
> runtime PM prior to this and in that case it will have to check what
> the runtime PM status of the device is and adjust its actions
> accordingly.  That really depends on what those actions are etc, so
> I'd rather not talk about it without a specific example.

Of course, the driver of B may also choose to leave the device in
runtime suspend in its system resume callback.  This requires checking
the runtime PM status of the device upfront, but the driver needs to
do that anyway in order to leave the device in runtime suspend during
system suspend, so it can record the fact that the device has been
left in runtime suspend.  That record can be used later during system
resume.

The tricky part is when the device triggers a system wakeup by
generating a wakeup signal.  In that case, it is probably better to
resume it during system resume, but the driver should know that this
is the case (it has access to the device's registers after all).  It
may, for example, use runtime_resume() to resume the device (and its
parent etc) then.
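
For illustration only, the bookkeeping described above can be sketched
as a small userspace model.  This is not kernel code and not the dwc3
fix: every name in it (toy_dev, toy_suspend(), toy_resume(),
toy_runtime_resume(), the left_rpm_suspended and wakeup_pending fields)
is hypothetical.  toy_runtime_resume() merely stands in for a runtime
resume request that brings the parent up first, and wakeup_pending
stands in for whatever the driver would read from the device's
registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for runtime PM state; not the kernel's definitions. */
enum toy_rpm_status { TOY_RPM_ACTIVE, TOY_RPM_SUSPENDED };

struct toy_dev {
	struct toy_dev *parent;
	enum toy_rpm_status status;
	bool left_rpm_suspended;	/* recorded by toy_suspend() */
	bool wakeup_pending;		/* models a wakeup bit in device registers */
};

/* Models a runtime resume request: the parent is resumed before the child. */
static int toy_runtime_resume(struct toy_dev *d)
{
	assert(d != NULL);
	if (d->status == TOY_RPM_ACTIVE)
		return 0;
	if (d->parent) {
		int ret = toy_runtime_resume(d->parent);

		if (ret)
			return ret;
	}
	d->status = TOY_RPM_ACTIVE;
	return 0;
}

/* System suspend callback of B: record whether B stays in runtime suspend. */
static int toy_suspend(struct toy_dev *d)
{
	d->left_rpm_suspended = (d->status == TOY_RPM_SUSPENDED);
	/* Any device-specific quiescing for the active case would go here. */
	return 0;
}

/* System resume callback of B: consult the record made during suspend. */
static int toy_resume(struct toy_dev *d)
{
	if (d->left_rpm_suspended) {
		/* No wakeup from this device: leave it (and A) suspended. */
		if (!d->wakeup_pending)
			return 0;
		/* The device woke the system: resume it and its parent. */
		return toy_runtime_resume(d);
	}
	/* Regular path: this is the step that fails if A is still suspended. */
	if (d->parent && d->parent->status != TOY_RPM_ACTIVE)
		return -1;
	return 0;
}
```

In this model, calling toy_suspend() and then toy_resume() on B while
both A and B are runtime-suspended leaves both suspended and returns 0,
which corresponds to the aborted-transition case.  With wakeup_pending
set, the same sequence ends with both A and B active.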
