linux-kernel - Re: [PATCH] drvier: usb: dwc3: Fix runtime PM trying to activate child device xxx.dwc3 but parent is not active

lists.openwall.net: Open Source and information security mailing list archives
Message-ID: <CAJZ5v0g9nip2KUs2hoa7yMMAow-WsS-4EYX6FvEbpRFw10C2wQ@mail.gmail.com>
Date: Mon, 1 Sep 2025 21:41:34 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Alan Stern <stern@...land.harvard.edu>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Thinh Nguyen <Thinh.Nguyen@...opsys.com>, 
	ryan zhou <ryanzhou54@...il.com>, Roy Luo <royluo@...gle.com>, 
	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>, 
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH] drvier: usb: dwc3: Fix runtime PM trying to activate
 child device xxx.dwc3 but parent is not active

On Fri, Aug 29, 2025 at 9:58 PM Alan Stern <stern@...land.harvard.edu> wrote:
>
> On Fri, Aug 29, 2025 at 09:23:12PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Aug 29, 2025 at 3:25 AM Alan Stern <stern@...land.harvard.edu> wrote:
> > > It sounds like the real question is how we should deal with an
> > > interrupted system suspend.  Suppose parent device A and child device B
> > > are both in runtime suspend when a system sleep transition begins.  The
> > > PM core invokes the ->suspend callback of B (and let's say the callback
> > > doesn't need to do anything because B is already suspended with the
> > > appropriate wakeup setting).
> > >
> > > But then before the PM core invokes the ->suspend callback of A, the
> > > system sleep transition is cancelled.  So the PM core goes through the
> > > device tree from parents to children, invoking the ->resume callback for
> > > all the devices whose ->suspend callback was called earlier.  Thus, A's
> > > ->resume is skipped because A's ->suspend wasn't called, but B's
> > > ->resume callback _is_ invoked.  This callback fails, because it can't
> > > resume B while A is still in runtime suspend.
> > >
> > > The same problem arises if A isn't a parent of B but there is a PM
> > > dependency from B to A.
> > >
> > > It's been so long since I worked on the system suspend code that I don't
> > > remember how we decided to handle this scenario.
> >
> > We actually have not made any specific decision in that respect.  That
> > is, in the error path, the core will invoke the resume callbacks for
> > devices whose suspend callbacks were invoked and it won't do anything
> > beyond that because it has too little information on what would need
> > to be done.
> >
> > Arguably, though, the failure case described above is not different
> > from regular resume during which the driver of A decides to retain the
> > device in runtime suspend.
> >
> > I'm not sure if the core can do anything about it.
> >
> > But at the time when B's resume callback is invoked, runtime PM is
> > enabled for A, so the driver of B may as well use runtime_resume() to
> > resume the device if it wants to do so.  It may also decide to do
> > nothing, as in the suspend callback.
>
> Good point.  Since both devices were in runtime suspend before the sleep
> transition started, there's no reason they can't remain in runtime
> suspend after the sleep transition is cancelled.
>
> On the other hand, it seems clear that this scenario doesn't get very
> much testing.

No, it doesn't in general AFAICS.

> I'm pretty sure the USB subsystem in general is
> vulnerable to this problem; it doesn't consider suspended devices to be
> in different states according to the reason for the suspend.  That is, a
> USB device suspended for runtime PM is in the same state as a device
> suspended for system PM (aside from minor details like wakeup settings).
> Consequently the ->resume and ->runtime_resume callbacks do essentially
> the same thing, both assuming the parent device is not suspended.  As we
> have discussed, this assumption isn't always correct.
>
> I'm open to suggestions for how to handle this.  Should we keep track of
> whether a device was in runtime suspend when a system suspend happens,
> so that the ->resume callback can avoid doing anything?  Will that work
> if the device was the source of a wakeup request?

Generally speaking, for proper integration of system suspend with
runtime suspend at all levels, it is necessary to track whether or not
the given device was already runtime-suspended before the system
suspend started.
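
For illustration only, here is a minimal userspace model of that
bookkeeping.  It is not kernel code: struct model_dev, its fields and
the model_* functions are hypothetical.  In a real driver the status
would typically be read with pm_runtime_status_suspended() and the flag
kept in the driver's private data.  The model does not address wakeup
requests, which is the open question in Alan's message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a device (not kernel code). */
struct model_dev {
	struct model_dev *parent;
	bool rpm_suspended;		/* runtime PM status */
	bool powered;			/* hardware is operational */
	bool was_rpm_suspended;		/* driver-private, set in ->suspend */
};

/*
 * Model ->suspend: remember whether the device was already in runtime
 * suspend.  If it was, leave it alone (its wakeup settings are assumed
 * to be appropriate already); otherwise quiesce it.  The runtime PM
 * status itself is not changed by system suspend.
 */
static int model_suspend(struct model_dev *dev)
{
	dev->was_rpm_suspended = dev->rpm_suspended;
	if (!dev->was_rpm_suspended)
		dev->powered = false;
	return 0;
}

/*
 * Model ->resume: a device that was runtime-suspended beforehand simply
 * stays in runtime suspend, so a still-suspended parent does not matter.
 * Otherwise the parent must be operational.  Returns 0 or -1.
 */
static int model_resume(struct model_dev *dev)
{
	if (dev->was_rpm_suspended)
		return 0;
	if (dev->parent && !dev->parent->powered)
		return -1;
	dev->powered = true;
	return 0;
}
```

With this, the aborted transition from Alan's example (B suspended, A's
->suspend never called) leaves both devices in runtime suspend instead
of failing B's ->resume.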

In fact, drivers can even opt in to some assistance from the PM core
and bus types in that respect.

In the particular case at hand, though, the PM core is not involved in
deciding whether or not to leave the devices in runtime suspend during
system suspend; that decision depends entirely on the drivers of A
and B.

Note that the problematic situation occurs when the suspend callback
of B has run, but the suspend callback of A has not run yet and the
transition is aborted between them, so the driver of A cannot do much
to help.  The driver of B has a couple of options, though.
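
The failing sequence can be reproduced with a small userspace model.
This is a simplified sketch, not the PM core: the real core walks its
device lists rather than an array, and struct model_dev and the model_*
functions are hypothetical.  Suspend runs children first, the abort
path resumes parents first, and only devices whose ->suspend ran get
->resume, as described earlier in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a device (not kernel code). */
struct model_dev {
	const char *name;
	struct model_dev *parent;
	bool powered;		/* device is operational */
	bool sys_suspended;	/* ->suspend callback has run */
};

/* Model ->suspend: the device is already runtime-suspended, so the
 * callback only records that it ran. */
static void model_suspend(struct model_dev *dev)
{
	dev->sys_suspended = true;
}

/* Model ->resume that assumes the parent is operational, like the USB
 * callbacks Alan describes.  Returns 0 on success, -1 on failure. */
static int model_resume(struct model_dev *dev)
{
	if (dev->parent && !dev->parent->powered)
		return -1;
	dev->powered = true;
	dev->sys_suspended = false;
	return 0;
}

/*
 * devs[] is in suspend order (children before parents).  Run ->suspend
 * for the first abort_after devices, then abort: run ->resume in reverse
 * order, only for devices whose ->suspend ran.  Returns the number of
 * failed ->resume calls.
 */
static int model_suspend_then_abort(struct model_dev **devs, size_t n,
				    size_t abort_after)
{
	size_t done;
	int failures = 0;

	for (done = 0; done < n && done < abort_after; done++)
		model_suspend(devs[done]);

	for (size_t i = done; i-- > 0; )
		if (model_resume(devs[i]) != 0)
			failures++;
	return failures;
}
```

Aborting after B's ->suspend (abort_after == 1 with devs = { B, A })
yields one failed ->resume, while aborting only after both callbacks
have run does not fail at all.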

First off, it might runtime-resume the device in its system suspend
callback (as long as we are talking about the "suspend" phase and not
any later phases of system suspend) before suspending it again.  That
will also cause A to runtime-resume, so aborting the system suspend
would no longer be problematic.  So that's one of the options, but it
is rather wasteful and time-consuming.
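
A userspace sketch of this first option follows.  model_rpm_resume() is
a hypothetical stand-in for pm_runtime_resume(), which likewise resumes
the parent first when needed; the other names are also hypothetical.
The runtime PM status stays "active" after B's ->suspend; only the
hardware is powered down.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a device (not kernel code). */
struct model_dev {
	struct model_dev *parent;
	bool rpm_suspended;	/* runtime PM status */
	bool powered;		/* hardware is operational */
};

/* Stand-in for pm_runtime_resume(): resume the parent first, since a
 * child cannot be active under a suspended parent. */
static int model_rpm_resume(struct model_dev *dev)
{
	if (!dev->rpm_suspended)
		return 0;
	if (dev->parent && model_rpm_resume(dev->parent) != 0)
		return -1;
	dev->rpm_suspended = false;
	dev->powered = true;
	return 0;
}

/* Option 1: B's ->suspend runtime-resumes B (and thereby A), then
 * powers B down for system sleep without changing its runtime status. */
static int model_b_suspend(struct model_dev *b)
{
	if (model_rpm_resume(b) != 0)
		return -1;
	b->powered = false;
	return 0;
}

/* B's ->resume, which assumes the parent is operational. */
static int model_b_resume(struct model_dev *b)
{
	if (b->parent && !b->parent->powered)
		return -1;
	b->powered = true;
	return 0;
}
```

If the transition is aborted right after model_b_suspend(), A is
already active, so model_b_resume() succeeds.  The cost is powering up
both devices only to power B down again.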

Another option, which I mentioned before, might be to call
runtime_resume() from the system resume callback of B (again, as long
as we are talking about the "resume" phase, not any of the earlier
phases of system resume).  This relies on runtime PM being enabled
for both A and B at that point, so it should work properly.
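
The second option can be sketched the same way.  In the kernel the
runtime resume helper is pm_runtime_resume(); model_rpm_resume() below
is a hypothetical userspace stand-in for it, and the other names are
hypothetical too.  The sketch contrasts a ->resume that touches the
hardware directly with one that goes through runtime PM.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a device (not kernel code). */
struct model_dev {
	struct model_dev *parent;
	bool rpm_suspended;	/* runtime PM status */
	bool powered;		/* hardware is operational */
};

/* Stand-in for pm_runtime_resume(): resume the parent first. */
static int model_rpm_resume(struct model_dev *dev)
{
	if (!dev->rpm_suspended)
		return 0;
	if (dev->parent && model_rpm_resume(dev->parent) != 0)
		return -1;
	dev->rpm_suspended = false;
	dev->powered = true;
	return 0;
}

/* Direct ->resume: assumes the parent is operational. */
static int model_b_resume_direct(struct model_dev *b)
{
	if (b->parent && !b->parent->powered)
		return -1;
	b->powered = true;
	return 0;
}

/* Option 2: B's ->resume goes through runtime PM, so A is resumed
 * even though A's own ->resume was skipped on the abort path. */
static int model_b_resume(struct model_dev *b)
{
	return model_rpm_resume(b);
}
```

On the aborted path, model_b_resume_direct() fails while model_b_resume()
brings up A and then B.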

Now, if the driver of B needs to do something special to the device in
its system suspend callback, it may want to (and likely should) disable
runtime PM before doing so, in which case it will have to check the
runtime PM status of the device and adjust its actions accordingly.
That really depends on what those actions are, so I'd rather not
discuss it without a specific example.