Message-ID: <f8965cfe-de9a-439c-84e3-63da066aa74f@rowland.harvard.edu>
Date: Fri, 29 Aug 2025 15:58:45 -0400
From: Alan Stern <stern@...land.harvard.edu>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
ryan zhou <ryanzhou54@...il.com>, Roy Luo <royluo@...gle.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH] drvier: usb: dwc3: Fix runtime PM trying to activate
child device xxx.dwc3 but parent is not active

On Fri, Aug 29, 2025 at 09:23:12PM +0200, Rafael J. Wysocki wrote:
> On Fri, Aug 29, 2025 at 3:25 AM Alan Stern <stern@...land.harvard.edu> wrote:
> > It sounds like the real question is how we should deal with an
> > interrupted system suspend. Suppose parent device A and child device B
> > are both in runtime suspend when a system sleep transition begins. The
> > PM core invokes the ->suspend callback of B (and let's say the callback
> > doesn't need to do anything because B is already suspended with the
> > appropriate wakeup setting).
> >
> > But then before the PM core invokes the ->suspend callback of A, the
> > system sleep transition is cancelled. So the PM core goes through the
> > device tree from parents to children, invoking the ->resume callback for
> > all the devices whose ->suspend callback was called earlier. Thus, A's
> > ->resume is skipped because A's ->suspend wasn't called, but B's
> > ->resume callback _is_ invoked. This callback fails, because it can't
> > resume B while A is still in runtime suspend.
> >
> > The same problem arises if A isn't a parent of B but there is a PM
> > dependency from B to A.
> >
> > It's been so long since I worked on the system suspend code that I don't
> > remember how we decided to handle this scenario.
>
> We actually have not made any specific decision in that respect. That
> is, in the error path, the core will invoke the resume callbacks for
> devices whose suspend callbacks were invoked and it won't do anything
> beyond that because it has too little information on what would need
> to be done.
>
> Arguably, though, the failure case described above is not different
> from regular resume during which the driver of A decides to retain the
> device in runtime suspend.
>
> I'm not sure if the core can do anything about it.
>
> But at the time when the B's resume callback is invoked, runtime PM is
> enabled for A, so the driver of B may as well use runtime_resume() to
> resume the device if it wants to do so. It may also decide to do
> nothing like in the suspend callback.

Good point. Since both devices were in runtime suspend before the sleep
transition started, there's no reason they can't remain in runtime
suspend after the sleep transition is cancelled.

On the other hand, it seems clear that this scenario doesn't get very
much testing. I'm pretty sure the USB subsystem in general is
vulnerable to this problem; it doesn't consider suspended devices to be
in different states according to the reason for the suspend. That is, a
USB device suspended for runtime PM is in the same state as a device
suspended for system PM (aside from minor details like wakeup settings).
Consequently the ->resume and ->runtime_resume callbacks do essentially
the same thing, both assuming the parent device is not suspended. As we
have discussed, this assumption isn't always correct.

I'm open to suggestions for how to handle this. Should we keep track of
whether a device was in runtime suspend when a system suspend happens,
so that the ->resume callback can avoid doing anything? Will that work
if the device was the source of a wakeup request?
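
The tracking idea in the question above can be illustrated with a small
userspace C model. This is only a sketch under stated assumptions, not
kernel code: struct model_dev, model_suspend(), model_resume_naive(),
model_resume_tracked() and model_cancelled_sleep() are all hypothetical
names invented for the illustration, and the model ignores wakeup
requests entirely, so it says nothing about the second question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these names are kernel APIs. */
struct model_dev {
	struct model_dev *parent;	/* NULL for a root device */
	bool suspended;			/* one state for runtime and system suspend */
	bool was_suspended_on_entry;	/* snapshot taken by model_suspend() */
};

/* Stand-in for ->suspend: remember the state seen on entry. */
static void model_suspend(struct model_dev *dev)
{
	dev->was_suspended_on_entry = dev->suspended;
	dev->suspended = true;
}

/* Stand-in for a ->resume that assumes the parent is active. */
static int model_resume_naive(struct model_dev *dev)
{
	if (dev->parent && dev->parent->suspended)
		return -1;	/* child cannot be activated under a suspended parent */
	dev->suspended = false;
	return 0;
}

/* Variant that skips devices already suspended when the sleep began. */
static int model_resume_tracked(struct model_dev *dev)
{
	if (dev->was_suspended_on_entry)
		return 0;	/* leave the device in runtime suspend */
	return model_resume_naive(dev);
}

/*
 * Suspend the first n_done entries of order[] (children listed before
 * parents), then cancel the transition: resume those same entries in
 * reverse order, parents first.  Returns the number of failed resumes.
 */
static int model_cancelled_sleep(struct model_dev **order, size_t n,
				 size_t n_done,
				 int (*resume)(struct model_dev *))
{
	int failures = 0;
	size_t i;

	assert(n_done <= n);
	for (i = 0; i < n_done; i++)
		model_suspend(order[i]);
	for (i = n_done; i-- > 0; )
		if (resume(order[i]) != 0)
			failures++;
	return failures;
}
```

In this model, with B a child of A, both suspended beforehand, and only
B's suspend having run before the cancellation, the naive resume reports
one failure, while the tracked variant reports none and leaves both
devices suspended.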

Alan Stern