[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0gwBvC-y0fgWLMCkKdd=wpXs2msf5HCFaXkc1HbRfhNsg@mail.gmail.com>
Date: Fri, 29 Aug 2025 21:23:12 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Alan Stern <stern@...land.harvard.edu>
Cc: Thinh Nguyen <Thinh.Nguyen@...opsys.com>, "Rafael J. Wysocki" <rafael@...nel.org>,
ryan zhou <ryanzhou54@...il.com>, Roy Luo <royluo@...gle.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH] drvier: usb: dwc3: Fix runtime PM trying to activate
child device xxx.dwc3 but parent is not active
On Fri, Aug 29, 2025 at 3:25 AM Alan Stern <stern@...land.harvard.edu> wrote:
>
> On Fri, Aug 29, 2025 at 12:43:17AM +0000, Thinh Nguyen wrote:
> > On Wed, Aug 27, 2025, Rafael J. Wysocki wrote:
> > > On Wed, Aug 27, 2025 at 4:52 PM Alan Stern <stern@...land.harvard.edu> wrote:
> > > >
> > > > Ryan:
> > > >
> > > > You should present your questions to the maintainer of the kernel's
> > > > Power Management subsystem, Rafael Wysocki (added to the To: list for
> > > > this email).
> > >
> > > Thanks Alan!
> > >
> > >
> > > > On Wed, Aug 27, 2025 at 10:09:10PM +0800, ryan zhou wrote:
> > > > > Hi Roy,
> > > > > Thank you for reviewing my patch.
> > > > > >
> > > > > > Wouldn't the parent glue dev already resume before resuming the child dwc3?
> > > > > >
> > > > > No, in the following case, the parent device will not be reviewed
> > > > > before resuming the child device.
> > > > > Taking the 'imx8mp-dwc3' driver as an example.
> > > > > Step 1.usb disconnect trigger: the child device dwc3 enter runtime
> > > > > suspend state firstly, followed by
> > > > > the parent device imx8mp-dwc3 enters runtime suspend
> > > > > flow:dwc3_runtime_suspend->dwc3_imx8mp_runtime_suspend
> > > > > Step2.system deep trigger:consistent with the runtime suspend flow,
> > > > > child enters pm suspend and followed
> > > > > by parent
> > > > > flow: dwc3_pm_suspend->dwc3_imx8mp_pm_suspend
> > > > > Step3: After dwc3_pm_suspend, and before dwc3_imx8mp_pm_suspend, a
> > > > > task terminated the system suspend process
> > > > > . The system will resume from the checkpoint, and resume devices in
> > > > > the suspended state in the reverse
> > > > > of pm suspend, but excluding the parent device imx8mp-dwc3 since it
> > > > > did not execute the suspend process.
> > > > >
> > > > > >
> > > > > >Why would 'runtime PM trying to activate child device xxx.dwc3 but parent is not active' happen in the first place?
> > > > > >
> > > > > Following the above analysis, dwc3_resume calls
> > >
> > > I assume that dwc3_pm_resume() is meant here.
> > >
> > > > > pm_runtime_set_active(dev), it checks the
> > > > > parent.power->runtime_status is not RPM_ACTIVE and outputs the error log.
> > >
> > > And it does so because enabling runtime PM for the child with
> > > runtime_status == RPM_ACTIVE does not make sense when the parent has
> > > runtime PM enabled and its status is not RPM_ACTIVE.
> > >
> > > It looks like the runtime PM status of the parent is not as expected,
> >
> > So is the scenario Ryan brought up unexpected? What are we missing here
> > and where should the fix be in?
> >
> > > but quite frankly I don't quite follow the logic in dwc3_pm_resume().
> > >
> > > Why does it disable runtime PM just for the duration of
> > > dwc3_resume_common()? If runtime PM is functional before the
> > > pm_runtime_disable() call in dwc3_pm_resume(), the device may as well
> > > be resumed by calling pm_runtime_resume() on it without disabling
> > > runtime PM. In turn, if runtime PM is not functional at that point,
> > > it should not be enabled.
> >
> > Base on git-blame, I hope this will answer your question:
> >
> > 68c26fe58182 ("usb: dwc3: set pm runtime active before resume common")
> >
> > For device mode, if PM runtime autosuspend feature enabled, the
> > runtime power status of dwc3 may be suspended when run dwc3_resume(),
> > and dwc3 gadget would not be configured in dwc3_gadget_run_stop().
> > It would cause gadget connected failed if USB cable has been plugged
> > before PM resume. So move forward pm_runtime_set_active() to fix it.
> >
> >
> > In certain platforms, they probably need the phy to be active to perform
> > dwc3_resume_common().
>
> It sounds like the real question is how we should deal with an
> interrupted system suspend. Suppose parent device A and child device B
> are both in runtime suspend when a system sleep transition begins. The
> PM core invokes the ->suspend callback of B (and let's say the callback
> doesn't need to do anything because B is already suspended with the
> appropriate wakeup setting).
>
> But then before the PM core invokes the ->suspend callback of A, the
> system sleep transition is cancelled. So the PM core goes through the
> device tree from parents to children, invoking the ->resume callback for
> all the devices whose ->suspend callback was called earlier. Thus, A's
> ->resume is skipped because A's ->suspend wasn't called, but B's
> ->resume callback _is_ invoked. This callback fails, because it can't
> resume B while A is still in runtime suspend.
>
> The same problem arises if A isn't a parent of B but there is a PM
> dependency from B to A.
>
> It's been so long since I worked on the system suspend code that I don't
> remember how we decided to handle this scenario.
We actually have not made any specific decision in that respect. That
is, in the error path, the core will invoke the resume callbacks for
devices whose suspend callbacks were invoked and it won't do anything
beyond that because it has too little information on what would need
to be done.
Arguably, though, the failure case described above is not different
from regular resume during which the driver of A decides to retain the
device in runtime suspend.
I'm not sure if the core can do anything about it.
But at the time when the B's resume callback is invoked, runtime PM is
enabled for A, so the driver of B may as well use runtime_resume() to
resume the device if it wants to do so. It may also decide to do
nothing like in the suspend callback.
Powered by blists - more mailing lists