lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250829190731.gx2xrgdz3tor5a2v@synopsys.com>
Date: Fri, 29 Aug 2025 19:07:31 +0000
From: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
To: Alan Stern <stern@...land.harvard.edu>,
        "Rafael J. Wysocki"
	<rafael@...nel.org>
CC: Thinh Nguyen <Thinh.Nguyen@...opsys.com>, ryan zhou <ryanzhou54@...il.com>,
        Roy Luo <royluo@...gle.com>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH] drvier: usb: dwc3: Fix runtime PM trying to activate
 child device xxx.dwc3 but parent is not active

On Thu, Aug 28, 2025, Alan Stern wrote:
> On Fri, Aug 29, 2025 at 12:43:17AM +0000, Thinh Nguyen wrote:
> > On Wed, Aug 27, 2025, Rafael J. Wysocki wrote:
> > > On Wed, Aug 27, 2025 at 4:52 PM Alan Stern <stern@...land.harvard.edu> wrote:
> > > >
> > > > Ryan:
> > > >
> > > > You should present your questions to the maintainer of the kernel's
> > > > Power Management subsystem, Rafael Wysocki (added to the To: list for
> > > > this email).
> > > 
> > > Thanks Alan!
> > > 
> > > 
> > > > On Wed, Aug 27, 2025 at 10:09:10PM +0800, ryan zhou wrote:
> > > > > Hi Roy,
> > > > > Thank you for reviewing my patch.
> > > > > >
> > > > > > Wouldn't the parent glue dev already resume before resuming the child dwc3?
> > > > > >
> > > > > No, in the following case, the parent device will not be reviewed
> > > > > before resuming the child device.
> > > > > Taking the 'imx8mp-dwc3' driver as an example.
> > > > > Step 1.usb disconnect trigger: the child device dwc3 enter runtime
> > > > > suspend state firstly, followed by
> > > > > the parent device imx8mp-dwc3 enters runtime suspend
> > > > > flow:dwc3_runtime_suspend->dwc3_imx8mp_runtime_suspend
> > > > > Step2.system deep trigger:consistent with the runtime suspend flow,
> > > > > child enters pm suspend and followed
> > > > > by parent
> > > > > flow: dwc3_pm_suspend->dwc3_imx8mp_pm_suspend
> > > > > Step3: After dwc3_pm_suspend, and before dwc3_imx8mp_pm_suspend, a
> > > > > task terminated the system suspend process
> > > > > . The system will resume from the checkpoint, and resume devices in
> > > > > the suspended state in the reverse
> > > > > of pm suspend, but excluding the parent device imx8mp-dwc3 since it
> > > > > did not execute the suspend process.
> > > > >
> > > > > >
> > > > > >Why would 'runtime PM trying to activate child device xxx.dwc3 but parent is not active' happen in the first place?
> > > > > >
> > > > > Following the above analysis, dwc3_resume calls
> > > 
> > > I assume that dwc3_pm_resume() is meant here.
> > > 
> > > > > pm_runtime_set_active(dev), it checks the
> > > > > parent.power->runtime_status is not RPM_ACTIVE and outputs the error log.
> > > 
> > > And it does so because enabling runtime PM for the child with
> > > runtime_status == RPM_ACTIVE does not make sense when the parent has
> > > runtime PM enabled and its status is not RPM_ACTIVE.
> > > 
> > > It looks like the runtime PM status of the parent is not as expected,
> > 
> > So is the scenario Ryan brought up unexpected? What are we missing here
> > and where should the fix be in?
> > 
> > > but quite frankly I don't quite follow the logic in dwc3_pm_resume().
> > > 
> > > Why does it disable runtime PM just for the duration of
> > > dwc3_resume_common()?  If runtime PM is functional before the
> > > pm_runtime_disable() call in dwc3_pm_resume(), the device may as well
> > > be resumed by calling pm_runtime_resume() on it without disabling
> > > runtime PM.  In turn, if runtime PM is not functional at that point,
> > > it should not be enabled.
> > 
> > Base on git-blame, I hope this will answer your question:
> > 
> >     68c26fe58182 ("usb: dwc3: set pm runtime active before resume common")
> > 
> >     For device mode, if PM runtime autosuspend feature enabled, the
> >     runtime power status of dwc3 may be suspended when run dwc3_resume(),
> >     and dwc3 gadget would not be configured in dwc3_gadget_run_stop().
> >     It would cause gadget connected failed if USB cable has been plugged
> >     before PM resume. So move forward pm_runtime_set_active() to fix it.
> > 
> > 
> > In certain platforms, they probably need the phy to be active to perform
> > dwc3_resume_common().
> 
> It sounds like the real question is how we should deal with an 
> interrupted system suspend.  Suppose parent device A and child device B 
> are both in runtime suspend when a system sleep transition begins.  The 
> PM core invokes the ->suspend callback of B (and let's say the callback 
> doesn't need to do anything because B is already suspended with the 
> appropriate wakeup setting).
> 
> But then before the PM core invokes the ->suspend callback of A, the 
> system sleep transition is cancelled.  So the PM core goes through the 
> device tree from parents to children, invoking the ->resume callback for 
> all the devices whose ->suspend callback was called earlier.  Thus, A's 
> ->resume is skipped because A's ->suspend wasn't called, but B's 
> ->resume callback _is_ invoked.  This callback fails, because it can't 
> resume B while A is still in runtime suspend.
> 
> The same problem arises if A isn't a parent of B but there is a PM 
> dependency from B to A.
> 
> It's been so long since I worked on the system suspend code that I don't 
> remember how we decided to handle this scenario.
> 

Alan, Rafael,

What are your thoughts on how we should handle this. Should the fix be
at the PM core? Sounds like the PM core needs to check more than whether
the ->suspend callback was called earlier to determine whether to skip
->resume.

Thanks,
Thinh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ