linux-kernel - Re: [PATCH] USB:Fix ehci infinite suspend-resume loop issue in zhaoxin


Message-ID: <Yk7zrRWDwJsMvX6s@rowland.harvard.edu>
Date:   Thu, 7 Apr 2022 10:22:37 -0400
From:   Alan Stern <stern@...land.harvard.edu>
To:     "WeitaoWang-oc@...oxin.com" <WeitaoWang-oc@...oxin.com>
Cc:     gregkh@...uxfoundation.org, linux-usb@...r.kernel.org,
        linux-kernel@...r.kernel.org, CobeChen@...oxin.com,
        TimGuo@...oxin.com, tonywwang@...oxin.com, weitaowang@...oxin.com
Subject: Re: [PATCH] USB:Fix ehci infinite suspend-resume loop issue in
 zhaoxin

On Thu, Apr 07, 2022 at 02:15:29PM +0800, WeitaoWang-oc@...oxin.com wrote:
> On 2022/4/7 00:20, Alan Stern wrote:
> > On Wed, Apr 06, 2022 at 10:38:28AM +0800, WeitaoWang-oc@...oxin.com wrote:
> > > On 2022/4/6 00:02, Alan Stern wrote:
> > > > In fact, the resume kernel doesn't call ehci_resume at all.  Here's what
> > > > it does:
> > > > 
> > > > 	The resume kernel boots;
> > > > 
> > > > 	If your patch causes STS_PCD to be set at this point, the flag
> > > > 	should get cleared shortly afterward by ehci_irq;
> > > > 
> > > > 	ehci-hcd goes into runtime suspend;
> > > > 
> > > > 	The kernel reads the system image that was stored earlier when
> > > > 	hibernation began;
> > > > 
> > > > 	After the image is loaded, the system goes into the freeze
> > > > 	state (this does not call any routines in ehci-hcd);
> > > In this phase, pci_pm_freeze will be called for each PCI device. That
> > > function calls pm_runtime_resume to resume devices that are already
> > > runtime-suspended, which causes ehci_resume to be called.
> > > Thus the STS_PCD flag will be set in the ehci_resume function.
> > 
> > Aha!  I was missing that piece of information, thanks.
> > 
> > But this still doesn't explain why check_root_hub_suspended is failing.
> > That routine checks the HCD_RH_RUNNING bit, which gets set in
> > hcd_bus_resume.  hcd_bus_resume gets called as part of resuming the root
> > hub, and in ehci-hcd this happens when ehci_irq sees that STS_PCD is set
> > and calls usb_hcd_resume_root_hub.  That routine queues a wakeup request
> > on the pm_wq work queue, which is then supposed to run hcd_resume_work
> > to actually restart the root hub.
> > 
> > But pm_wq is a freezable work queue!  While the system is in the freeze
> > state, the work queue isn't running.  This means that the root hub
> > should remain suspended until the end of the freeze phase, and so the
> > call to check_root_hub_suspended should succeed.
> > 
> > Can you check to see what's really happening on your system?  Something
> > must be wrong with my analysis, but I can't tell what it is.  I'm still
> > puzzled.
> > 
> > Alan Stern
> Your analysis is right. My test platform's kernel is not the latest
> version, and it does not call freeze_kernel_threads in the
> software_resume function.
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/power/hibernate.c?h=v5.18-rc1&id=2351f8d295ed63393190e39c2f7c1fee1a80578f)
> So pm_wq is active and can handle root hub power events.
> After updating my kernel to include the fix at the URL above, the system
> hibernation test was successful with our patch (which does not clear the
> STS_PCD bit).
> Thanks for your clarification.

Great!  I'm glad we sorted that out.

So check_root_hub_suspended doesn't need any changes, and the patch you 
already submitted takes care of everything.

Alan Stern