linux-kernel - Re: [PATCH] USB:Fix ehci infinite suspend-resume loop issue in zhaoxin


Message-ID: <bd43807d-a2d7-5742-4253-c443cdf5c2f0@zhaoxin.com>
Date:   Thu, 7 Apr 2022 14:15:29 +0800
From:   "WeitaoWang-oc@...oxin.com" <WeitaoWang-oc@...oxin.com>
To:     Alan Stern <stern@...land.harvard.edu>
CC:     <gregkh@...uxfoundation.org>, <linux-usb@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <CobeChen@...oxin.com>,
        <TimGuo@...oxin.com>, <tonywwang@...oxin.com>,
        <weitaowang@...oxin.com>
Subject: Re: [PATCH] USB:Fix ehci infinite suspend-resume loop issue in
 zhaoxin

On 2022/4/7 00:20, Alan Stern wrote:
> On Wed, Apr 06, 2022 at 10:38:28AM +0800, WeitaoWang-oc@...oxin.com wrote:
>> On 2022/4/6 00:02, Alan Stern wrote:
>>> In fact, the resume kernel doesn't call ehci_resume at all.  Here's what
>>> it does:
>>>
>>> 	The resume kernel boots;
>>>
>>> 	If your patch causes STS_PCD to be set at this point, the flag
>>> 	should get cleared shortly afterward by ehci_irq;
>>>
>>> 	ehci-hcd goes into runtime suspend;
>>>
>>> 	The kernel reads the system image that was stored earlier when
>>> 	hibernation began;
>>>
>>> 	After the image is loaded, the system goes into the freeze
>>> 	state (this does not call any routines in ehci-hcd);
>> In this phase, pci_pm_freeze will be called for PCI devices. In this
>> function, pm_runtime_resume will be called to resume devices that are
>> already runtime-suspended, which will cause ehci_resume to be called.
>> Thus the STS_PCD flag will be set in the ehci_resume function.
> 
> Aha!  I was missing that piece of information, thanks.
> 
> But this still doesn't explain why check_root_hub_suspended is failing.
> That routine checks the HCD_RH_RUNNING bit, which gets set in
> hcd_bus_resume.  hcd_bus_resume gets called as part of resuming the root
> hub, and in ehci-hcd this happens when ehci_irq sees that STS_PCD is set
> and calls usb_hcd_resume_root_hub.  That routine queues a wakeup request
> on the pm_wq work queue, which is then supposed to run hcd_resume_work
> to actually restart the root hub.
> 
> But pm_wq is a freezable work queue!  While the system is in the freeze
> state, the work queue isn't running.  This means that the root hub
> should remain suspended until the end of the freeze phase, and so the
> call to check_root_hub_suspended should succeed.
> 
> Can you check to see what's really happening on your system?  Something
> must be wrong with my analysis, but I can't tell what it is.  I'm still
> puzzled.
> 
> Alan Stern
Your analysis is right. My test platform's kernel is not the latest
version, and it does not call freeze_kernel_threads in the
software_resume function.
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/power/hibernate.c?h=v5.18-rc1&id=2351f8d295ed63393190e39c2f7c1fee1a80578f)
So pm_wq is still active and can handle root hub power events during
the freeze phase. After updating my kernel to include the fix in the
URL above, the system hibernation test was successful with our patch
(without clearing the STS_PCD bit).
Thanks for your clarification.

Weitao Wang
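
For readers following the thread, the interaction discussed above can
be illustrated with a small user-space sketch. It is only a toy model
under stated assumptions, not kernel code: every toy_* name is
hypothetical, and the real ehci_irq, usb_hcd_resume_root_hub,
hcd_resume_work and check_root_hub_suspended are reduced to a few flag
updates. The one behavior it tries to capture is that work queued on a
frozen work queue does not run, so the root hub stays suspended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a freezable work queue such as pm_wq (hypothetical). */
struct toy_wq {
    bool frozen;              /* true while the queue is frozen */
    void (*pending)(void *);  /* at most one queued work item */
    void *arg;
};

/* Toy stand-in for the host controller state (hypothetical). */
struct toy_hcd {
    bool sts_pcd;     /* models the STS_PCD status bit */
    bool rh_running;  /* models the HCD_RH_RUNNING flag */
};

/* Run the pending item, unless the queue is frozen or empty. */
static void toy_wq_run(struct toy_wq *wq)
{
    assert(wq != NULL);
    if (wq->frozen || wq->pending == NULL)
        return;
    void (*fn)(void *) = wq->pending;
    wq->pending = NULL;
    fn(wq->arg);
}

/* Queue an item; it executes right away only if the queue is not frozen. */
static void toy_queue_work(struct toy_wq *wq, void (*fn)(void *), void *arg)
{
    wq->pending = fn;
    wq->arg = arg;
    toy_wq_run(wq);
}

/* Models hcd_resume_work leading to hcd_bus_resume: root hub restarts. */
static void toy_resume_work(void *arg)
{
    struct toy_hcd *hcd = arg;
    hcd->rh_running = true;
}

/* Models ehci_irq seeing STS_PCD and requesting a root hub resume. */
static void toy_irq(struct toy_hcd *hcd, struct toy_wq *wq)
{
    if (!hcd->sts_pcd)
        return;
    hcd->sts_pcd = false;  /* simplified: the status bit is acknowledged */
    toy_queue_work(wq, toy_resume_work, hcd);
}

/*
 * Models the freeze phase. Returns true if the root hub is still
 * suspended afterwards, i.e. the modeled check_root_hub_suspended
 * would succeed. threads_frozen says whether the work queue was
 * frozen before this phase (the behavior after the referenced fix).
 */
bool toy_root_hub_stays_suspended(bool threads_frozen)
{
    struct toy_hcd hcd = { .sts_pcd = false, .rh_running = false };
    struct toy_wq wq = { .frozen = threads_frozen, .pending = NULL, .arg = NULL };

    /* pci_pm_freeze -> pm_runtime_resume -> ehci_resume sets STS_PCD. */
    hcd.sts_pcd = true;
    toy_irq(&hcd, &wq);

    return !hcd.rh_running;
}
```

In this model, toy_root_hub_stays_suspended(true) corresponds to a
kernel that freezes its threads first: the resume work stays queued and
the check passes. toy_root_hub_stays_suspended(false) corresponds to
the older kernel described above, where the work queue is still running
and the root hub is restarted before the check.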