Message-ID: <85105e45-3553-4a8c-b132-3875c4657c4b@rowland.harvard.edu>
Date: Mon, 30 Sep 2024 15:38:32 -0400
From: Alan Stern <stern@...land.harvard.edu>
To: dengjie <dengjie03@...inos.cn>
Cc: rafael@...nel.org, pavel@....cz, len.brown@...el.com,
	gregkh@...uxfoundation.org, linux-usb@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
	xiehongyu1@...inos.cn, duanchenghao@...inos.cn, xiongxin@...inos.cn
Subject: Re: [PATCH v2] USB: Fix the issue of S4 wakeup queisce phase where
 task resumption fails due to USB status

I'm very sorry it has taken so long for me to respond to this...

On Wed, Sep 25, 2024 at 10:50:41AM +0800, dengjie wrote:
> Reproduction of the problem: During the S4 stress test, when a USB device is inserted or
> removed, there is a probability that the S4 wakeup will turn into a reboot. The following
> two points describe how to analyze and locate the problem points:
> 
> 1. During the boot stage when S4 is awakened, after the USB RootHub is initialized,
> it will enter the runtime suspend state. From then on, whenever an xhci port change
> event occurs, it will trigger a remote wakeup request event and add wakeup_work
> to pm_wq, where the subsequent RootHub runtime resume process will be handled by pm_wq.
> 
> xhci runtime suspend flow:
> S4 boot
>    |->xhci init
>        |->register_root_hub
> 	   |->hub_probe
> 	       |->callback = RPM_GET_CALLBACK(dev, runtime_suspend)   /* xhci RootHub runtime suspend */
> 
> xhci runtime resume flow :
> xhci_irq()
>     |->xhci_handle_event()
> 	|->handle_port_status()
>    	    |->if(hcd->state == HC_STATE_SUSPENDED)
> 		 |->usb_hcd_resume_root_hub()
> 		    |->set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags)   /* wakeup pending signal to be set */
>   		    |->queue_work(pm_wq, &hcd->wakeup_work)
> 			|->hcd_resume_work()			       /* hcd->wakeup_work */
> 			    |->usb_remote_wakeup()
> 				|->callback = RPM_GET_CALLBACK(dev, runtime_resume)
> 				    |->usb_runtime_resume()            /* usb runtime resume  */
> 					|->generic_resume()
> 					    |->hcd_bus_resume()
> 						|->clear_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
> 						  /* wakeup pending signal to be clear */
> 
> 2. However, during the quiesce phase of S4 wakeup, freeze_kernel_threads() will freeze this pm_wq,
> and between freeze_kernel_threads() and dpm_suspend_start(), there exists a very time-consuming
> S4 image loading process. This leads to a situation where, if an xhci port change event occurs
> after freeze_kernel_threads(), triggering the wakeup pending signal to be set, but it cannot
> be processed by pm_wq to clear this wakeup_pending bit, it will result in a subsequent
> dpm_suspend_start() where USB suspend_common() detects the wakeup pending signal being
> set and returns an -EBUSY error, interrupting the S4 quiesce process and reverting to a reboot.
> 
> S4 wakeup
>     |->resume_store
> 	|->software_resume()
> 	    |->freeze_kernel_threads()		/* will freeze pm_wq */
> 	    |->load_image_and_restore()
> 		  |->swsusp_read()    	        /* S4 image loading: time-consuming .
> When an xhci port change event occurs at this point, it triggers the wakeup pending signal to be set.
> However, since the pm_wq is in a frozen state, the wakeup_pending bit cannot be cleared.*/
>    		  |->hibernation_restore
> 			|->dpm_suspend_start(PMSG_QUIESCE)
> 			    |->hcd_pci_suspend()
> 				|->suspend_common()
> 				    |->if (do_wakeup && HCD_WAKEUP_PENDING(hcd))  return -EBUSY;

At this point, do_wakeup is supposed to be 0 and so the "return -EBUSY" 
error should not occur.

You can see that this is true by reading choose_wakeup() in 
drivers/usb/core/driver.c.  At the start of the function it says:

	/*
	 * For FREEZE/QUIESCE, disable remote wakeups so no interrupts get
	 * generated.
	 */
	if (msg.event == PM_EVENT_FREEZE || msg.event == PM_EVENT_QUIESCE) {
		w = 0;

and at the end it does:

	udev->do_remote_wakeup = w;

Therefore the problem you are describing should not happen and your 
patch should not be needed.
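
To make the reasoning concrete, here is a minimal user-space sketch of the decision described above. It is not the kernel source: the enum, its values, and the function name choose_wakeup_model() are stand-ins invented for illustration, and the pass-through for other events is an assumption of the model.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's PM event codes; names and values are hypothetical. */
enum pm_event_model {
	EV_SUSPEND,
	EV_FREEZE,
	EV_QUIESCE,
	EV_HIBERNATE,
};

/*
 * Simplified model of the decision quoted from choose_wakeup():
 * FREEZE and QUIESCE always disable remote wakeup.  For any other
 * event this model simply passes through whatever the device would
 * allow (an assumption made for the sketch).
 */
static bool choose_wakeup_model(enum pm_event_model event, bool may_wakeup)
{
	if (event == EV_FREEZE || event == EV_QUIESCE)
		return false;
	return may_wakeup;
}
```

Under this model, a QUIESCE transition yields a wakeup setting of false no matter what the device would otherwise permit, which is the behavior the quoted comment describes.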

Now maybe things aren't working the way they are supposed to.  If that's 
so then you should submit a patch fixing the code so that it _does_ work 
this way.

For instance, in suspend_common(), do_wakeup is derived from 
device_may_wakeup(rhdev), which is determined by 
rhdev->power.should_wakeup -- see the definition in 
include/linux/pm_wakeup.h.  Maybe this flag isn't getting cleared 
properly.  (In fact, at the moment I don't see where that flag gets set 
or cleared at all...)

> Below is a description of the countermeasures taken to address this issue:
> 1. Considering the restore process that occurs after the quiesce phase during S4 wakeup,
> which essentially resets all root hubs, checking this wakeup pending status in USB suspend_common()
> during the quiesce phase is of little significance and should therefore be filtered out.
> 
> S4 wakeup restore phase
>     |->dpm_resume(PMSG_RESTORE)
> 	|->hcd_pci_restore()
> 	    |->xhci_resume()		       /* reset all root hubs */

The wakeup-pending status is checked only if wakeup is enabled.  And 
during the quiesce phase, wakeup is not supposed to be enabled.  So 
nothing needs to be filtered out.
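
The gating can be sketched in user space as follows.  This is only a model of the one check shown in the quoted flow (do_wakeup && HCD_WAKEUP_PENDING(hcd)); struct hcd_model and suspend_check_model() are hypothetical names, not kernel APIs.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one bit of HCD state that matters here. */
struct hcd_model {
	bool wakeup_pending;	/* models HCD_FLAG_WAKEUP_PENDING */
};

/*
 * Models the check from the quoted flow: the pending flag can only
 * abort the suspend when wakeup is enabled.
 */
static int suspend_check_model(const struct hcd_model *hcd, bool do_wakeup)
{
	if (do_wakeup && hcd->wakeup_pending)
		return -EBUSY;
	return 0;
}
```

With do_wakeup false, the model returns 0 even when wakeup_pending is set, so a stale pending bit alone cannot produce -EBUSY; the error requires do_wakeup to be true as well.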

Alan Stern
