lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 1 Mar 2022 17:49:23 +0200
From:   Mathias Nyman <mathias.nyman@...ux.intel.com>
To:     Henry Lin <henryl@...dia.com>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>
Cc:     Mathias Nyman <mathias.nyman@...el.com>,
        "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] xhci: fix runtime PM imbalance in USB2 resume

On 1.3.2022 12.28, Henry Lin wrote:
>>> USB2 resume starts with usb_hcd_start_port_resume() in port status
>>> change handling for RESUME link state. usb_hcd_end_port_resume() call is
>>> needed to keep runtime PM balance.
> 
>> For normal usb2 port resume the usb_hcd_end_port_resume() is called when resume
>> has been signaled for long enough in xhci_handle_usb2_port_link_resume().
>>
>> This is also where driver directs the port to go from Resume state to U0.
>> Port can't do this without driver directing it.
>>
>> If there's a failure during resume signaling (disconnect, reset, error) then
>> stale resume variables are detected in xhci_get_port_status() and
>> usb_hcd_end_port_resume() is called.
> 
>> I do now see a231ec41e6f6 ("xhci: refactor U0 link state handling in get_port_status")
>> does change order of checking and clearing stale resume variables, but this should
>> only happen if the first port state we read is a fully enabled functional U0 state after
>> a failed resume.
> 
>> Could you expand a bit how this was detected?
> We observed the racing issue when usb2 device-initiated resume occurs in system resume.
> If usb2 host-initiated resume for system resume directs U0 before xhci_get_usb2_port_status()
> see RESUME state, xhci_get_usb2_port_status() will not finish resume process in
> xhci_handle_usb2_port_link_resume(). Its scenario is as follows:
> 
> 1. System resume starts. All driver system resume callbacks get called in order. XHCI controller
>     is resumed by xhci_resume().
> 2. USB2 root hub is resuming. hcd_bus_resume() is being executed.
> 3. Before xhci_bus_resume() is finished, XHCI driver receives a port status change event for 
>     an USB2 port with RESUME link state in xhci_irq(). XHCI driver starts the process to resume
>     HS port for device-initiated resume.
> 4. In xhci_bus_resume(), host-initiated resume (direct U0) is performed on the same port that is
>     resuming in step 3 in below loop:
> 
>                 if (bus_state->bus_suspended) {
>                         spin_unlock_irqrestore(&xhci->lock, flags);
>                         msleep(USB_RESUME_TIMEOUT);
>                         spin_lock_irqsave(&xhci->lock, flags);
>                 }
>                 for_each_set_bit(port_index, &bus_state->bus_suspended,
>                                  BITS_PER_LONG) {
>                         /* Clear PLC to poll it later for U0 transition */
>                         xhci_test_and_clear_bit(xhci, ports[port_index],
>                                                 PORT_PLC);
>                         xhci_set_link_state(xhci, ports[port_index], XDEV_U0);
>                 }
> 5. Then, link state of the resuming port is observed as U0 in following
>     xhci_get_usb2_port_status(). xhci_handle_usb2_port_link_resume() has
>     no chance to get called on the resuming port.
> 

True, thanks for the explanation.
If there's a race between system resume and device-initiated resume, and port is resumed
in xhci_bus_resume() then yes I see how this could happen.

Maybe only call usb_hcd_end_port_resume() if xhci_irq() detected the device-initiated
resume. i.e. set a value to resume_done[portnum] and called usb_hcd_start_port_resume()
something like: 

@@ -1088,6 +1088,8 @@ static void xhci_get_usb2_port_status(struct xhci_port *port, u32 *status,
                if (link_state == XDEV_U2)
                        *status |= USB_PORT_STAT_L1;
                if (link_state == XDEV_U0) {
+                       if (bus_state->resume_done[portnum])
+                               usb_hcd_end_port_resume(&port->rhub->hcd->self, portnum);
                        bus_state->resume_done[portnum] = 0;
                        clear_bit(portnum, &bus_state->resuming_ports);

Also xhci_bus_resume() only resumes ports that were forcefully suspended to U3 in xhci_bus_suspend().
Could be worth checking why that device wasn't already in U3 when suspend reached xhci_bus_suspend().

Thanks
Mathias

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ