lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 31 Mar 2016 17:36:36 +0300
From:	Mathias Nyman <mathias.nyman@...ux.intel.com>
To:	Rajesh Bhagat <rajesh.bhagat@....com>
CC:	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Sriram Dash <sriram.dash@....com>
Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to
 XHCI commmand timeout

On 31.03.2016 06:51, Rajesh Bhagat wrote:
>
>
>
> Hello Mathias,
>
> Thanks a lot for your support.
>
> Attached patch is working and allows the completion handler to be called. And resume operation completes
> and shell comes back (but with lot of errors).
>
> Now, some other points which needs our attention here are:
> 1. usb core hub code is not checking the error status of hcd->driver->reset_device(xhci_discover_or_reset_device)
>      and continues considering reset_device was successful (and causes other issues). Hence, a check is needed on return
>      value of reset_device and proper action is required on it.
> 2. Even if above return value is checked and reset_device status is used. USB core hub retries for N times which is not
>      required in this case as adding to the resume operation time. But if in this case we return -ENOTCONN instead of -EINVAL
>      from hcd->driver->reset_device(xhci_discover_or_reset_device), code returns early and doesn't retry.
>
> Proposed Solution for above with your suggested patches is as below:
>
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 7cad1fa..efeba31 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -2807,13 +2807,17 @@ done:
>                          struct usb_hcd *hcd = bus_to_hcd(udev->bus);
>
>                          update_devnum(udev, 0);
> -                       /* The xHC may think the device is already reset,
> -                        * so ignore the status.
> +                       /*
> +                        * check the status of resetting the device and update
> +                        * the udev status.
>                           */
>                          if (hcd->driver->reset_device)
> -                               hcd->driver->reset_device(hcd, udev);
> +                               status = hcd->driver->reset_device(hcd, udev);
>
> -                       usb_set_device_state(udev, USB_STATE_DEFAULT);
> +                       if (status == 0)
> +                               usb_set_device_state(udev, USB_STATE_DEFAULT);
> +                       else
> +                               usb_set_device_state(udev, USB_STATE_NOTATTACHED);
>                  }
>          } else {
>                  if (udev)
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index b3a0a22..9427175f 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -310,6 +310,10 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>                  return -ESHUTDOWN;
>          }
>
> +       /* writing the CMD_RING_ABORT bit should create a command completion
> +        * event, add a command completion timeout for it as well
> +        */
> +       mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT);
>          return 0;
>   }
>
> @@ -1243,6 +1247,7 @@ void xhci_handle_command_timeout(unsigned long data)
>          int ret;
>          unsigned long flags;
>          u64 hw_ring_state;
> +       u32 old_status;
>          struct xhci_command *cur_cmd = NULL;
>          xhci = (struct xhci_hcd *) data;
>
> @@ -1250,6 +1255,7 @@ void xhci_handle_command_timeout(unsigned long data)
>          spin_lock_irqsave(&xhci->lock, flags);
>          if (xhci->current_cmd) {
>                  cur_cmd = xhci->current_cmd;
> +               old_status = cur_cmd->status;
>                  cur_cmd->status = COMP_CMD_ABORT;
>          }
>
> @@ -1272,7 +1278,15 @@ void xhci_handle_command_timeout(unsigned long data)
>          }
>          /* command timeout on stopped ring, ring can't be aborted */
>          xhci_dbg(xhci, "Command timeout on stopped ring\n");
> -       xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
> +
> +       /* is this the second timeout for the same command? ring wont restart */
> +       if (old_status == COMP_CMD_ABORT) {
> +               xhci_err(xhci, "Abort timeout, Fail to restart cmd ring\n");
> +               xhci_cleanup_command_queue(xhci);
> +       /* everything else here to halt, cleanup, free etc kill xhc */
> +       } else
> +               xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
> +
>          spin_unlock_irqrestore(&xhci->lock, flags);
>          return;
>   }
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index c502c22..bd18966 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -3450,7 +3450,7 @@ int xhci_discover_or_reset_device(struct usb_hcd *hcd, struct usb_device *udev)
>                  if (ret == 1)
>                          return 0;
>                  else
> -                       return -EINVAL;
> +                       return -ENOTCONN;     /* Don't retry */
>          }
>
>          if (virt_dev->tt_info)
>
> Sample output after above patch (timer is set as wakeup source):
>
> root@...enix:~# echo mem > /sys/power/state
> PM: Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.001 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> PM: suspend of devices complete after 155.658 msecs
> PM: late suspend of devices complete after 1.594 msecs
> PM: noirq suspend of devices complete after 1.496 msecs
> PM: noirq resume of devices complete after 1.290 msecs
> PM: early resume of devices complete after 1.250 msecs
> usb usb1: root hub lost power or was reset
> usb usb2: root hub lost power or was reset
> xhci-hcd xhci-hcd.0.auto: Abort timeout, Fail to restart cmd ring
> xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
> xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
> xhci-hcd xhci-hcd.0.auto: Abort timeout, Fail to restart cmd ring
> xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
> xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
> PM: resume of devices complete after 25436.366 msecs     <= resume time is increased a lot even after using -ENOTCONN
> Restarting tasks ...
> usb 1-1: USB disconnect, device number 2
> usb 1-1.2: USB disconnect, device number 3
> usb 2-1: USB disconnect, device number 2
> done.
> root@...enix:~#
>
> Please share your opinion on other changes for patch submission as well as resume time.
>

I think more effort should be put into investigating why this happens in the first place.
What is the root cause? why doesn't xhci start properly after resume in this case.

Optimizing resume time and error paths should be secondary here, resuming faster isn't really
a matter when xhci is completely stuck.

-Mathias

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ