[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <HE1PR0401MB2028787DE01C0C8E79520414E39A0@HE1PR0401MB2028.eurprd04.prod.outlook.com>
Date: Fri, 1 Apr 2016 03:55:11 +0000
From: Rajesh Bhagat <rajesh.bhagat@....com>
To: Mathias Nyman <mathias.nyman@...ux.intel.com>
CC: "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Sriram Dash <sriram.dash@....com>
Subject: RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
commmand timeout
> -----Original Message-----
> From: Mathias Nyman [mailto:mathias.nyman@...ux.intel.com]
> Sent: Thursday, March 31, 2016 8:07 PM
> To: Rajesh Bhagat <rajesh.bhagat@....com>
> Cc: gregkh@...uxfoundation.org; linux-usb@...r.kernel.org; linux-
> kernel@...r.kernel.org; Sriram Dash <sriram.dash@....com>
> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
> commmand timeout
>
> On 31.03.2016 06:51, Rajesh Bhagat wrote:
> >
> >
> >
> > Hello Mathias,
> >
> > Thanks a lot for your support.
> >
> > Attached patch is working and allows the completion handler to be
> > called. And resume operation completes and shell comes back (but with lot of
> errors).
> >
> > Now, some other points which needs our attention here are:
> > 1. usb core hub code is not checking the error status of hcd->driver-
> >reset_device(xhci_discover_or_reset_device)
> > and continues considering reset_device was successful (and causes other issues).
> Hence, a check is needed on return
> > value of reset_device and proper action is required on it.
> > 2. Even if above return value is checked and reset_device status is used. USB core
> hub retries for N times which is not
> > required in this case as adding to the resume operation time. But if in this case
> we return -ENOTCONN instead of -EINVAL
> > from hcd->driver->reset_device(xhci_discover_or_reset_device), code returns
> early and doesn't retry.
> >
> > Proposed Solution for above with your suggested patches is as below:
> >
> > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index
> > 7cad1fa..efeba31 100644
> > --- a/drivers/usb/core/hub.c
> > +++ b/drivers/usb/core/hub.c
> > @@ -2807,13 +2807,17 @@ done:
> > struct usb_hcd *hcd = bus_to_hcd(udev->bus);
> >
> > update_devnum(udev, 0);
> > - /* The xHC may think the device is already reset,
> > - * so ignore the status.
> > + /*
> > + * check the status of resetting the device and update
> > + * the udev status.
> > */
> > if (hcd->driver->reset_device)
> > - hcd->driver->reset_device(hcd, udev);
> > + status =
> > + hcd->driver->reset_device(hcd, udev);
> >
> > - usb_set_device_state(udev, USB_STATE_DEFAULT);
> > + if (status == 0)
> > + usb_set_device_state(udev, USB_STATE_DEFAULT);
> > + else
> > + usb_set_device_state(udev,
> > + USB_STATE_NOTATTACHED);
> > }
> > } else {
> > if (udev)
> > diff --git a/drivers/usb/host/xhci-ring.c
> > b/drivers/usb/host/xhci-ring.c index b3a0a22..9427175f 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -310,6 +310,10 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
> > return -ESHUTDOWN;
> > }
> >
> > + /* writing the CMD_RING_ABORT bit should create a command completion
> > + * event, add a command completion timeout for it as well
> > + */
> > + mod_timer(&xhci->cmd_timer, jiffies +
> > + XHCI_CMD_DEFAULT_TIMEOUT);
> > return 0;
> > }
> >
> > @@ -1243,6 +1247,7 @@ void xhci_handle_command_timeout(unsigned long data)
> > int ret;
> > unsigned long flags;
> > u64 hw_ring_state;
> > + u32 old_status;
> > struct xhci_command *cur_cmd = NULL;
> > xhci = (struct xhci_hcd *) data;
> >
> > @@ -1250,6 +1255,7 @@ void xhci_handle_command_timeout(unsigned long data)
> > spin_lock_irqsave(&xhci->lock, flags);
> > if (xhci->current_cmd) {
> > cur_cmd = xhci->current_cmd;
> > + old_status = cur_cmd->status;
> > cur_cmd->status = COMP_CMD_ABORT;
> > }
> >
> > @@ -1272,7 +1278,15 @@ void xhci_handle_command_timeout(unsigned long
> data)
> > }
> > /* command timeout on stopped ring, ring can't be aborted */
> > xhci_dbg(xhci, "Command timeout on stopped ring\n");
> > - xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
> > +
> > + /* is this the second timeout for the same command? ring wont restart */
> > + if (old_status == COMP_CMD_ABORT) {
> > + xhci_err(xhci, "Abort timeout, Fail to restart cmd ring\n");
> > + xhci_cleanup_command_queue(xhci);
> > + /* everything else here to halt, cleanup, free etc kill xhc */
> > + } else
> > + xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
> > +
> > spin_unlock_irqrestore(&xhci->lock, flags);
> > return;
> > }
> > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index
> > c502c22..bd18966 100644
> > --- a/drivers/usb/host/xhci.c
> > +++ b/drivers/usb/host/xhci.c
> > @@ -3450,7 +3450,7 @@ int xhci_discover_or_reset_device(struct usb_hcd *hcd,
> struct usb_device *udev)
> > if (ret == 1)
> > return 0;
> > else
> > - return -EINVAL;
> > + return -ENOTCONN; /* Don't retry */
> > }
> >
> > if (virt_dev->tt_info)
> >
> > Sample output after above patch (timer is set as wakeup source):
> >
> > root@...enix:~# echo mem > /sys/power/state
> > PM: Syncing filesystems ... done.
> > Freezing user space processes ... (elapsed 0.001 seconds) done.
> > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > PM: suspend of devices complete after 155.658 msecs
> > PM: late suspend of devices complete after 1.594 msecs
> > PM: noirq suspend of devices complete after 1.496 msecs
> > PM: noirq resume of devices complete after 1.290 msecs
> > PM: early resume of devices complete after 1.250 msecs usb usb1: root
> > hub lost power or was reset usb usb2: root hub lost power or was reset
> > xhci-hcd xhci-hcd.0.auto: Abort timeout, Fail to restart cmd ring
> > xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
> > xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
> > xhci-hcd xhci-hcd.0.auto: Abort timeout, Fail to restart cmd ring
> > xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
> > xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
> > PM: resume of devices complete after 25436.366 msecs <= resume time is
> increased a lot even after using -ENOTCONN
> > Restarting tasks ...
> > usb 1-1: USB disconnect, device number 2 usb 1-1.2: USB disconnect,
> > device number 3 usb 2-1: USB disconnect, device number 2 done.
> > root@...enix:~#
> >
> > Please share your opinion on other changes for patch submission as well as resume
> time.
> >
>
> I think more effort should be put into investigating why this happens in the first place.
> What is the root cause? why doesn't xhci start properly after resume in this case.
>
Hello Mathias,
I understand your point and would surely debug the root cause of the issue. But implementing the
fallback mechanism in SW, if HW does not respond well seems to a better solution to me.
Just to add, code prior to common implementation of xhci_handle_command_timeout was handling the above
case and was __not__ dependent on HW.
Please comment on the possibility of above fallback mechanism in XHCI SW and any side effects that you
can foresee.
> Optimizing resume time and error paths should be secondary here, resuming faster
> isn't really a matter when xhci is completely stuck.
>
I agree on above point.
- Rajesh Bhagat
> -Mathias
Powered by blists - more mailing lists