lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <56FAB97F.4030206@linux.intel.com>
Date:	Tue, 29 Mar 2016 20:21:03 +0300
From:	Mathias Nyman <mathias.nyman@...ux.intel.com>
To:	Rajesh Bhagat <rajesh.bhagat@....com>
CC:	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Sriram Dash <sriram.dash@....com>
Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to
 XHCI commmand timeout

On 28.03.2016 09:13, Rajesh Bhagat wrote:
>
>
>> -----Original Message-----
>> From: Mathias Nyman [mailto:mathias.nyman@...ux.intel.com]
>> Sent: Wednesday, March 23, 2016 7:52 PM
>> To: Rajesh Bhagat <rajesh.bhagat@....com>
>> Cc: gregkh@...uxfoundation.org; linux-usb@...r.kernel.org; linux-
>> kernel@...r.kernel.org; Sriram Dash <sriram.dash@....com>
>> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
>> commmand timeout
>>
>> On 23.03.2016 05:53, Rajesh Bhagat wrote:
>>
>>>>> IMO, The assumption that "xhci_abort_cmd_ring would always generate
>>>>> an event and handle_cmd_completion would be called" will not be
>>>>> always be true if HW
>>>> is in bad state.
>>>>>
>>>>> Please share your opinion.
>>>>>
>>>>
>>>> writing the CA (command abort) bit in CRCR (command ring control
>>>> register)  will stop the command ring, and CRR (command ring running) will be set
>> to 0 by xHC.
>>>> xhci_abort_cmd_ring() polls this bit up to 5 seconds.
>>>> If it's not 0 then the driver considers the command abort as failed.
>>>>
>>>> The scenario you're thinking of is that xHC would still react to CA
>>>> bit set, it would stop the command ring and set CRR 0, but not send a command
>> completion event.
>>>>
>>>> Have you tried adding some debug to handle_cmd_completion() and see
>>>> if you receive any event after command abortion?
>>>>
>>>
>>> Yes. We have added debug prints at first line of
>>> handle_cmd_completion, and we are not getting those prints. The last
>>> print messages that we get are as below from xhci_alloc_dev while
>>> resume
>>> operation:
>>>
>>> xhci-hcd xhci-hcd.0.auto: Command timeout xhci-hcd xhci-hcd.0.auto:
>>> Abort command ring
>>>
>>> May be somehow, USB controller is in bad state and not responding to the
>> commands.
>>>
>>> Please suggest how XHCI driver can handle such situations.
>>>
>>
>> Restart the command timeout timer when writing the command abort bit.
>> If we get theIf we get the abort event the timer is deleted.
>>
>> Otherwise if the timout triggers a second time we end up calling
>> xhci_handle_command_timeout() with a stopped ring, This will call
>> xhci_handle_stopped_cmd_ring(), turn the aborted command to no-op, restart the
>> command ring, and finally when the no-op completes it should call the missing
>> completion.
>>
>> If command ring doesn't start then additional code could be added to
>> xhci_handle_command_timeout() that clears the command ring if it is called a second
>> time (=current command is already in abort state and command ring is stopped when
>> entering xhci_handle_command_timeout)
>>
>> There might be some details missing, I'm not able to test any of this, but try
>> something like this:
>>
>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index
>> 3e1d24c..576819e 100644
>> --- a/drivers/usb/host/xhci-ring.c
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -319,7 +319,10 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>>                   xhci_halt(xhci);
>>                   return -ESHUTDOWN;
>>           }
>> -
>> +       /* writing the CMD_RING_ABORT bit should create a command completion
>> +        * event, add a command completion timeout for it as well
>> +        */
>> +       mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT);
>>           return 0;
>>    }
>
> Hello Mathias,
>
> Thanks for the patch.
>
> After application of above patch, I'm getting following prints constantly:
>
> xhci-hcd xhci-hcd.0.auto: Command timeout
> xhci-hcd xhci-hcd.0.auto: Abort command ring
> xhci-hcd xhci-hcd.0.auto: Command timeout on stopped ring
> xhci-hcd xhci-hcd.0.auto: Turn aborted command be56e000 to no-op
> xhci-hcd xhci-hcd.0.auto: // Ding dong!
> ...
> xhci-hcd xhci-hcd.0.auto: Command timeout
> xhci-hcd xhci-hcd.0.auto: Abort command ring
> xhci-hcd xhci-hcd.0.auto: Command timeout on stopped ring
> xhci-hcd xhci-hcd.0.auto: Turn aborted command be56e000 to no-op
> xhci-hcd xhci-hcd.0.auto: // Ding dong!
>
> As expected, xhci_handle_command_timeout is called again and next time ring state
> is __not__ CMD_RING_STATE_RUNNING, Hence xhci_handle_stopped_cmd_ring is called
> which turn all the aborted commands to no-ops and again makes the ring state as
> CMD_RING_STATE_RUNNING, and rings the door bell.
>
> But again in this case, no response from USB controller and xhci_alloc_dev is still waiting for
> wait_for_completion.
>
> Does rescheduling the xhci->cmd_timer ensures command completion will be called. IMO, it is still
> dependent on HW response.
>

It's still dependent on hw starting the command ring, which apparently fails

So additional code is needed that will treat a second or third timeout as
a failed abort attempt, i.e. if a command times out on a stopped ring and the
command status is already aborted, then we cleanup the command ring and kill xhci.

So in addition to the previous added timeout something like this is needed:
(Again, uncompiled, untested semi pseudocode)


diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 7cf6621..d8518e8f 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1253,6 +1253,7 @@ void xhci_handle_command_timeout(unsigned long data)
         int ret;
         unsigned long flags;
         u64 hw_ring_state;
+       u32 old_status;
         struct xhci_command *cur_cmd = NULL;
         xhci = (struct xhci_hcd *) data;
  
@@ -1260,6 +1261,7 @@ void xhci_handle_command_timeout(unsigned long data)
         spin_lock_irqsave(&xhci->lock, flags);
         if (xhci->current_cmd) {
                 cur_cmd = xhci->current_cmd;
+               old_status = cur_cmd->status;
                 cur_cmd->status = COMP_CMD_ABORT;
         }
  
@@ -1282,7 +1284,14 @@ void xhci_handle_command_timeout(unsigned long data)
         }
         /* command timeout on stopped ring, ring can't be aborted */
         xhci_dbg(xhci, "Command timeout on stopped ring\n");
-       xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
+       /* is this the second timeout for the same command? ring wont restart */
+       if (old_status == COMP_CMD_ABORT) {
+               xhci_err(xhci, "Abort timeout, Fail to restart cmd ring\n");
+               xhci_cleanup_command_queue(xhci);
+               /* everything else here to halt, cleanup, free etc kill xhc */
+       } else {
+               xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
+       }
         spin_unlock_irqrestore(&xhci->lock, flags);
         return;
  }

-Mathias

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ