Message-ID: <56F2A697.1000701@linux.intel.com>
Date:	Wed, 23 Mar 2016 16:22:15 +0200
From:	Mathias Nyman <mathias.nyman@...ux.intel.com>
To:	Rajesh Bhagat <rajesh.bhagat@....com>
CC:	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Sriram Dash <sriram.dash@....com>
Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to
 XHCI commmand timeout

On 23.03.2016 05:53, Rajesh Bhagat wrote:

>>> IMO, the assumption that "xhci_abort_cmd_ring would always generate an
>>> event and handle_cmd_completion would be called" will not always be true if
>>> the HW is in a bad state.
>>>
>>> Please share your opinion.
>>>
>>
>> writing the CA (command abort) bit in CRCR (command ring control register)  will stop
>> the command ring, and CRR (command ring running) will be set to 0 by xHC.
>> xhci_abort_cmd_ring() polls this bit up to 5 seconds.
>> If it's not 0 then the driver considers the command abort as failed.
>>
>> The scenario you're thinking of is that xHC would still react to CA bit set, it would stop
>> the command ring and set CRR 0, but not send a command completion event.
>>
>> Have you tried adding some debug to handle_cmd_completion() and see if you receive
>> any event after command abortion?
>>
>
> Yes. We have added debug prints at first line of handle_cmd_completion, and we are not getting
> those prints. The last print messages that we get are as below from xhci_alloc_dev while resume
> operation:
>
> xhci-hcd xhci-hcd.0.auto: Command timeout
> xhci-hcd xhci-hcd.0.auto: Abort command ring
>
> Maybe the USB controller is somehow in a bad state and not responding to the commands.
>
> Please suggest how XHCI driver can handle such situations.
>

Restart the command timeout timer when writing the command abort bit.
If we get the abort event, the timer is deleted.

Otherwise, if the timeout triggers a second time, we end up calling
xhci_handle_command_timeout() with a stopped ring.
This will call xhci_handle_stopped_cmd_ring(), turn the aborted command into a no-op, restart the
command ring, and finally, when the no-op completes, it should call the missing completion.

If the command ring doesn't start, then additional code could be added to xhci_handle_command_timeout()
that clears the command ring if it is called a second time (i.e. the current command is already in
the abort state and the command ring is stopped when entering xhci_handle_command_timeout()).

There might be some details missing, I'm not able to test any of this, but try
something like this:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 3e1d24c..576819e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -319,7 +319,10 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
                 xhci_halt(xhci);
                 return -ESHUTDOWN;
         }
-
+       /* writing the CMD_RING_ABORT bit should create a command completion
+        * event, add a command completion timeout for it as well
+        */
+       mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT);
         return 0;
  }
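
To make the intended flow easier to follow, here is a small user-space model of
it. This is only a sketch under assumptions: none of the model_* names exist in
the kernel, the struct fields are made up for illustration, and the "second
timeout" branch just mimics what xhci_handle_stopped_cmd_ring() is described as
doing above (no-op the command, restart the ring, complete it). It is not
driver code and has not been run against hardware.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: all names here are illustrative. */
enum model_cmd_status { CMD_PENDING, CMD_ABORTING, CMD_COMPLETED };

struct model_hc {
	bool ring_running;          /* models the CRR bit */
	bool timer_armed;           /* models cmd_timer being pending */
	bool hw_sends_abort_event;  /* false models the reported bad HW state */
	enum model_cmd_status status;
	int timeouts;               /* how many times the timer fired */
};

/* Models writing the CA bit: the ring stops and the timer is re-armed,
 * which is the step the mod_timer() call in the patch adds.
 */
static void model_abort_cmd_ring(struct model_hc *hc)
{
	hc->ring_running = false;
	hc->status = CMD_ABORTING;
	hc->timer_armed = true;
}

/* Models the abort completion event arriving: the timer is deleted. */
static void model_handle_cmd_completion(struct model_hc *hc)
{
	hc->timer_armed = false;
	hc->status = CMD_COMPLETED;
}

/* Models the command timer firing. */
static void model_command_timeout(struct model_hc *hc)
{
	hc->timer_armed = false;
	hc->timeouts++;

	if (hc->status == CMD_PENDING) {
		/* First timeout: try to abort the running command. */
		model_abort_cmd_ring(hc);
		return;
	}

	if (hc->status == CMD_ABORTING && !hc->ring_running) {
		/* Second timeout with a stopped ring: turn the command into
		 * a no-op, restart the ring, and let the no-op completion
		 * deliver the missing completion.
		 */
		hc->ring_running = true;
		hc->status = CMD_COMPLETED;
	}
}

/* Runs one timed-out command to its end and returns the final state. */
static struct model_hc model_run(bool hw_sends_abort_event)
{
	struct model_hc hc = {
		.ring_running = true,
		.timer_armed = true,
		.hw_sends_abort_event = hw_sends_abort_event,
		.status = CMD_PENDING,
		.timeouts = 0,
	};

	while (hc.timer_armed) {
		model_command_timeout(&hc);
		if (hc.status == CMD_ABORTING && hc.hw_sends_abort_event)
			model_handle_cmd_completion(&hc);
	}
	return hc;
}
```

Calling model_run(true) models a controller that answers the abort, so the timer
fires once. model_run(false) models the case from the report, where no event
arrives and the re-armed timer fires a second time to finish the command.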

-Mathias
