lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 28 Mar 2016 06:13:12 +0000
From:	Rajesh Bhagat <rajesh.bhagat@....com>
To:	Mathias Nyman <mathias.nyman@...ux.intel.com>
CC:	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Sriram Dash <sriram.dash@....com>
Subject: RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
 commmand timeout



> -----Original Message-----
> From: Mathias Nyman [mailto:mathias.nyman@...ux.intel.com]
> Sent: Wednesday, March 23, 2016 7:52 PM
> To: Rajesh Bhagat <rajesh.bhagat@....com>
> Cc: gregkh@...uxfoundation.org; linux-usb@...r.kernel.org; linux-
> kernel@...r.kernel.org; Sriram Dash <sriram.dash@....com>
> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
> commmand timeout
> 
> On 23.03.2016 05:53, Rajesh Bhagat wrote:
> 
> >>> IMO, The assumption that "xhci_abort_cmd_ring would always generate
> >>> an event and handle_cmd_completion would be called" will not be
> >>> always be true if HW
> >> is in bad state.
> >>>
> >>> Please share your opinion.
> >>>
> >>
> >> writing the CA (command abort) bit in CRCR (command ring control
> >> register)  will stop the command ring, and CRR (command ring running) will be set
> to 0 by xHC.
> >> xhci_abort_cmd_ring() polls this bit up to 5 seconds.
> >> If it's not 0 then the driver considers the command abort as failed.
> >>
> >> The scenario you're thinking of is that xHC would still react to CA
> >> bit set, it would stop the command ring and set CRR 0, but not send a command
> completion event.
> >>
> >> Have you tried adding some debug to handle_cmd_completion() and see
> >> if you receive any event after command abortion?
> >>
> >
> > Yes. We have added debug prints at first line of
> > handle_cmd_completion, and we are not getting those prints. The last
> > print messages that we get are as below from xhci_alloc_dev while
> > resume
> > operation:
> >
> > xhci-hcd xhci-hcd.0.auto: Command timeout xhci-hcd xhci-hcd.0.auto:
> > Abort command ring
> >
> > May be somehow, USB controller is in bad state and not responding to the
> commands.
> >
> > Please suggest how XHCI driver can handle such situations.
> >
> 
> Restart the command timeout timer when writing the command abort bit.
> If we get theIf we get the abort event the timer is deleted.
> 
> Otherwise if the timout triggers a second time we end up calling
> xhci_handle_command_timeout() with a stopped ring, This will call
> xhci_handle_stopped_cmd_ring(), turn the aborted command to no-op, restart the
> command ring, and finally when the no-op completes it should call the missing
> completion.
> 
> If command ring doesn't start then additional code could be added to
> xhci_handle_command_timeout() that clears the command ring if it is called a second
> time (=current command is already in abort state and command ring is stopped when
> entering xhci_handle_command_timeout)
> 
> There might be some details missing, I'm not able to test any of this, but try
> something like this:
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index
> 3e1d24c..576819e 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -319,7 +319,10 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>                  xhci_halt(xhci);
>                  return -ESHUTDOWN;
>          }
> -
> +       /* writing the CMD_RING_ABORT bit should create a command completion
> +        * event, add a command completion timeout for it as well
> +        */
> +       mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT);
>          return 0;
>   }

Hello Mathias, 

Thanks for the patch. 

After application of above patch, I'm getting following prints constantly: 

xhci-hcd xhci-hcd.0.auto: Command timeout
xhci-hcd xhci-hcd.0.auto: Abort command ring
xhci-hcd xhci-hcd.0.auto: Command timeout on stopped ring
xhci-hcd xhci-hcd.0.auto: Turn aborted command be56e000 to no-op
xhci-hcd xhci-hcd.0.auto: // Ding dong!
...
xhci-hcd xhci-hcd.0.auto: Command timeout
xhci-hcd xhci-hcd.0.auto: Abort command ring
xhci-hcd xhci-hcd.0.auto: Command timeout on stopped ring
xhci-hcd xhci-hcd.0.auto: Turn aborted command be56e000 to no-op
xhci-hcd xhci-hcd.0.auto: // Ding dong!

As expected, xhci_handle_command_timeout is called again and next time ring state
is __not__ CMD_RING_STATE_RUNNING, Hence xhci_handle_stopped_cmd_ring is called 
which turn all the aborted commands to no-ops and again makes the ring state as 
CMD_RING_STATE_RUNNING, and rings the door bell. 

But again in this case, no response from USB controller and xhci_alloc_dev is still waiting for
wait_for_completion. 

Does rescheduling the xhci->cmd_timer ensures command completion will be called. IMO, it is still 
dependent on HW response. 

Please share your opinion. 

> 
> -Mathias

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ