lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 22 Mar 2016 05:19:52 +0000
From:	Rajesh Bhagat <rajesh.bhagat@....com>
To:	Mathias Nyman <mathias.nyman@...ux.intel.com>
CC:	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Sriram Dash <sriram.dash@....com>
Subject: RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
 commmand timeout



> -----Original Message-----
> From: Mathias Nyman [mailto:mathias.nyman@...el.com]
> Sent: Monday, March 21, 2016 2:46 PM
> To: Rajesh Bhagat <rajesh.bhagat@....com>; Mathias Nyman
> <mathias.nyman@...ux.intel.com>; linux-usb@...r.kernel.org; linux-
> kernel@...r.kernel.org
> Cc: gregkh@...uxfoundation.org; Sriram Dash <sriram.dash@....com>
> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
> commmand timeout
> 
> On 21.03.2016 06:18, Rajesh Bhagat wrote:
> >
> >
> >>
> >> Hi
> >>
> >> I think clearing the whole command ring is a bit too much in this case.
> >> It may cause issues for all attached devices when one command times out.
> >>
> >
> > Hi Mathias,
> >
> > I understand your point, But I want to understand how would completion
> > handler be called if a command is timed out and xhci_abort_cmd_ring is
> > successful. In this case all the code would be waiting on completion handler forever.
> >
> >
> > 2. xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) ->
> > xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd
> >
> > In our case command is timed out, Hence we hit the case #2 but
> > xhci_abort_cmd_ring is success which does not calls complete.
> 
> xhci_abort_cmd_ring() will write CA bit (CMD_RING_ABORT) to CRCR register.
> This will generate a command completion event with status "command aborted" for
> the pending command.
> This event is then followed by a "command ring stopped" command completion event.
> 
> See xHCI specs 5.4.5 and 4.6.1.2
> 
> handle_cmd_completion() will check if cmd_comp_code == COMP_CMD_ABORT, goto
> event_handled, and call xhci_complete_del_and_free_cmd(cmd, cmd_comp_code) for
> the aborted command.
> 
> If xHCI already processed the aborted command, we might only get a command ring
> stopped event, in this case handle_cmd_completion() will call
> xhci_handle_stopped_cmd_ring(xhci, cmd), which will turn the commands that were
> tagged for "abort" that still remain on the command ring to NO-OP commands.
> 
> The completion callback will be called for these NO-OP command later when we get a
> command completion event for them.
> 

Thanks Mathias for detailed explanation. Now I understand how completion handler is 
supposed to be called in this scenario. 

But in our case, somehow we are not getting any event and handle_cmd_completion function 
is not getting called even after successful xhci_abort_cmd_ring when command timed out. 

Now, my point here is code prior to this patch xhci: rework command timeout and cancellation,
Code would have returned in case command timed out in xhci_alloc_dev itself.

-       /* XXX: how much time for xHC slot assignment? */
-       timeleft = wait_for_completion_interruptible_timeout(
-                       command->completion,
-                       XHCI_CMD_DEFAULT_TIMEOUT);
-       if (timeleft <= 0) {
-               xhci_warn(xhci, "%s while waiting for a slot\n",
-                               timeleft == 0 ? "Timeout" : "Signal");
-               /* cancel the enable slot request */
-               ret = xhci_cancel_cmd(xhci, NULL, command->command_trb);
-               return ret;
-       }
+       wait_for_completion(command->completion);

But after this patch, we are waiting for hardware event, which is somehow not generated 
and causing a hang scenario. 

IMO, The assumption that "xhci_abort_cmd_ring would always generate an event 
and handle_cmd_completion would be called" will not be always be true if HW is in bad state.

Please share your opinion.

> >> What kernel version, and what xhci vendor was this triggered on?
> >>
> >
> > We are using 4.1.8 kernel
> >
> 
> Are you able to try a more recent version?
> 

Using a newer kernel version would be bit difficult, but I would surely try it.

> -Mathias

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ