Message-ID: <HE1PR0401MB202857F4B183B84FCEDFBDA6E38F0@HE1PR0401MB2028.eurprd04.prod.outlook.com>
Date:	Mon, 21 Mar 2016 04:18:17 +0000
From:	Rajesh Bhagat <rajesh.bhagat@....com>
To:	Mathias Nyman <mathias.nyman@...ux.intel.com>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"mathias.nyman@...el.com" <mathias.nyman@...el.com>,
	Sriram Dash <sriram.dash@....com>
Subject: RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
 commmand timeout



> -----Original Message-----
> From: Mathias Nyman [mailto:mathias.nyman@...ux.intel.com]
> Sent: Friday, March 18, 2016 4:51 PM
> To: Rajesh Bhagat <rajesh.bhagat@....com>; linux-usb@...r.kernel.org; linux-
> kernel@...r.kernel.org
> Cc: gregkh@...uxfoundation.org; mathias.nyman@...el.com; Sriram Dash
> <sriram.dash@....com>
> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
> commmand timeout
> 
> On 18.03.2016 09:01, Rajesh Bhagat wrote:
> > We are facing an issue during system resume from STR
> > (suspend-to-RAM): XHCI hangs indefinitely in the
> > wait_for_completion call made in xhci_alloc_dev for the
> > TRB_ENABLE_SLOT command, which never completes.
> >
> > The xhci_handle_command_timeout function is then called and prints the
> > "Command timeout" message, but it never calls complete for the
> > TRB_ENABLE_SLOT command above because xhci_abort_cmd_ring succeeds.
> >
> > The proposed solution is:
> > 1. calling xhci_cleanup_command_queue whether or not
> >     xhci_abort_cmd_ring succeeds.
> > 2. checking the status of reset_device in the usb core code.
> 
> 
> Hi
> 
> I think clearing the whole command ring is a bit too much in this case.
> It may cause issues for all attached devices when one command times out.
> 


Hi Mathias, 

I understand your point, but I want to understand how the completion handler would be called 
if a command times out and xhci_abort_cmd_ring succeeds. In that case, the code waiting on 
the completion would wait forever. 

> We need to look in more detail why we fail to call completion for that one aborted
> command.
> 

I checked the code below. Please correct me if I am wrong.

Code waiting in wait_for_completion: 
int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev)
{
...
        ret = xhci_queue_slot_control(xhci, command, TRB_ENABLE_SLOT, 0);
...

        wait_for_completion(command->completion); /* blocks until the command completes */


Code paths that call the completion handler:
1. handle_cmd_completion -> xhci_complete_del_and_free_cmd
2. xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) -> xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd

In our case the command timed out, so we hit case #2. But xhci_abort_cmd_ring succeeds, so 
xhci_cleanup_command_queue is never reached and complete is not called. 
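
To make the two paths concrete, here is a small userspace model of the control flow described above. It is only a sketch, not kernel code: struct model_cmd, model_cmd_completion_event, model_command_timeout and the hw_posts_event flag are hypothetical names made up for illustration. It also assumes that, after a successful abort, the completion is expected to arrive later through path 1, which is the part that does not seem to happen in our case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; nothing here is a real xhci driver symbol. */
struct model_cmd {
        bool completed; /* stands in for complete(command->completion) */
};

/* Models path 1: handle_cmd_completion -> xhci_complete_del_and_free_cmd */
static void model_cmd_completion_event(struct model_cmd *cmd)
{
        assert(cmd != NULL);
        cmd->completed = true;
}

/*
 * Models path 2: the command timeout handler.
 * abort_ok:       the modeled xhci_abort_cmd_ring reported success
 * hw_posts_event: assumption, the controller later delivers a command
 *                 completion event for the aborted command
 * Returns true if the waiter would be woken up.
 */
static bool model_command_timeout(struct model_cmd *cmd, bool abort_ok,
                                  bool hw_posts_event)
{
        assert(cmd != NULL);

        if (!abort_ok) {
                /* Abort failed: the cleanup path completes the command. */
                cmd->completed = true;
                return cmd->completed;
        }

        /* Abort succeeded: nothing completes the command here. */
        if (hw_posts_event)
                model_cmd_completion_event(cmd);

        return cmd->completed;
}
```

In this model the only combination that leaves the waiter blocked is abort_ok set and hw_posts_event clear, which matches what we observe on resume.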


> The bigger question is why the timeout happens in the first place?
> 

We are doing a suspend/resume operation. It might be a controller issue :(. IMO, software should not 
hang or stop if the hardware is not behaving correctly. 

> What kernel version, and what xhci vendor was this triggered on?
> 

We are using the 4.1.8 kernel.

> It's possible that the timeout is related either to the locking issue found by Chris
> Bainbridge:
> http://marc.info/?l=linux-usb&m=145493945408601&w=2
> 
> or the resume issues in this thread, (see full thread)
> http://marc.info/?l=linux-usb&m=145477850706552&w=2
> 
> Does any of those proposed solutions fix the command timeout for you?
> 

I will check the patches above and share the results.

> -Mathias
