[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <56F1350B.7000006@linux.intel.com>
Date: Tue, 22 Mar 2016 14:05:31 +0200
From: Mathias Nyman <mathias.nyman@...ux.intel.com>
To: Rajesh Bhagat <rajesh.bhagat@....com>
CC: "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Sriram Dash <sriram.dash@....com>
Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to
XHCI commmand timeout
On 22.03.2016 07:19, Rajesh Bhagat wrote:
>
>
>> -----Original Message-----
>> From: Mathias Nyman [mailto:mathias.nyman@...el.com]
>> Sent: Monday, March 21, 2016 2:46 PM
>> To: Rajesh Bhagat <rajesh.bhagat@....com>; Mathias Nyman
>> <mathias.nyman@...ux.intel.com>; linux-usb@...r.kernel.org; linux-
>> kernel@...r.kernel.org
>> Cc: gregkh@...uxfoundation.org; Sriram Dash <sriram.dash@....com>
>> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
>> commmand timeout
>>
>> On 21.03.2016 06:18, Rajesh Bhagat wrote:
>>>
>>>
>>>>
>>>> Hi
>>>>
>>>> I think clearing the whole command ring is a bit too much in this case.
>>>> It may cause issues for all attached devices when one command times out.
>>>>
>>>
>>> Hi Mathias,
>>>
>>> I understand your point, But I want to understand how would completion
>>> handler be called if a command is timed out and xhci_abort_cmd_ring is
>>> successful. In this case all the code would be waiting on completion handler forever.
>>>
>>>
>>> 2. xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) ->
>>> xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd
>>>
>>> In our case command is timed out, Hence we hit the case #2 but
>>> xhci_abort_cmd_ring is success which does not calls complete.
>>
>> xhci_abort_cmd_ring() will write CA bit (CMD_RING_ABORT) to CRCR register.
>> This will generate a command completion event with status "command aborted" for
>> the pending command.
>> This event is then followed by a "command ring stopped" command completion event.
>>
>> See xHCI specs 5.4.5 and 4.6.1.2
>>
>> handle_cmd_completion() will check if cmd_comp_code == COMP_CMD_ABORT, goto
>> event_handled, and call xhci_complete_del_and_free_cmd(cmd, cmd_comp_code) for
>> the aborted command.
>>
>> If xHCI already processed the aborted command, we might only get a command ring
>> stopped event, in this case handle_cmd_completion() will call
>> xhci_handle_stopped_cmd_ring(xhci, cmd), which will turn the commands that were
>> tagged for "abort" that still remain on the command ring to NO-OP commands.
>>
>> The completion callback will be called for these NO-OP command later when we get a
>> command completion event for them.
>>
>
> Thanks Mathias for detailed explanation. Now I understand how completion handler is
> supposed to be called in this scenario.
>
> But in our case, somehow we are not getting any event and handle_cmd_completion function
> is not getting called even after successful xhci_abort_cmd_ring when command timed out.
>
> Now, my point here is code prior to this patch xhci: rework command timeout and cancellation,
> Code would have returned in case command timed out in xhci_alloc_dev itself.
>
> - /* XXX: how much time for xHC slot assignment? */
> - timeleft = wait_for_completion_interruptible_timeout(
> - command->completion,
> - XHCI_CMD_DEFAULT_TIMEOUT);
> - if (timeleft <= 0) {
> - xhci_warn(xhci, "%s while waiting for a slot\n",
> - timeleft == 0 ? "Timeout" : "Signal");
> - /* cancel the enable slot request */
> - ret = xhci_cancel_cmd(xhci, NULL, command->command_trb);
> - return ret;
> - }
> + wait_for_completion(command->completion);
>
> But after this patch, we are waiting for hardware event, which is somehow not generated
> and causing a hang scenario.
>
> IMO, The assumption that "xhci_abort_cmd_ring would always generate an event
> and handle_cmd_completion would be called" will not be always be true if HW is in bad state.
>
> Please share your opinion.
>
writing the CA (command abort) bit in CRCR (command ring control register) will stop the command ring,
and CRR (command ring running) will be set to 0 by xHC. xhci_abort_cmd_ring() polls this bit up to 5 seconds.
If it's not 0 then the driver considers the command abort as failed.
The scenario you're thinking of is that xHC would still react to CA bit set, it would stop the command
ring and set CRR 0, but not send a command completion event.
Have you tried adding some debug to handle_cmd_completion() and see if you receive any event after command abortion?
The patches I mentioned earlier are also worth trying out, they might fix the cause why command aborts in the first place.
-Mathias
Powered by blists - more mailing lists