lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 22 Mar 2016 14:05:31 +0200
From:	Mathias Nyman <mathias.nyman@...ux.intel.com>
To:	Rajesh Bhagat <rajesh.bhagat@....com>
CC:	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Sriram Dash <sriram.dash@....com>
Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to
 XHCI commmand timeout

On 22.03.2016 07:19, Rajesh Bhagat wrote:
>
>
>> -----Original Message-----
>> From: Mathias Nyman [mailto:mathias.nyman@...el.com]
>> Sent: Monday, March 21, 2016 2:46 PM
>> To: Rajesh Bhagat <rajesh.bhagat@....com>; Mathias Nyman
>> <mathias.nyman@...ux.intel.com>; linux-usb@...r.kernel.org; linux-
>> kernel@...r.kernel.org
>> Cc: gregkh@...uxfoundation.org; Sriram Dash <sriram.dash@....com>
>> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
>> commmand timeout
>>
>> On 21.03.2016 06:18, Rajesh Bhagat wrote:
>>>
>>>
>>>>
>>>> Hi
>>>>
>>>> I think clearing the whole command ring is a bit too much in this case.
>>>> It may cause issues for all attached devices when one command times out.
>>>>
>>>
>>> Hi Mathias,
>>>
>>> I understand your point, But I want to understand how would completion
>>> handler be called if a command is timed out and xhci_abort_cmd_ring is
>>> successful. In this case all the code would be waiting on completion handler forever.
>>>
>>>
>>> 2. xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) ->
>>> xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd
>>>
>>> In our case command is timed out, Hence we hit the case #2 but
>>> xhci_abort_cmd_ring is success which does not calls complete.
>>
>> xhci_abort_cmd_ring() will write CA bit (CMD_RING_ABORT) to CRCR register.
>> This will generate a command completion event with status "command aborted" for
>> the pending command.
>> This event is then followed by a "command ring stopped" command completion event.
>>
>> See xHCI specs 5.4.5 and 4.6.1.2
>>
>> handle_cmd_completion() will check if cmd_comp_code == COMP_CMD_ABORT, goto
>> event_handled, and call xhci_complete_del_and_free_cmd(cmd, cmd_comp_code) for
>> the aborted command.
>>
>> If xHCI already processed the aborted command, we might only get a command ring
>> stopped event, in this case handle_cmd_completion() will call
>> xhci_handle_stopped_cmd_ring(xhci, cmd), which will turn the commands that were
>> tagged for "abort" that still remain on the command ring to NO-OP commands.
>>
>> The completion callback will be called for these NO-OP command later when we get a
>> command completion event for them.
>>
>
> Thanks Mathias for detailed explanation. Now I understand how completion handler is
> supposed to be called in this scenario.
>
> But in our case, somehow we are not getting any event and handle_cmd_completion function
> is not getting called even after successful xhci_abort_cmd_ring when command timed out.
>
> Now, my point here is code prior to this patch xhci: rework command timeout and cancellation,
> Code would have returned in case command timed out in xhci_alloc_dev itself.
>
> -       /* XXX: how much time for xHC slot assignment? */
> -       timeleft = wait_for_completion_interruptible_timeout(
> -                       command->completion,
> -                       XHCI_CMD_DEFAULT_TIMEOUT);
> -       if (timeleft <= 0) {
> -               xhci_warn(xhci, "%s while waiting for a slot\n",
> -                               timeleft == 0 ? "Timeout" : "Signal");
> -               /* cancel the enable slot request */
> -               ret = xhci_cancel_cmd(xhci, NULL, command->command_trb);
> -               return ret;
> -       }
> +       wait_for_completion(command->completion);
>
> But after this patch, we are waiting for hardware event, which is somehow not generated
> and causing a hang scenario.
>
> IMO, The assumption that "xhci_abort_cmd_ring would always generate an event
> and handle_cmd_completion would be called" will not be always be true if HW is in bad state.
>
> Please share your opinion.
>

writing the CA (command abort) bit in CRCR (command ring control register)  will stop the command ring,
and CRR (command ring running) will be set to 0 by xHC. xhci_abort_cmd_ring() polls this bit up to 5 seconds.
If it's not 0 then the driver considers the command abort as failed.

The scenario you're thinking of is that xHC would still react to CA bit set, it would stop the command
ring and set CRR 0, but not send a command completion event.

Have you tried adding some debug to handle_cmd_completion() and see if you receive any event after command abortion?

The patches I mentioned earlier are also worth trying out, they might fix the cause why command aborts in the first place.

-Mathias



    


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ