lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <585A7A0A.6090401@linux.intel.com>
Date:   Wed, 21 Dec 2016 14:48:10 +0200
From:   Mathias Nyman <mathias.nyman@...ux.intel.com>
To:     Lu Baolu <baolu.lu@...ux.intel.com>,
        Baolin Wang <baolin.wang@...aro.org>
Cc:     Mathias Nyman <mathias.nyman@...el.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        USB <linux-usb@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Mark Brown <broonie@...nel.org>,
        "Lu, Baolu" <baolu.lu@...el.com>,
        Alan Stern <stern@...land.harvard.edu>,
        OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
Subject: Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

On 21.12.2016 08:17, Lu Baolu wrote:
> Hi Mathias,
>
> I have some comments for the implementation of xhci_abort_cmd_ring() below.
>
> On 12/20/2016 11:13 PM, Mathias Nyman wrote:
>> On 20.12.2016 09:30, Baolin Wang wrote:
>> ...
>>
>> Alright, I gathered all current work related to xhci races and timeouts
>> and put them into a branch:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git timeout_race_fixes
>>
>> Its based on 4.9
>> It includes a few other patches just to avoid conflicts and  make my life easier
>>
>> Interesting patches are:
>>
>> ee4eb91 xhci: remove unnecessary check for pending timer
>> 0cba67d xhci: detect stop endpoint race using pending timer instead of counter.
>> 4f2535f xhci: Handle command completion and timeout race
>> b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command
>> 529a5a0 usb: xhci: fix possible wild pointer
>> 4766555 xhci: Fix race related to abort operation
>> de834a3 xhci: Use delayed_work instead of timer for command timeout
>> 69973b8 Linux 4.9
>>
>> The fixes for command queue races will go to usb-linus and stable, the
>> reworks for stop ep watchdog timer will go to usb-next.
>>
>> Still completely untested, (well it compiles)
>>
>> Felipe gave instructions how to modify dwc3 driver to timeout on address
>> devicecommands to test these, I'll try to set that up.
>>
>> All additional testing is welcome, especially if you can trigger timeouts
>> and races
>>
>> -Mathias
>>
>>
>
> Below is the latest code. I put my comments in line.
>
>   322 static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>   323 {
>   324         u64 temp_64;
>   325         int ret;
>   326
>   327         xhci_dbg(xhci, "Abort command ring\n");
>   328
>   329         reinit_completion(&xhci->cmd_ring_stop_completion);
>   330
>   331         temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>   332         xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
>   333                         &xhci->op_regs->cmd_ring);
>
> We should hold xhci->lock when we are modifying xhci registers
> at runtime.
>

Makes sense, but we need to unlock it before sleeping or waiting for completion.
I need to look into that in more detail.

But this was an issue already before these changes.

> The retry of setting CMD_RING_ABORT is not necessary according to
> previous discussion. We have cleaned code for second try in
> xhci_handle_command_timeout(). Need to clean up here as well.
>

Yes it can be cleaned up as well, but the two cases are a bit different.
The cleaned up one was about command ring not starting again after it was stopped.

This second try is a workaround for what we thought was the command ring failing
to stop in the first place, but is most likely due to the race that OGAWA Hirofumi
fixed.  It races if the stop command ring interrupt happens between writing the abort
bit and polling for the ring stopped bit. The interrupt hander may start the command
ring again, and we would believe we failed to stop it in the first place.

This race could probably be fixed by just extending the lock (and preventing
interrupts) to cover both writing the abort bit and polling for the command ring
running bit, as you pointed out here previously.

But then again I really like OGAWA Hiroumi's solution that separates the
command ring stopping from aborting commands and restarting the ring.

The current way of always restarting the command ring as a response to
a stop command ring command really limits its usage.

So, with this in mind most reasonable would be to
1. fix the lock to cover abort+CRR check, and send it to usb-linus +stable
2. rebase OGAWA Hirofumi's changes on top of that, and send to usb-linus only
3. remove unnecessary second abort try as a separate patch, send to usb-next
4. remove polling for the Command ring running (CRR), waiting for completion
    is enough, if completion times out then we can check CRR. for usb-next
    
I'll fix the typos these patches would introduce. Fixing old typos can be done as separate
patches later.

-Mathias

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ