lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <585A1E67.4020902@linux.intel.com>
Date:   Wed, 21 Dec 2016 14:17:11 +0800
From:   Lu Baolu <baolu.lu@...ux.intel.com>
To:     Mathias Nyman <mathias.nyman@...ux.intel.com>,
        Baolin Wang <baolin.wang@...aro.org>
Cc:     Mathias Nyman <mathias.nyman@...el.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        USB <linux-usb@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Mark Brown <broonie@...nel.org>,
        "Lu, Baolu" <baolu.lu@...el.com>
Subject: Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

Hi Mathias,

I have some comments for the implementation of xhci_abort_cmd_ring() below.

On 12/20/2016 11:13 PM, Mathias Nyman wrote:
> On 20.12.2016 09:30, Baolin Wang wrote:
> ...
>
> Alright, I gathered all current work related to xhci races and timeouts
> and put them into a branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git timeout_race_fixes
>
> Its based on 4.9
> It includes a few other patches just to avoid conflicts and  make my life easier
>
> Interesting patches are:
>
> ee4eb91 xhci: remove unnecessary check for pending timer
> 0cba67d xhci: detect stop endpoint race using pending timer instead of counter.
> 4f2535f xhci: Handle command completion and timeout race
> b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command
> 529a5a0 usb: xhci: fix possible wild pointer
> 4766555 xhci: Fix race related to abort operation
> de834a3 xhci: Use delayed_work instead of timer for command timeout
> 69973b8 Linux 4.9
>
> The fixes for command queue races will go to usb-linus and stable, the
> reworks for stop ep watchdog timer will go to usb-next.
>
> Still completely untested, (well it compiles)
>
> Felipe gave instructions how to modify dwc3 driver to timeout on address
> devicecommands to test these, I'll try to set that up.
>
> All additional testing is welcome, especially if you can trigger timeouts
> and races
>
> -Mathias
>
>

Below is the latest code. I put my comments in line.

 322 static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
 323 {
 324         u64 temp_64;
 325         int ret;
 326
 327         xhci_dbg(xhci, "Abort command ring\n");
 328
 329         reinit_completion(&xhci->cmd_ring_stop_completion);
 330
 331         temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
 332         xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
 333                         &xhci->op_regs->cmd_ring);

We should hold xhci->lock when we are modifying xhci registers
at runtime.


 334
 335         /* Section 4.6.1.2 of xHCI 1.0 spec says software should
 336          * time the completion od all xHCI commands, including

s/od/of/g

 337          * the Command Abort operation. If software doesn't see
 338          * CRR negated in a timely manner (e.g. longer than 5
 339          * seconds), then it should assume that the there are
 340          * larger problems with the xHC and assert HCRST.
 341          */
 342         ret = xhci_handshake(&xhci->op_regs->cmd_ring,
 343                         CMD_RING_RUNNING, 0, 5 * 1000 * 1000);
 344         if (ret < 0) {
 345                 /* we are about to kill xhci, give it one more chance */

The retry of setting CMD_RING_ABORT is not necessary according to
previous discussion. We have cleaned code for second try in
xhci_handle_command_timeout(). Need to clean up here as well.

 346                 xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
 347                               &xhci->op_regs->cmd_ring);
 348                 udelay(1000);
 349                 ret = xhci_handshake(&xhci->op_regs->cmd_ring,
 350                                      CMD_RING_RUNNING, 0, 3 * 1000 * 1000);
 351                 if (ret < 0) {
 352                         xhci_err(xhci, "Stopped the command ring failed, "
 353                                  "maybe the host is dead\n");
 354                         xhci->xhc_state |= XHCI_STATE_DYING;
 355                         xhci_halt(xhci);
 356                         return -ESHUTDOWN;
 357                 }
 358         }
 359         /*
 360          * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
 361          * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
 362          * but the completion event in never sent. Wait 2 secs (arbitrary

s/in never sent/is never sent/g

 363          * number) to handle those cases after negation of CMD_RING_RUNNING.
 364          */

This should be implemented with a quirk bit. It's not common for all host controllers.

 365         if (!wait_for_completion_timeout(&xhci->cmd_ring_stop_completion,
 366                                          2 * HZ)) {
 367                 xhci_dbg(xhci, "No stop event for abort, ring start fail?\n");
 368                 xhci_cleanup_command_queue(xhci);
 369         } else {
 370                 unsigned long flags;
 371
 372                 spin_lock_irqsave(&xhci->lock, flags);
 373                 xhci_handle_stopped_cmd_ring(xhci, xhci_next_queued_cmd(xhci));
 374                 spin_unlock_irqrestore(&xhci->lock, flags);
 375         }
 376         return 0;
 377 }

Best regards,
Lu Baolu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ