Date:   Wed, 21 Dec 2016 14:57:35 +0800
From:   Lu Baolu <baolu.lu@...ux.intel.com>
To:     Mathias Nyman <mathias.nyman@...ux.intel.com>,
        Baolin Wang <baolin.wang@...aro.org>
Cc:     Mathias Nyman <mathias.nyman@...el.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        USB <linux-usb@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Mark Brown <broonie@...nel.org>,
        "Lu, Baolu" <baolu.lu@...el.com>
Subject: Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

Hi Mathias,

I have some comments on the implementation of
xhci_handle_command_timeout() as well.

On 12/20/2016 11:13 PM, Mathias Nyman wrote:
> On 20.12.2016 09:30, Baolin Wang wrote:
> ...
>
> Alright, I gathered all current work related to xhci races and timeouts
> and put them into a branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git timeout_race_fixes
>
> It's based on 4.9.
> It includes a few other patches just to avoid conflicts and make my life easier.
>
> Interesting patches are:
>
> ee4eb91 xhci: remove unnecessary check for pending timer
> 0cba67d xhci: detect stop endpoint race using pending timer instead of counter.
> 4f2535f xhci: Handle command completion and timeout race
> b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command
> 529a5a0 usb: xhci: fix possible wild pointer
> 4766555 xhci: Fix race related to abort operation
> de834a3 xhci: Use delayed_work instead of timer for command timeout
> 69973b8 Linux 4.9
>
> The fixes for command queue races will go to usb-linus and stable, the
> reworks for stop ep watchdog timer will go to usb-next.
>
> Still completely untested, (well it compiles)
>
> Felipe gave instructions on how to modify the dwc3 driver to time out on
> address device commands to test these. I'll try to set that up.
>
> All additional testing is welcome, especially if you can trigger timeouts
> and races
>
> -Mathias
>
>

I have posted the code below with my comments inline.

1276 void xhci_handle_command_timeout(struct work_struct *work)
1277 {
1278         struct xhci_hcd *xhci;
1279         int ret;
1280         unsigned long flags;
1281         u64 hw_ring_state;
1282
1283         xhci = container_of(to_delayed_work(work), struct xhci_hcd, cmd_timer);
1284
1285         spin_lock_irqsave(&xhci->lock, flags);
1286
1287         /*
1288          * If timeout work is pending, or current_cmd is NULL, it means we
1289          * raced with command completion. Command is handled so just return.
1290          */
1291         if (!xhci->current_cmd || delayed_work_pending(&xhci->cmd_timer)) {
1292                 spin_unlock_irqrestore(&xhci->lock, flags);
1293                 return;
1294         }
1295         /* mark this command to be cancelled */
1296         xhci->current_cmd->status = COMP_CMD_ABORT;
1297
1298         /* Make sure command ring is running before aborting it */
1299         hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
1300         if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
1301             (hw_ring_state & CMD_RING_RUNNING))  {
1302                 /* Prevent new doorbell, and start command abort */
1303                 xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
1304                 spin_unlock_irqrestore(&xhci->lock, flags);
1305                 xhci_dbg(xhci, "Command timeout\n");
1306                 ret = xhci_abort_cmd_ring(xhci);
1307                 if (unlikely(ret == -ESHUTDOWN)) {
1308                         xhci_err(xhci, "Abort command ring failed\n");
1309                         xhci_cleanup_command_queue(xhci);
1310                         usb_hc_died(xhci_to_hcd(xhci)->primary_hcd);
1311                         xhci_dbg(xhci, "xHCI host controller is dead.\n");
1312                 }
1313                 return;
1314         }
1315
1316         /* host removed. Bail out */
1317         if (xhci->xhc_state & XHCI_STATE_REMOVING) {
1318                 spin_unlock_irqrestore(&xhci->lock, flags);
1319                 xhci_dbg(xhci, "host removed, ring start fail?\n");
1320                 xhci_cleanup_command_queue(xhci);
1321                 return;
1322         }

I think this part of the code (the XHCI_STATE_REMOVING check) should be
moved up to line 1295, so that it runs before the command is marked as
cancelled.
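
To illustrate the suggested ordering, here is a minimal user-space sketch.
It is not driver code: struct mock_xhci, enum timeout_action and
mock_handle_command_timeout() are hypothetical stand-ins that only model
the order of the checks, under the assumption that the host-removed test
comes before the command is marked as cancelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes; each one models a branch of the timeout handler. */
enum timeout_action {
    ACT_RACED_WITH_COMPLETION, /* command already handled, nothing to do */
    ACT_HOST_REMOVED,          /* clean up the command queue and bail out */
    ACT_ABORT_RING,            /* ring is running: start the command abort */
    ACT_STOPPED_RING,          /* ring is stopped: it cannot be aborted */
};

/* Hypothetical stand-in for the driver state, not the real struct xhci_hcd. */
struct mock_xhci {
    bool has_current_cmd;    /* models xhci->current_cmd != NULL */
    bool timer_pending;      /* models delayed_work_pending(&xhci->cmd_timer) */
    bool removing;           /* models xhc_state & XHCI_STATE_REMOVING */
    bool sw_ring_running;    /* models cmd_ring_state & CMD_RING_STATE_RUNNING */
    bool hw_ring_running;    /* models hw_ring_state & CMD_RING_RUNNING */
    bool cmd_marked_aborted; /* models current_cmd->status = COMP_CMD_ABORT */
};

enum timeout_action mock_handle_command_timeout(struct mock_xhci *x)
{
    assert(x != NULL);

    /* Raced with command completion: the command is already handled. */
    if (!x->has_current_cmd || x->timer_pending)
        return ACT_RACED_WITH_COMPLETION;

    /* Proposed position: bail out on host removal before touching the command. */
    if (x->removing)
        return ACT_HOST_REMOVED;

    /* Only now mark the command to be cancelled. */
    x->cmd_marked_aborted = true;

    /* The ring must be running in both software and hardware state to abort. */
    if (x->sw_ring_running && x->hw_ring_running)
        return ACT_ABORT_RING;

    return ACT_STOPPED_RING;
}
```

With this ordering, a host that is being removed never reaches the abort
path, even if the ring still reports itself as running.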

1323
1324         /* command timeout on stopped ring, ring can't be aborted */
1325         xhci_dbg(xhci, "Command timeout on stopped ring\n");
1326         xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
1327         spin_unlock_irqrestore(&xhci->lock, flags);

This part of the code is tricky. I can't tell in which case it would be
executed. In any case, we shouldn't call xhci_handle_stopped_cmd_ring()
here, right?

1328         return;
1329 }

Best regards,
Lu Baolu
