[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.44L0.1803271243570.1397-100000@iolanthe.rowland.org>
Date: Tue, 27 Mar 2018 13:06:49 -0400 (EDT)
From: Alan Stern <stern@...land.harvard.edu>
To: Jonathan Liu <net147@...il.com>
cc: Jeffy Chen <jeffy.chen@...k-chips.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Brian Norris <briannorris@...omium.org>, <mka@...omium.org>,
Douglas Anderson <dianders@...omium.org>,
AMAN DEEP <aman.deep@...sung.com>, <stable@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
<linux-usb@...r.kernel.org>
Subject: Re: [v6] usb: ohci: Proper handling of ed_rm_list to handle race
condition between usb_kill_urb() and finish_unlinks()
On Mon, 26 Mar 2018, Jonathan Liu wrote:
> > I am experiencing a USB function call hang from userspace with OCHI
> > (full speed USB device) after updating from Linux 4.14.15 to 4.14.24
> > and noticed this commit.
> >
> > Here is the Linux 4.14.24 kernel stack trace (extracted from SysRq+w
> > and amended with addr2line):
> > [<c05cbd64>] (__schedule) from [<c05cc148>] (schedule+0x50/0xb4)
> > kernel/sched/core.c:2792
> > [<c05cc148>] (schedule) from [<c042789c>]
> > (usb_kill_urb.part.3+0x78/0xa8) include/asm-generic/preempt.h:59
> > [<c042789c>] (usb_kill_urb.part.3) from [<c0433a3c>]
> > (usbdev_ioctl+0x1288/0x1cf0) drivers/usb/core/urb.c:690
> > [<c0433a3c>] (usbdev_ioctl) from [<c0217db8>]
> > (do_vfs_ioctl+0x9c/0x8ec) drivers/usb/core/devio.c:1835
> > [<c0217db8>] (do_vfs_ioctl) from [<c021863c>] (SyS_ioctl+0x34/0x5c)
> > fs/ioctl.c:47
> > [<c021863c>] (SyS_ioctl) from [<c0107820>] (ret_fast_syscall+0x0/0x54)
> > include/linux/file.h:39
> >
> > Afterwards the kernel is unresponsive to disconnect/connect of the
> > full speed USB device but I can connect/disconnect a high speed USB
> > device to the same port and communicate to it without problem since it
> > uses EHCI (OHCI is companion controller). If I try to connect the full
> > speed USB device again it is still unresponsive. The userspace
> > application is still hanging after all this.
> >
> > Could this commit be causing the issue?
It sounds like the URB gets cancelled but is never given back. But
although the commit you mentioned does affect this logic, I don't see
the problem could arise.
> There does seem to be something wrong with this commit. I did some
> more testing with Linux 4.16.0-rc6 on sun7i-a20-olinuxino-lime.
> I connected a sun7i-a20-pcduino3b running Linux 4.15.12 that has OTG
> port with gadget zero kernel module (g_zero) loaded to the USB port
> and did some testing with usbtest from tools/usb/testusb.c in kernel
> repository.
>
> I had to do the following change to avoid getting flooded with
> "usbtest 1-1:3.0: unlink retry" kernel messages after test 11 of
> testusb starts:
> --- a/drivers/usb/misc/usbtest.c
> +++ b/drivers/usb/misc/usbtest.c
> @@ -1473,7 +1473,7 @@ static int unlink1(struct usbtest_dev *dev, int
> pipe, int
> size, int async)
> * resubmission, but since we're testing unlink
> * paths, we can't.
> */
> - ERROR(dev, "unlink retry\n");
> + /*ERROR(dev, "unlink retry\n");*/
That's a bizarre message in any case. The condition under which it
gets printed is _not_ an error, just a race. I would expect it to show
up every now and then just based on random chance. But if it gets
printed over and over again then something is definitely wrong.
> If I revert the commit, I can run "./testusb -a" to completion without
> any issues with Linux 4.16.0-rc6 on sun7i-a20-olinuxino-lime.
> I don't get a kernel stall detected when testing on x86_64 but I do
> still get at least one "unlink retry" error message from kernel in
> dmesg which can be reproduced using "./testusb -a" or more frequently
> with "./testusb -a -t 11 -c 5000".
A few of those "unlink retry" messages during a test with 5000
iterations does not mean anything is wrong; don't worry about it.
It would be great if you could debug this farther. I'm not sure what
to suggest, however. Clearly we need to know what's going on in
ohci-q.c's finish_unlinks() routine; unfortunately this routine
probably gets called thousands of times before the problem shows up.
Anyway, here's a patch you can try out on the sun7i-a20-olinuxino-lime.
I have no idea whether it will help pinpoint the source of the problem.
Alan Stern
Index: usb-4.x/drivers/usb/host/ohci-hcd.c
===================================================================
--- usb-4.x.orig/drivers/usb/host/ohci-hcd.c
+++ usb-4.x/drivers/usb/host/ohci-hcd.c
@@ -327,14 +327,21 @@ static int ohci_urb_dequeue(struct usb_h
* some upcoming INTR_SF to call finish_unlinks()
*/
urb_priv = urb->hcpriv;
- if (urb_priv->ed->state == ED_OPER)
+ if (urb_priv->ed->state == ED_OPER) {
+ ohci_info(ohci, "start dequeue URB %p ed %p\n", urb,
+ urb_priv->ed);
start_ed_unlink(ohci, urb_priv->ed);
-
+ } else
+ ohci_info(ohci, "idle dequeue URB %p ed %p\n", urb,
+ urb_priv->ed);
if (ohci->rh_state != OHCI_RH_RUNNING) {
/* With HC dead, we can clean up right away */
ohci_work(ohci);
}
}
+ else
+ ohci_info(ohci, "failed dequeue URB %p\n", urb);
+
spin_unlock_irqrestore (&ohci->lock, flags);
return rc;
}
Index: usb-4.x/drivers/usb/host/ohci-q.c
===================================================================
--- usb-4.x.orig/drivers/usb/host/ohci-q.c
+++ usb-4.x/drivers/usb/host/ohci-q.c
@@ -1022,6 +1022,7 @@ skip_ed:
*last = ed->ed_next;
ed->ed_next = NULL;
modified = 0;
+ ohci_info(ohci, "scanning unlinked ed %p\n", ed);
/* unlink urbs as requested, but rescan the list after
* we call a completion since it might have unlinked
@@ -1072,6 +1073,8 @@ rescan_this:
/* if URB is done, clean up */
if (urb_priv->td_cnt >= urb_priv->length) {
modified = completed = 1;
+ ohci_info(ohci, "finish dequeue URB %p ed %p\n",
+ urb, ed);
finish_urb(ohci, urb, 0);
}
}
Powered by blists - more mailing lists