lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140107003157.GA7963@xanatos>
Date:	Mon, 6 Jan 2014 16:31:57 -0800
From:	Sarah Sharp <sarah.a.sharp@...ux.intel.com>
To:	walt <w41ter@...il.com>
Cc:	Alan Stern <stern@...land.harvard.edu>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	stable@...r.kernel.org, David Laight <david.laight@...lab.com>,
	linux-usb@...r.kernel.org, linux-scsi@...r.kernel.org
Subject: Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a
 USB payload burst

On Fri, Jan 03, 2014 at 03:29:29PM -0800, Sarah Sharp wrote:
> On Fri, Jan 03, 2014 at 01:21:18PM -0800, walt wrote:
> > I'm so sorry Sarah, that was another mistake.  The mistake is so stupid I'm not
> > going to publish it here :(
> > 
> > Once I finally ran the kernel with debugging actually compiled in, dmesg contains
> > xhci debugging messages.  Wow :)
> > 
> > It's a big file so I zipped and attached it, which I hope is acceptable in lkml.
> 
> Yep, that's fine.  Sticking it in pastebin (or up on your server) is
> also fine, if it gets really big.
> 
> > BTW, this dmesg is from a kernel with sg_tablesize = 31, which as I said before
> > doesn't fix the problem.  The cp stopped around 7GB just as before.
> > 
> > Sorry for the noise...
> 
> No worries! :)  With the dmesg, I can finally see what happened:
> 
> [  188.703059] xhci_hcd 0000:03:00.0: Cancel URB ffff8800b7d2e0c0, dev 1, ep 0x2, starting at offset 0xbb7b9000
> [  188.703072] xhci_hcd 0000:03:00.0: // Ding dong!
> [  193.711022] xhci_hcd 0000:03:00.0: xHCI host not responding to stop endpoint command.
> [  193.711029] xhci_hcd 0000:03:00.0: Assuming host is dying, halting host.
> [  193.711046] xhci_hcd 0000:03:00.0: // Halt the HC
> [  193.711060] xhci_hcd 0000:03:00.0: Killing URBs for slot ID 1, ep index 0
> [  193.711066] xhci_hcd 0000:03:00.0: Killing URBs for slot ID 1, ep index 2
> [  193.711078] xhci_hcd 0000:03:00.0: Killing URBs for slot ID 1, ep index 3
> [  193.711096] xhci_hcd 0000:03:00.0: Calling usb_hc_died()
> [  193.711103] xhci_hcd 0000:03:00.0: HC died; cleaning up
> [  193.711116] xhci_hcd 0000:03:00.0: xHCI host controller is dead.
> 
> It seems that the xHCI driver tried to stop the endpoint ring in order
> to cancel a SCSI transfer, and the driver never got a response for that.
> 
> The offset is rather suspicious (0xbb7b9000), and it probably means the
> driver attempted to cancel a transfer that had been moved to the
> beginning of the ring segment, with no-op TRBs before the link TRB.
> 
> I suspect David's patch triggers a bug in the command cancellation code.
> There's also the unlikely possibility that the no-op TRBs did indeed
> cause the host to hang.  Either way, I'll have to look into it.
> 
> I'll let you know when I have some diagnostic patches ready.

Hi Walt,

I have a couple of patches for you to test.  You can either apply the
attached three patches, or you can pull down a kernel with:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci.git -b 3.12-td-fragment-failure

Please only apply the first patch (which is diagnostic only), trigger
your issue, and send me the resulting dmesg.  Then try applying the
other two patches, and see if the issue goes away.  (I suspect it won't
but I can't be sure.)

Sarah Sharp

View attachment "0001-TD-fragment-debugging.patch" of type "text/x-diff" (1894 bytes)

View attachment "0002-xhci-Avoid-infinite-loop-when-sg-urb-requires-too-ma.patch" of type "text/x-diff" (1577 bytes)

View attachment "0003-xhci-Set-scatter-gather-limit-to-avoid-failed-block-.patch" of type "text/x-diff" (2791 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ