Message-ID: <AE90C24D6B3A694183C094C60CF0A2F6026B73AF@saturn3.aculab.com>
Date:	Thu, 24 Oct 2013 16:05:28 +0100
From:	"David Laight" <David.Laight@...LAB.COM>
To:	"Sarah Sharp" <sarah.a.sharp@...ux.intel.com>
Cc:	<netdev@...r.kernel.org>, <linux-usb@...r.kernel.org>,
	"Xenia Ragiadakou" <burzalodowa@...il.com>
Subject: RE: transmit lockup using smsc95xx ethernet on usb3

> Have you tried the latest stable kernel or the latest -rc kernel?

I've built a kernel based on Linus's tree from last Friday (3.12-rc6-ish).
I commented out the trace for short reads, since they happen all the time.

I've not seen an error on a Bo yet; the failure rate is depressingly low.
I have had an error that started with an interrupt completion (I think
from the root), which led to a full unplug/replug sequence that
overfilled the kernel message buffer (I rebuilt with a much bigger buffer).

I've just had an error -71 (-EPROTO) on a Bi. This generates the following
(there are actually two, separated by a Bo completion and setup):

[175908.563068] xhci_hcd 0000:00:14.0: Transfer error on endpoint
[175908.563070] xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
[175908.563072] xhci_hcd 0000:00:14.0: Finding segment containing stopped TRB.
[175908.563074] xhci_hcd 0000:00:14.0: Finding endpoint context
[175908.563076] xhci_hcd 0000:00:14.0: Finding segment containing last TRB in TD.
[175908.563078] xhci_hcd 0000:00:14.0: Cycle state = 0x1
[175908.563081] xhci_hcd 0000:00:14.0: New dequeue segment = ffff8800d6a817e0 (virtual)
[175908.563089] xhci_hcd 0000:00:14.0: New dequeue pointer = 0x2137a4b90 (DMA)
[175908.563096] xhci_hcd 0000:00:14.0: Queueing new dequeue state
[175908.563104] xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = ffff8800d6a817e0 (0x2137a4800 dma), new deq ptr = ffff8802137a4b90 (0x2137a4b90 dma), new cycle = 1
[175908.563110] xhci_hcd 0000:00:14.0: // Ding dong!
[175908.563119] xhci_hcd 0000:00:14.0: Giveback URB ffff8802114fd9c0, len = 0, expected = 18944, status = -71
[175908.563122] xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[175908.563125] xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @2137a4b91

What is interesting is that the usbmon trace then shows only 2 Bi URBs in use
(rather than 4). After 125ms (assuming the usbmon timestamps are in us),
another 2 URBs are added.
Since the URBs are usually recycled (or at least freed and immediately
reallocated, so they get the same address), I wonder if this is a memory leak?
In any case, waiting that long before adding the URBs back doesn't seem right.

I don't think I've seen an error that affected only the Bi side before, so I
don't know whether the recovery behaviour has changed.

FWIW, we have identified something 'sub-optimal' in the PCB layout that might
be responsible for noise on the USB data/clock source. However, the xhci driver
should still recover from such errors.
If the hardware guys fix the PCB, I'm going to need to keep a failing system!

	David



