linux-kernel - RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <063D6719AE5E284EB5DD2968C1650D6D455431@AcuExch.aculab.com>
Date:	Wed, 8 Jan 2014 16:09:14 +0000
From:	David Laight <David.Laight@...LAB.COM>
To:	'Sarah Sharp' <sarah.a.sharp@...ux.intel.com>,
	walt <w41ter@...il.com>
CC:	Alan Stern <stern@...land.harvard.edu>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	"stable@...r.kernel.org" <stable@...r.kernel.org>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>
Subject: RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within
 a USB payload burst

> From: Sarah Sharp
> On Tue, Jan 07, 2014 at 03:57:00PM -0800, walt wrote:
> > On 01/07/2014 01:21 PM, Sarah Sharp wrote:
> >
> > > Can you please try the attached patch, on top of the previous three
> > > patches, and send me dmesg?
> >
> > Hi Sarah, I just now finished running 0001-More-debugging.patch for the
> > first time.  The previous dmesg didn't include that patch, but this one
> > does.
> >
> > I read through this dmesg but I nodded off somewhere around line 500.
> > I hope you can stay awake :)
> 
> Well, it has all the info I need, but the results don't make me too
> happy.  Everything I've checked seems consistent, and I don't know why
> the host stopped.  The link TRBs are intact, the dequeue pointer for the
> endpoint was pointing to the transfer that timed out and it had the
> cycle bit set correctly, etc.  Perhaps the no-op TRBs are really the
> issue.
> 
> I'll have to take a look at the log again tomorrow.  I posted the dmesg
> on pastebin if David wants to check it out as well:
> http://pastebin.com/a4AUpsL1

I can't see anything obvious either.
However there is no response to the 'stop endpoint' command.
Section 4.6.9 (page 107 of rev1.0) states that the controller will complete
any USB IN or OUT transaction before raising the command completion event.
Possibly it is too 'stuck' to complete the transaction?

The endpoint status is also still '1' (running).
This also means that the 'TR dequeue pointer' is undefined - so the
controller could easily be processing a later TRB.
This field might even still contain the ring base address written by
the driver much earlier.

This might mean that something 'catastrophic' has happened earlier.
Maybe the controller isn't actually seeing any doorbell writes at all.
Maybe the base addresses it has for the rings have all got corrupted.
At least this looks like amd64 - so there aren't memory coherency issues.

Some hacks that might help isolate the problem:
1) Request an interrupt from the last nop data TRB.
2) Put a command nop (decimal 23) TRB into the command ring before
   the 'stop endpoint'.
3) Comment out the code that adds the nop data TRBs.
The first two might need code adding to handle the responses.

Do we know the actual xhci device?
I think it reports version 0x96.
(Sarah - it might be useful if that version were in one of the trace
messages that is output by default.)

	David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/