lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140109235045.GA17660@xanatos>
Date:	Thu, 9 Jan 2014 15:50:45 -0800
From:	Sarah Sharp <sarah.a.sharp@...ux.intel.com>
To:	walt <w41ter@...il.com>, David Laight <David.Laight@...LAB.COM>
Cc:	Alan Stern <stern@...land.harvard.edu>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	"stable@...r.kernel.org" <stable@...r.kernel.org>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>
Subject: Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a
 USB payload burst

[Walt, please use reply-all to keep the list in the loop, thanks.]

On Wed, Jan 08, 2014 at 04:09:14PM +0000, David Laight wrote:
> > From: Sarah Sharp
> > On Tue, Jan 07, 2014 at 03:57:00PM -0800, walt wrote:
> > > On 01/07/2014 01:21 PM, Sarah Sharp wrote:
> > >
> > > > Can you please try the attached patch, on top of the previous three
> > > > patches, and send me dmesg?
> > >
> > > Hi Sarah, I just now finished running 0001-More-debugging.patch for the
> > > first time.  The previous dmesg didn't include that patch, but this one
> > > does.
> > >
> > > I read through this dmesg but I nodded off somewhere around line 500.
> > > I hope you can stay awake :)
> > 
> > Well, it has all the info I need, but the results don't make me too
> > happy.  Everything I've checked seems consistent, and I don't know why
> > the host stopped.  The link TRBs are intact, the dequeue pointer for the
> > endpoint was pointing to the transfer that timed out and it had the
> > cycle bit set correctly, etc.  Perhaps the no-op TRBs are really the
> > issue.
> > 
> > I'll have to take a look at the log again tomorrow.  I posted the dmesg
> > on pastebin if David wants to check it out as well:
> > http://pastebin.com/a4AUpsL1
> 
> I can't see anything obvious either.
> However there is no response to the 'stop endpoint' command.
> Section 4.6.9 (page 107 of rev1.0) states that the controller will complete
> any USB IN or OUT transaction before raising the command completion event.
> Possibly it is too 'stuck' to complete the transaction?

The host has to stop processing the transaction, it can't "wait" for the
transaction to finish.  "The Stop Endpoint Command is expected to stop
endpoint activity as soon as possible, which may mean that it stops in
the middle of a TRB."

Usually when hosts get into this kind of mode, something has seriously
gone wrong, like bus error when it issues a bad memory access.

> The endpoint status is also still '1' (running).
> This also means that the 'TR dequeue pointer' is undefined - so the
> controller could easily be processing a later TRB.
> This field might even still contain the ring base address written by
> the driver much earlier.
> 
> This might mean that something 'catastrophic' has happened earlier.
> Maybe the controller isn't actually seeing any doorbell writes at all.
> Maybe the base addresses it has for the rings have all got corrupted.
> At least this looks like amd64 - so there aren't memory coherency issues.
> 
> Some hacks that might help isolate the problem:
> 1) Request an interrupt from the last nop data TRB.
> 2) Put a command nop (decimal 23) TRB into the command ring before
>    the 'stop endpoint'.
> 3) Comment out the code that adds the nop data TRBs.
> The first two might need code adding to handle the responses.
> 
> Do we know the actual xhci device?
> I think it reports version 0x96.
> (Sarah - it might be useful if that version were in one of the trace
> messages that is output by default.)

You mean print the PCI device and vendor ID?  Perhaps Subsystem vendor
as well?

On Tue, Jan 07, 2014 at 05:26:37PM -0800, walt wrote:
> On 01/07/2014 04:47 PM, Sarah Sharp wrote:
>  
> > Can you send me the output of `sudo lspci -vvv -n`?  Maybe we can just
> > turn off scatter-gather for your host controller until we get a proper
> > fix in that uses link TRBs instead of no-op TRBs.
> 
> The aftermarket usb3 adapter card and the usb3 outboard hard-drive docking
> station are the only two usb3 devices I have.
> 
> I've wondered if my xhci problems might be caused by hardware quirks, and
> wondering why I seem to be the only one who has this problem.
> 
> Maybe I could "take one for the team" by buying new hardware toys that I
> don't really need but I could use to test the xhci driver?  (I do enjoy
> buying new toys, necessary, or, um, maybe not :)

It would be appreciated if you could see if your device causes other
host controllers to fail.  Who am I to keep a geek from new toys? ;)

In the meantime, try this patch, which is something of a long shot.

Sarah Sharp

View attachment "0001-xhci-Enable-Link-TRB-quirk-for-0.96-ASMedia-host.patch" of type "text/x-diff" (1079 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ