lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 3 Jan 2020 13:26:13 +0800
From:   Kai-Heng Feng <kai.heng.feng@...onical.com>
To:     Alan Stern <stern@...land.harvard.edu>
Cc:     AceLan Kao <acelan.kao@...onical.com>,
        Mathias Nyman <mathias.nyman@...el.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
        Harry Pan <harry.pan@...el.com>,
        David Heinzelmann <heinzelmann.david@...il.com>,
        Andrey Konovalov <andreyknvl@...gle.com>,
        Nicolas Saenz Julienne <nsaenzjulienne@...e.de>,
        Mathieu Malaterre <malat@...ian.org>,
        USB list <linux-usb@...r.kernel.org>,
        "Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] usb: hub: move resume delay at the head of all USB access
 functions

Hi Alan,

> On Jan 3, 2020, at 05:37, Alan Stern <stern@...land.harvard.edu> wrote:
> 
> On Wed, 25 Dec 2019, AceLan Kao wrote:
> 
>> Here[1] are the dmesg and the usbmon log from wireshark, and
>> /sys/kernel/debug/usb/usbmon/0u.
>> 
>> I verified this issue on Dell XPS 13 + Dell Salomon WD19 docking
>> station(plug-in 3 USB disk on it)
>> After s2idle 7 times, 2 usb disks lost. But from wireshark log, the
>> packets look normal, no error.
>> 
>> So, I re-do the test again and log the usbmon/0u output, but it's greek to me.
>> Hope you can help to find some clues in the logs.
>> Thanks.
>> 
>> 1. https://people.canonical.com/~acelan/bugs/usb_issue/
> 
> Maybe Mathias can help figure out what's going on.  It clearly is an 
> xHCI-related problem of some sort.

I think I figured this issue out and I'll send out a patch series soon.

> 
> I can't get much out of these logs.  For one thing, the time period
> covered by the usbmon trace is different from the time period in your
> dmesg log.  When you collect two kinds of logs for a test, it's
> important that they should be collected at the _same_ time!  Otherwise 
> they record different events, which is no use.
> 
> For another, your usbmon trace used the 0u file, but you should have
> used the 2u file instead.  All the errors you get involve devices on 
> bus 2; including other buses in the trace just makes it more confusing 
> and causes more entries to be dropped.
> 
> Also, I don't understand why your Dell Dock devices show up the way
> they do.  The SuperSpeed dock is on bus 2, and the high speed dock is
> on bus 3, which is attached to a different xHCI controller!  A hub
> (which is what a dock is) should show up twice, and both instances
> should be attached to the same controller.

I think this is how Ice Lake xHCI works, the Thunderbolt xHCI is also part of SoC on Ice Lake.

> 
> In any case, it's clear that you've got some problem involving the xHCI
> controller and the SuperSpeed Dell Dock (device 2-1).  Here's an
> excerpt from the dmesg log showing what goes wrong.  This occurs during
> a system resume; apparently the dock's connection drops while the
> system is suspended and there are errors when the system tries to
> re-activate it:
> 
> [  721.507202] usb 2-1: USB disconnect, device number 32
> [  721.507206] usb 2-1.3: USB disconnect, device number 33
> [  721.507209] usb 2-1.3.1: USB disconnect, device number 35
> [  721.540214] usb 2-1.3.3: USB disconnect, device number 36
> [  721.584518] usb 2-1.4: USB disconnect, device number 34
> [  721.647466] xhci_hcd 0000:00:0d.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
> 
> That's the first problem.
> 
> [  721.919433] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 38 using xhci_hcd
> [  721.942597] usb 2-1: New USB device found, idVendor=0bda, idProduct=0487, bcdDevice= 1.47
> [  721.942600] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [  721.942602] usb 2-1: Product: Dell dock
> [  721.942604] usb 2-1: Manufacturer: Dell Inc.
> [  721.944849] hub 2-1:1.0: USB hub found
> [  721.945232] hub 2-1:1.0: 4 ports detected
> [  722.178395] hub 2-1:1.0: hub_ext_port_status failed (err = -71)
> [  722.178570] usb 2-1-port3: attempt power cycle
> [  722.559212] usb 2-1: USB disconnect, device number 38
> [  722.559311] usb 2-1: Failed to suspend device, error -19
> 
> And that's a second error.  Then the system tries again:
> 
> [  722.831431] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 43 using xhci_hcd
> [  722.854608] usb 2-1: New USB device found, idVendor=0bda, idProduct=0487, bcdDevice= 1.47
> [  722.854611] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [  722.854614] usb 2-1: Product: Dell dock
> [  722.854616] usb 2-1: Manufacturer: Dell Inc.
> [  722.857034] hub 2-1:1.0: USB hub found
> [  722.857440] hub 2-1:1.0: 4 ports detected
> [  723.090106] hub 2-1:1.0: hub_ext_port_status failed (err = -71)
> [  723.090316] usb 2-1-port3: attempt power cycle
> [  723.471212] usb 2-1: USB disconnect, device number 43
> [  723.471274] usb 2-1: Failed to suspend device, error -19
> 
> And it fails in the same way.  A third try ends up succeeding:
> 
> [  723.743465] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 48 using xhci_hcd
> ...
> 
> I don't see why the device numbers jump around in such a crazy manner.  
> When device 38 disconnects, the next device is number 43.  And when 
> that fails, the next number is 48!  It looks like device numbers are 
> somehow being allocated for the four child devices even though the log 
> doesn't show them being detected.
> 
> Here's a portion of the usbmon trace that apparently corresponds
> somewhat to part of the log above, although it's hard to be certain.  
> In this trace, device 77 on bus 2 is 2-1.  Unlike the log above, it
> only has a child attached to port 3, not to port 4:
> 
> ffff9879b43ff840 291366827 S Ci:2:001:0 s a3 00 0000 0001 0004 4 <
> ffff9879b43ff840 291366846 C Ci:2:001:0 0 4 = 03020000
> ffff9879b2f54780 291469875 S Ci:2:077:0 s a3 00 0000 0001 0004 4 <
> ffff9879b2f54780 291470062 C Ci:2:077:0 0 4 = a0020000
> ffff9879b2f54780 291470069 S Ci:2:077:0 s a3 00 0000 0002 0004 4 <
> ffff9879b2f54780 291470208 C Ci:2:077:0 0 4 = a0020000
> ffff9879b2f54780 291470214 S Ci:2:077:0 s a3 00 0000 0003 0004 4 <
> ffff9879b2f54780 291470387 C Ci:2:077:0 0 4 = 03021000
> ffff9879b2f54780 291470392 S Co:2:077:0 s 23 01 0014 0003 0000 0
> ffff9879b2f54780 291470527 C Co:2:077:0 0 0
> ffff9879b2f54780 291470533 S Ci:2:077:0 s a3 00 0000 0004 0004 4 <
> ffff9879b2f54780 291470709 C Ci:2:077:0 0 4 = a0020000
> ffff9879b43ff480 291573804 S Ii:2:077:1 -115:128 2 <
> ffff9879b2f54780 291573839 S Ci:2:077:0 s a3 00 0000 0003 0004 4 <
> ffff9879b2f54780 291574068 C Ci:2:077:0 0 4 = 03020000
> ffff9879b2f54780 291574166 S Ci:2:077:0 s a3 00 0000 0003 0004 4 <
> ffff9879b2f54780 291574363 C Ci:2:077:0 0 4 = 03020000
> ffff9879b2f54780 291574383 S Co:2:077:0 s 23 03 0004 0003 0000 0
> ffff9879b2f54780 291574580 C Co:2:077:0 0 0
> ffff9879b2f54780 291593804 S Ci:2:077:0 s a3 00 0002 0003 0008 8 <
> ffff9879ae2969c0 291595804 C Ii:2:001:1 0:2048 1 = 02
> ffff9879ae2969c0 291595824 S Ii:2:001:1 -115:2048 4 <
> ffff9879ba3476c0 291595843 S Ci:2:001:0 s a3 00 0000 0001 0004 4 <
> ffff9879ba3476c0 291595857 C Ci:2:001:0 0 4 = c0024100
> 
> As far as I can tell, the device was working fine until at this point
> it disconnected itself.
> 
> ffff9879ba3476c0 291595863 S Co:2:001:0 s 23 01 0010 0001 0000 0
> ffff9879ba3476c0 291595867 C Co:2:001:0 0 0
> ffff9879ba3476c0 291595870 S Co:2:001:0 s 23 01 0019 0001 0000 0
> ffff9879ba3476c0 291595875 C Co:2:001:0 0 0
> ffff9879ba3476c0 291595879 S Co:2:001:0 s 23 03 001c 0001 0000 0
> ffff9879ba3476c0 291595884 C Co:2:001:0 0 0
> ffff9879b2f54780 291600949 C Ci:2:077:0 -71 0
> ffff9879b2f54780 291600997 S Co:2:077:0 s 23 03 0004 0003 0000 0
> ffff9879b2f54780 291601000 E Co:2:077:0 -19 0
> 
> Mathias, any suggestions on how Acelan can track down the reasons for 
> these problems?

There are three issues for this hub:
- It may not be in U3 after suspend
- It may not be in U0 after resume
- It disconnects when setting downstream ports to U0 while the hub itself is not in U0.

I'll send a patch series soon.

Kai-Heng

> 
> Alan Stern

Powered by blists - more mailing lists