linux-kernel - Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices


Message-ID: <68228cc9-5d1a-6872-0956-a7006fe3b943@oracle.com>
Date:   Tue, 1 Sep 2020 16:54:48 -0600
From:   Khalid Aziz <khalid.aziz@...cle.com>
To:     Alan Stern <stern@...land.harvard.edu>
Cc:     gregkh@...uxfoundation.org, erkka.talvitie@...cit.fi,
        linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low
 speed devices

On 9/1/20 1:51 PM, Alan Stern wrote:
> On Tue, Sep 01, 2020 at 11:00:16AM -0600, Khalid Aziz wrote:
>> On 9/1/20 10:36 AM, Alan Stern wrote:
>>> On Tue, Sep 01, 2020 at 09:15:46AM -0700, Khalid Aziz wrote:
>>>> On 8/31/20 8:31 PM, Alan Stern wrote:
>>>>> Can you collect a usbmon trace showing an example of this problem?
>>>>>
>>>>
>>>> I have attached usbmon traces for when USB hub with keyboards and mouse
>>>> is plugged into USB 2.0 port and when it is plugged into the NEC USB 3.0
>>>> port.
>>>
>>> The usbmon traces show lots of errors, but no Clear-TT events.  The 
>>> large number of errors suggests that you've got a hardware problem; 
>>> either a bad hub or bad USB connections.
>>
>> That is what I thought initially which is why I got additional hubs and
>> a USB 2.0 PCI card to test. I am seeing errors across 3 USB controllers,
>> 4 USB hubs and 4 slow/full speed devices. All of the hubs and slow/full
>> devices work with zero errors on my laptop. My keyboard/mouse devices
>> and 2 of my USB hubs predate motherboard update and they all worked
>> flawlessly before the motherboard upgrade. Some combinations of these
>> also work with no errors on my desktop with new motherboard that I had
>> listed in my original email:
> 
> It's a very puzzling situation.
> 
> One thing which probably would work well, surprisingly, would be to buy 
> an old USB-1.1 hub and plug it into the PCI card.  That combination is 
> likely to be similar to what you see when plugging the devices directly 
> into the PCI card.  It might even work okay with the USB-3 controllers.
> 
>> 2. USB 2.0 controller - WORKS
>> 5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS
>>
>> I am not seeing a common failure here that would point to any specific
>> hardware being bad. Besides, that one code change (which I still can't
>> say is the right code change) in ehci-q.c makes USB 2.0 controller work
>> reliably with all my devices.
> 
> The USB and EHCI designs are flawed in that under the circumstances 
> you're seeing, they don't have any way to tell the difference between a 
> STALL and a host timing error.  The current code treats these situations 
> as timing/transmission errors (resulting in device resets); your change 
> causes them to be treated as STALLs.  However, there are known, common 
> situations in which those same symptoms really are caused by 
> transmission errors, so we don't want to start treating them as STALLs.
> 
> Besides, I suspect that your code change does _not_ make the USB-2 
> controller work reliably with your devices.  You should collect a usbmon 
> trace under those conditions; I predict it will be full of STALLs.  And 
> furthermore, I believe these STALLs will not show up in a usbmon trace 
> made with the devices plugged directly into the PCI card.  If I'm right 
> about these things, the errors are still present even with your patch; 
> all it does is hide them.
> 
> Short of a USB bus analyzer, however, there's no way to tell what's 
> really going on.

I have managed to find a hardware combination that seems to work, so for
now at least my machine is usable. I will figure out how to interpret
usbmon output and run more experiments. There seems to be a real problem
in the driver somewhere, and it should be solved.
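
As a starting point for interpreting the traces, here is a minimal sketch
that tallies URB completion statuses in a usbmon text capture. It assumes
the text format described in the kernel's usbmon documentation, where the
third field is the event type and the fifth is the status word, and it
assumes the usual Linux errno values (-32 for EPIPE, which is how a STALL
is reported, and -71 for EPROTO). The function names are made up for
illustration.

```python
from collections import Counter

# Assumed errno values (true on x86 Linux); usbmon prints them negated.
STATUS_NAMES = {
    0: "OK",
    -32: "EPIPE (STALL)",
    -71: "EPROTO (protocol/timing error)",
}


def completion_status(line):
    """Return the integer URB status of a usbmon 'C' line, else None."""
    fields = line.split()
    if len(fields) < 5 or fields[2] != "C":
        return None  # submissions ('S') and submit errors ('E') are skipped
    # Interrupt and isochronous URBs append ":interval[:start_frame]".
    status_word = fields[4].split(":")[0]
    try:
        return int(status_word)
    except ValueError:
        return None  # not a numeric status word


def tally_statuses(lines):
    """Count how often each completion status occurs in the trace."""
    counts = Counter()
    for line in lines:
        status = completion_status(line)
        if status is not None:
            counts[status] += 1
    return counts


def describe(counts):
    """Format the tally as 'status name: count' lines, most common first."""
    return [
        f"{STATUS_NAMES.get(status, status)}: {n}"
        for status, n in counts.most_common()
    ]
```

Running tally_statuses() over a trace taken with the patch applied and
over one taken with the devices plugged directly into the PCI card should
show whether the -32 completions appear only in the first case.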

Thanks,
Khalid