lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20241118222355.482cf783@foxbook>
Date: Mon, 18 Nov 2024 22:23:55 +0100
From: MichaƂ Pecio <michal.pecio@...il.com>
To: linuxusb.ml@...dtek.de
Cc: linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org,
 stern@...land.harvard.edu
Subject: Re: Highly critical bug in XHCI Controller

> In my experience with USB anything that is a 'temporary' failure can
> be considered as 'permanent' failure and I've really seen a lot over
> the last 1 1/2 decades.
> However issues are mostly related to immature controllers / missing
> quirks for some controllers.
> Our devices in the field since 2008 usually pump around 100-300mbit
> through the USB 2 link,
> streaming  devices which usually run for a long period of time (up to
> months / years).
> 'retrying' something on a link where something has gone wrong for sure
> never worked properly for me, it would have continued with another
> followup issue at some point.

You may have simply seen hardware going dead or buggy drivers failing
to recover from recoverable errors.

Random bit errors really happen and (excepting isochronous endpoints)
can be recovered from. But if you get -EPROTO on a bulk endpoint, for
example, it means the endpoint halted and should be reset. Few Linux
drivers seem to bother with such things.

I even think xHCI's handling of halted endpoints and usb_clear_halt()
is broken, but it looks like fixing it would break all the buggy class
drivers on the other hand, which are currently "sort of functional".

> Anyway can you give a particular example where this 'retrying'
> mechanism and reloading the endpoint size solves or solved a problem?

It seems to happen when you insert the plug slowly or at an angle, and
contact is briefly lost while the device is being initialized.

The first three lines below come from hub_port_init(), which looks like
it is being called by hub_port_connect() in a loop.

[81169.840924] usb 5-1: new full-speed USB device number 61 using ohci-pci
[81170.387927] usb 5-1: device not accepting address 61, error -62
[81170.742931] usb 5-1: new full-speed USB device number 62 using ohci-pci
[81170.901914] usb 5-1: New USB device found, idVendor=067b, idProduct=2303, bcdDevice= 3.00
[81170.901919] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[81170.901921] usb 5-1: Product: USB-Serial Controller
[81170.901922] usb 5-1: Manufacturer: Prolific Technology Inc.

Another example which could trigger retries is a device which includes
a permanent "presence detect" resistor (such as PL2303, coincidentally)
but takes a long time to initialize itself and start responding.

Regards,
Michal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ