linux-kernel - Re: Highly critical bug in XHCI Controller

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <dff9f24a-4aa2-45aa-920f-20876cf4ccbf@rowland.harvard.edu>
Date: Sun, 17 Nov 2024 16:02:49 -0500
From: Alan Stern <stern@...land.harvard.edu>
To: Markus Rechberger <linuxusb.ml@...dtek.de>
Cc: linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Highly critical bug in XHCI Controller

On Sun, Nov 17, 2024 at 11:47:52PM +0800, Markus Rechberger wrote:
> On Sun, 2024-11-17 at 10:18 -0500, Alan Stern wrote:
> > On Sun, Nov 17, 2024 at 08:44:16PM +0800, Markus Rechberger wrote:
> > > Basically the issue comes from hub_port_connect.
> > > 
> > > drivers/usb/core/hub.c
> > > 
> > > hub_port_init returns -71 -EPROTO and jumps to loop
> > > https://github.com/torvalds/linux/blob/master/drivers/usb/core/hub.c#L5450
> > > 
> > > I'd question if usb_ep0_reinit is really required in loop which is
> > > running following functions:

> This should only go down there in case an error happened earlier no?

Of course, since -EPROTO indicates there was an error.

> In case the hardware signed up correctly it should not even enter that
> code.
> 
> My experience is just - reconnect the device in case an error happened

You can't reconnect some devices; they are permanently attached.  There 
are other reasons why reconnecting might not be practical.

> those
> workarounds did not work properly for the device I deal with (but yes
> that's
> why I'm asking - maybe someone else has different hardware with
> different
> experience).

I doubt we will hear from these people, because they will not realize 
that anything was wrong.

In any case, it is generally recognized that the type of errors leading 
to -EPROTO (bad CRC or no response, for instance) can be temporary or 
intermittent.  Retrying is an appropriate strategy.

> > > As a second issue I found in usb_reset_and_verify device 
> > > https://github.com/torvalds/linux/blob/master/drivers/usb/core/hub.c#L6131
> > > 
> > >         ret = hub_port_init(parent_hub, udev, port1, i,
> > > &descriptor);
> > >         if (ret >= 0 || ret == -ENOTCONN || ret == -ENODEV) {
> > >             break;
> > >         }
> > > 
> > > hub_port_init can also return -71 / -EPROTO, the cases should be
> > > very
> > > rare when usb_reset_and_verify_device is triggered and that
> > > happens.
> > 
> > If that happens, the loop which this code sits inside will simply 
> > perform another iteration.  That's what  it's supposed to do, not an 
> > issue at all.
> > 
> 
> It doesn't cause any issue yes but it's not correct either.

Why not?  What's wrong with it?

> -EPROTO will be returned if I disconnect or short the device here. So
> initially someone
> thought he should check for -ENODEV/-ENOTCONN (which should also
> indicate that the device is gone), so -EPROTO should also be checked in
> that case.
> Otherwise just remove all those error checks.

-EPROTO does not necessarily indicate the device is gone; it indicates 
there was a problem communicating with the device.  The problem might be 
caused by a disconnection, or it might be caused by temporary 
electromagnetic interference (for example), or by something else.

Alan Stern