[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a4199978d55410911a2f51fb8d63bbeb072c227e.camel@sundtek.de>
Date: Sun, 17 Nov 2024 23:03:59 +0800
From: Markus Rechberger <linuxusb.ml@...dtek.de>
To: Michał Pecio <michal.pecio@...il.com>
Cc: linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org
Subject: Re: Highly critical bug in XHCI Controller
On Sun, 2024-11-17 at 15:35 +0100, Michał Pecio wrote:
> Hi,
>
> > Basically the issue comes from hub_port_connect.
> >
> > drivers/usb/core/hub.c
> >
> > hub_port_init returns -71 -EPROTO and jumps to loop
> > https://github.com/torvalds/linux/blob/master/drivers/usb/core/hub.c#L5450
> >
> > I'd question if usb_ep0_reinit is really required in loop which is
> > running following functions:
> > usb_disable_endpoint(udev, 0 + USB_DIR_IN, true);
> > usb_disable_endpoint(udev, 0 + USB_DIR_OUT, true);
> > usb_enable_endpoint(udev, &udev->ep0, true);
> >
> > this is something only experience over the past decades can tell?
> >
> > usb_enable_endpoint will trigger xhci_endpoint_reset which doesn't
> > do
> > much, but crashes the entire system with the upstream kernel when
> > it
> > triggers xhci_check_bw_table).
> >
> > I removed usb_ep0_reinit here and devices are still workable under
> > various conditions (again I shorted and pulled D+/D- to ground for
> > testing).
>
> xHCI isn't the only host controller supported by Linux, and
> usb_ep0_reinit() predates it. Maybe it's pointless today, maybe
> it isn't, but it's not the root cause of your problem anyway.
>
exactly, but it shouldn't go there anyway. This section of the code is
only for 'in case an error already happened'.
That's why I'm asking if anyone knows a practical situation where this
is really needed - and did it ever help?
I'm working with USB across many systems for nearly 2 decades and I
never saw any of those fallbacks succeeding. Usually the way to recover
a device which had connection issues was to reconnect the device
completely.
> > The NULL PTR check in xhci_check_bw_table would be a second line
> > of defense but as indicated in the first mail it shouldn't even get
> > there.
>
> It's an xHCI bug that BW calculation is attempted on an uninitialized
> device and crashes. Looks like a NULL check somewhere is exactly what
> is needed, or maybe avoid it completely on EP0 (it's probably no-op).
>
Yes this would be the second line of defense as indicated in my email
before.
I'm preparing 2 small patches with comments in the code.
One commenting out usb_ep0_reinit and the other one a simple NULL PTR
check - but
again the code shouldn't be executed at all when bw_table == NULL
something already
went wrong, the issue should be handled earlier in the code already.
bw_table is not setting itself to NULL or not at all.
Since it's infrastructure code it's a sensitive part.
This issue fully crashed my Asus notebook at least 3 times when working
with a USB ethernet adapter and a USB harddisk. Not only faulty
hardware is causing that problem also simple hotplug connection
problems.
> Other similar bug recently:
> https://lore.kernel.org/linux-usb/D3CKQQAETH47.1MUO22RTCH2O3@matfyz.cz/T/#u
>
> Yours too should be unique to those Intel Panther Point chipsets.
This is exactly the same problem yes.
The problem will happen with all systems which use the xhci driver.
Best Regards,
Markus
>
> > As a second issue I found in usb_reset_and_verify device
> > https://github.com/torvalds/linux/blob/master/drivers/usb/core/hub.c#L6131
> >
> > ret = hub_port_init(parent_hub, udev, port1, i,
> > &descriptor);
> > if (ret >= 0 || ret == -ENOTCONN || ret == -ENODEV) {
> > break;
> > }
> >
> > hub_port_init can also return -71 / -EPROTO, the cases should be
> > very
> > rare when usb_reset_and_verify_device is triggered and that
> > happens.
>
> Right, and the intent seems to be to simply retry in this case.
>
> Regards,
> Michal
Powered by blists - more mailing lists