Message-ID: <dd277ca2-6225-43f6-b833-fe41c2d7f686@linux.intel.com>
Date: Mon, 7 Apr 2025 10:15:29 +0300
From: Mathias Nyman <mathias.nyman@...ux.intel.com>
To: Alan Stern <stern@...land.harvard.edu>,
 Michał Pecio <michal.pecio@...il.com>
Cc: Paul Menzel <pmenzel@...gen.mpg.de>,
 Mathias Nyman <mathias.nyman@...el.com>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>, linux-usb@...r.kernel.org,
 LKML <linux-kernel@...r.kernel.org>
Subject: Re: xhci: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep
 state.

On 6.4.2025 5.40, Alan Stern wrote:
> On Sun, Apr 06, 2025 at 12:23:11AM +0200, Michał Pecio wrote:
>> Looks like some URB stalled and usb_storage reset the device without
>> usb_clear_halt(). Then the core didn't usb_hcd_reset_endpoint() either.
>> And apparently EP_STALLED is still set in xhci_hcd after all that time.
>>
>> Then usb_storage submits one URB which never executes because the EP
>> is in Running-Idle state and the doorbell is inhibited by EP_STALLED.
>> 30s later it times out, unlinks the URB and resets again. Set TR Deq
>> fails because the endpoint is Running.
> 
>> Not sure if it's a USB core bug or something that xHCI should take
>> care of on its own. For now, reverting those two "stall" patches ought
>> to clean up the noise.
> 
> The core believes that resetting a device should erase the endpoint
> information in the HCD.  There is a callback in hub_port_reset() to that
> effect:
> 
> 		if (hcd->driver->reset_device)
> 			hcd->driver->reset_device(hcd, udev);
> 
> So after this the EP should not be in the Running-Idle state; in fact it
> should not exist at all (unless it is ep0, but in this case I think it
> isn't).
>
> Is the implementation of the reset_device callback in xhci-hcd missing
> something?
> 
> Alan Stern

Thanks for the tip. I believe this is at least part of the issue here.

We don't clear the virt_dev->eps[ep_index].ep_state flags after device reset.

And the two new patches Michal pointed out rely even more on ep_state flags than
before, causing a regression.

0c74d232578b xhci: Avoid queuing redundant Stop Endpoint command for stalled endpoint
860f5d0d3594 xhci: Prevent early endpoint restart when handling STALL errors.

Does this one-liner help?

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 0452b8d65832..044c70c17746 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3930,6 +3930,7 @@ static int xhci_discover_or_reset_device(struct usb_hcd *hcd,
					&virt_dev->eps[i],
					virt_dev->tt_info);
		xhci_clear_endpoint_bw_info(&virt_dev->eps[i].bw_info);
+		ep->ep_state = 0;
	}
	/* If necessary, update the number of active TTs on this root port */
	xhci_update_tt_active_eps(xhci, virt_dev, old_active_eps);

Thanks
Mathias

