linux-kernel - Re: xhci: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250406095008.0dbfd586@foxbook>
Date: Sun, 6 Apr 2025 09:50:08 +0200
From: Michał Pecio <michal.pecio@...il.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: Paul Menzel <pmenzel@...gen.mpg.de>, Mathias Nyman
 <mathias.nyman@...ux.intel.com>, Mathias Nyman <mathias.nyman@...el.com>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>, linux-usb@...r.kernel.org,
 LKML <linux-kernel@...r.kernel.org>
Subject: Re: xhci: WARN Set TR Deq Ptr cmd failed due to incorrect slot or
 ep state.

Thanks, Alan.

On Sat, 5 Apr 2025 22:40:51 -0400, Alan Stern wrote:
> The core believes that resetting a device should erase the endpoint 
> information in the HCD.  There is a callback in hub_port_reset() to
> that effect:
> 
> 		if (hcd->driver->reset_device)
> 			hcd->driver->reset_device(hcd, udev);

Makes sense, reset_device() does indeed nuke everything besides EP0.
When the endpoints are re-added later, their will get blank state.

> So after this the EP should not be in the Running-Idle state; in fact
> it should not exist at all (unless it is ep0, but in this case I
> think it isn't).

The log shows that reset_device() is followed by a few drop_endpoint()
and add_endpoint(), so the endpoint gets enabled again before new URBs
are submitted, no problem here.

What I found suspicious is that there is also endpoint_disable() and
I'm not sure where it comes from. Looking at core code, it seems to
often be followed by endpoint_reset(), but the log doesn't show that.

> Is the implementation of the reset_device callback in xhci-hcd
> missing something?

I guess we could clear EP_STALLED when an endpoint is being re-enabled.
This would cover reset_device() and also drop_endpoint().


Paul, could you apply this patch and report back if it fixes the
warnings and reset loops and which XXX messages are printed?
BTW, if those "is now EP_STALLED" and "no longer EP_STALLED" cause
too much noise, you can remove them, they are not very important.

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 3f918fea7aea..0ebf7980780f 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1463,6 +1463,10 @@ int xhci_endpoint_init(struct xhci_hcd *xhci,
 	if (!virt_dev->eps[ep_index].new_ring)
 		return -ENOMEM;
 
+	if (virt_dev->eps[ep_index].ep_state & EP_STALLED)
+		dev_err(&udev->dev, "XXX ep %d still EP_STALLED on init, clearing\n", ep_index);
+	virt_dev->eps[ep_index].ep_state &= ~EP_STALLED;
+
 	virt_dev->eps[ep_index].skip = false;
 	ep_ring = virt_dev->eps[ep_index].new_ring;
 
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e0816d2f6b75..b674f6b4be3c 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2541,6 +2541,7 @@ static void process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	struct xhci_slot_ctx *slot_ctx;
 	u32 trb_comp_code;
 	u32 remaining, requested, ep_trb_len;
+	u32 ep_index = TRB_TO_EP_ID(le32_to_cpu(event->flags)) - 1;
 
 	slot_ctx = xhci_get_slot_ctx(xhci, ep->vdev->out_ctx);
 	trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->transfer_len));
@@ -2581,6 +2582,7 @@ static void process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		xhci_handle_halted_endpoint(xhci, ep, td, EP_SOFT_RESET);
 		return;
 	case COMP_STALL_ERROR:
+		dev_err(&td->urb->dev->dev, "XXX ep %d is now EP_STALLED\n", ep_index);
 		ep->ep_state |= EP_STALLED;
 		break;
 	default:
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 83a4adf57bae..9066e1b26ceb 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1607,7 +1607,7 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
 
 	/* Class driver might not be aware ep halted due to async URB giveback */
 	if (*ep_state & EP_STALLED)
-		dev_dbg(&urb->dev->dev, "URB %p queued before clearing halt\n",
+		dev_err(&urb->dev->dev, "XXX URB %p queued before clearing halt\n",
 			urb);
 
 	switch (usb_endpoint_type(&urb->ep->desc)) {
@@ -1772,6 +1772,11 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 
 	/* In these cases no commands are pending but the endpoint is stopped */
 	if (ep->ep_state & (EP_CLEARING_TT | EP_STALLED)) {
+		struct xhci_ep_ctx *ep_ctx = xhci_get_ep_ctx(xhci, vdev->out_ctx, ep_index);
+		int ctx_state = GET_EP_CTX_STATE(ep_ctx);
+		if (ctx_state != EP_STATE_STOPPED)
+			dev_err(&urb->dev->dev, "XXX WTF ep_state %d ctx_state %d\n", ep->ep_state, ctx_state);
+
 		/* and cancelled TDs can be given back right away */
 		xhci_dbg(xhci, "Invalidating TDs instantly on slot %d ep %d in state 0x%x\n",
 				urb->dev->slot_id, ep_index, ep->ep_state);
@@ -3211,6 +3216,8 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
 
 	spin_lock_irqsave(&xhci->lock, flags);
 
+	if (ep->ep_state & EP_STALLED)
+		dev_err(&udev->dev, "XXX ep %d no longer EP_STALLED\n", ep_index);
 	ep->ep_state &= ~EP_STALLED;
 
 	/* Bail out if toggle is already being cleared by a endpoint reset */