Message-ID: <20250406002311.2a76fc64@foxbook>
Date: Sun, 6 Apr 2025 00:23:11 +0200
From: Michał Pecio <michal.pecio@...il.com>
To: Paul Menzel <pmenzel@...gen.mpg.de>
Cc: Mathias Nyman <mathias.nyman@...ux.intel.com>, Mathias Nyman
 <mathias.nyman@...el.com>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 linux-usb@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: xhci: WARN Set TR Deq Ptr cmd failed due to incorrect slot or
 ep state.

OK, I think I see it.

On Sat, 05 Apr 2025 05:23:12 +0000, Paul Menzel wrote:
> [  326.543262] xhci_hcd 0000:39:00.0: Resetting device with slot ID 5
> [  326.543294] xhci_hcd 0000:39:00.0: // Ding dong!
> [  326.543549] xhci_hcd 0000:39:00.0: Completed reset device command.
> [  326.543588] xhci_hcd 0000:39:00.0: Successful reset device command.
> [  326.543730] xhci_hcd 0000:39:00.0: // Ding dong!
> [  326.543838] xhci_hcd 0000:39:00.0: Successful setup address command
> [  326.543858] xhci_hcd 0000:39:00.0: Op regs DCBAA ptr = 0x00000133845000
> [  326.543876] xhci_hcd 0000:39:00.0: Slot ID 5 dcbaa entry @00000000ced6807f = 0x000001339f4000
> [  326.543897] xhci_hcd 0000:39:00.0: Output Context DMA address = 0x1339f4000
> [  326.543904] xhci_hcd 0000:39:00.0: Internal device address = 5
> [  326.543935] usb 4-1.4: reset SuperSpeed USB device number 4 using xhci_hcd
> [  326.560391] xhci_hcd 0000:39:00.0: Waiting for status stage event
> [  326.560446] xhci_hcd 0000:39:00.0: xhci_drop_endpoint called for udev 000000008c832e88
> [  326.560465] xhci_hcd 0000:39:00.0: xhci_drop_endpoint called for udev 000000008c832e88
> [  326.560483] xhci_hcd 0000:39:00.0: add ep 0x1, slot id 5, new drop flags = 0x0, new add flags = 0x4
> [  326.560499] xhci_hcd 0000:39:00.0: add ep 0x82, slot id 5, new drop flags = 0x0, new add flags = 0x24
> [  326.560508] xhci_hcd 0000:39:00.0: xhci_check_bandwidth called for udev 000000008c832e88
> [  326.560520] xhci_hcd 0000:39:00.0: // Ding dong!
> [  326.561031] xhci_hcd 0000:39:00.0: Successful Endpoint Configure command
> [  326.561209] xhci_hcd 0000:39:00.0: endpoint disable with ep_state 0x40
> [  326.561217] xhci_hcd 0000:39:00.0: endpoint disable with ep_state 0x240

It looks like some URB stalled and usb_storage reset the device without
calling usb_clear_halt(). The core then didn't call
usb_hcd_reset_endpoint() either, and EP_STALLED is apparently still set
in xhci_hcd after all that time.
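
For anyone following along, here is a minimal sketch of how the ep_state
values 0x40 and 0x240 from the log above can be decoded. The bit
assignments are an assumption, written down from my reading of
drivers/usb/host/xhci.h around 6.15, so check them against your own tree.
The table and decode_ep_state() are hypothetical helpers for
illustration, not driver code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup table. Bit values are ASSUMED to match the
 * ep_state flags in drivers/usb/host/xhci.h (6.15); verify locally. */
struct flag_name {
    unsigned int bit;
    const char *name;
};

static const struct flag_name ep_flags[] = {
    { 1u << 0, "SET_DEQ_PENDING" },
    { 1u << 1, "EP_HALTED" },
    { 1u << 2, "EP_STOP_CMD_PENDING" },
    { 1u << 3, "EP_GETTING_STREAMS" },
    { 1u << 4, "EP_HAS_STREAMS" },
    { 1u << 5, "EP_GETTING_NO_STREAMS" },
    { 1u << 6, "EP_HARD_CLEAR_TOGGLE" },
    { 1u << 7, "EP_SOFT_CLEAR_TOGGLE" },
    { 1u << 8, "EP_CLEARING_TT" },
    { 1u << 9, "EP_STALLED" },
};

/* Write the names of all set flags into buf, joined by '|'.
 * Stops early if buf is too small. Returns buf. */
static char *decode_ep_state(unsigned int state, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL);
    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(ep_flags) / sizeof(ep_flags[0]); i++) {
        if (!(state & ep_flags[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", ep_flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated, give up */
        used += (size_t)n;
    }
    return buf;
}
```

Under those assumed values, 0x40 decodes to EP_HARD_CLEAR_TOGGLE alone
and 0x240 adds EP_STALLED on top of it, which is consistent with the
second "endpoint disable" line.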

Then usb_storage submits one URB, which never executes because the EP
is in the Running-Idle state and the doorbell is inhibited by EP_STALLED.
About 30s later it times out, unlinks the URB and resets again. Set TR
Deq fails because the endpoint is Running rather than Stopped.

> [  326.562226] usb 4-1.4: URB 00000000a9556a5f queued before clearing halt
> [  357.198396] xhci_hcd 0000:39:00.0: Invalidating TDs instantly on slot 5 ep 4 in state 0x240
> [  357.198405] xhci_hcd 0000:39:00.0: Removing canceled TD starting at 0x1645d5000 (dma) in stream 0 URB 00000000a9556a5f
> [  357.198422] xhci_hcd 0000:39:00.0: Set TR Deq ptr 0x1645d5010, cycle 1
> [  357.198429] xhci_hcd 0000:39:00.0: // Ding dong!
> [  357.198435] xhci_hcd 0000:39:00.0: xhci_giveback_invalidated_tds: Keep cancelled URB 00000000a9556a5f TD as cancel_status is 2
> [  357.198505] xhci_hcd 0000:39:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
> [  357.198516] xhci_hcd 0000:39:00.0: Slot state = 3, EP state = 1
> [  357.198525] xhci_hcd 0000:39:00.0: xhci_handle_cmd_set_deq: Giveback cancelled URB 00000000a9556a5f TD
> [  357.198539] xhci_hcd 0000:39:00.0: xhci_handle_cmd_set_deq: All TDs cleared, ring doorbell
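
To make the "Slot state = 3, EP state = 1" line above easier to read,
here is a small sketch that maps the numeric context states to the names
used in the xHCI specification and checks the precondition for Set TR
Dequeue Pointer (the endpoint must be Stopped or Error). All identifiers
below are hypothetical and chosen for illustration; the driver has its
own macros for these values.

```c
#include <assert.h>
#include <string.h>

/* Endpoint Context "EP State" field values, per the xHCI spec. */
enum ep_ctx_state {
    EP_CTX_DISABLED = 0,
    EP_CTX_RUNNING  = 1,
    EP_CTX_HALTED   = 2,
    EP_CTX_STOPPED  = 3,
    EP_CTX_ERROR    = 4,
};

/* Name of an Endpoint Context state, or "Reserved" if out of range. */
static const char *ep_ctx_state_name(unsigned int s)
{
    static const char *const names[] = {
        "Disabled", "Running", "Halted", "Stopped", "Error",
    };
    return s < sizeof(names) / sizeof(names[0]) ? names[s] : "Reserved";
}

/* Name of a Slot Context state, or "Reserved" if out of range. */
static const char *slot_state_name(unsigned int s)
{
    static const char *const names[] = {
        "Disabled/Enabled", "Default", "Addressed", "Configured",
    };
    return s < sizeof(names) / sizeof(names[0]) ? names[s] : "Reserved";
}

/* Set TR Dequeue Pointer is only valid on a Stopped or Error endpoint;
 * in any other state the controller reports a Context State Error. */
static int set_tr_deq_allowed(unsigned int ep_state)
{
    return ep_state == EP_CTX_STOPPED || ep_state == EP_CTX_ERROR;
}
```

With the values from the log, the slot is Configured and the endpoint is
Running, so set_tr_deq_allowed(1) is false and the command is rejected.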

Not sure if it's a USB core bug or something that xHCI should take
care of on its own. For now, reverting those two "stall" patches ought
to clean up the noise.

I'm not 100% sure whether this caused the stuck task issue, but 6.15 has
CONFIG_DETECT_HUNG_TASK_BLOCKER, which might be helpful in such cases.
