Message-ID: <93a08563-a8f2-4004-bf91-884611b7cc7d@oss.qualcomm.com>
Date: Thu, 6 Nov 2025 15:27:33 +0530
From: Uttkarsh Aggarwal <uttkarsh.aggarwal@....qualcomm.com>
To: Mathias Nyman <mathias.nyman@...ux.intel.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
        wesley.cheng@....qualcomm.com
Subject: Re: [RFC PATCH] usb: host: xhci: Release spinlock before
 xhci_handshake in command ring abort



On 10/22/2025 6:19 PM, Mathias Nyman wrote:
> On 10/22/25 13:00, Uttkarsh Aggarwal wrote:
>> Currently xhci_handshake is a polling loop that waits for a change of
>> state. If this loop is executed while holding a spinlock with IRQs
>> disabled, it can block interrupts for up to 5 seconds.
>>
>> To prevent prolonged IRQ-disable durations that may lead to watchdog
>> timeouts, release the spinlock before invoking xhci_handshake() in
>> xhci_abort_cmd_ring().
>>
>> The spinlock is reacquired after the handshake to continue with
>> controller halt and recovery if needed.
> 
> Is this a real issue?
> 
> It should be extremely rare that commands need to be aborted, and even
> less likely that it takes five seconds to stop the command ring.
> 
> Can you take logs and traces of this (if it's reproducible)?
> I suspect there is some other issue that needs to be fixed.
> 
> I agree that keeping the spin lock for up to five seconds isn't a good
> idea, but just unlocking before the command ring has stopped opens up
> new race risks.
> 
> We need to ensure that queuing a new command mid ring abort, before the
> ring has stopped, doesn't cause new issues, and that handling command
> completion events, including the "stop command ring" event, before
> polling for a stopped ring works fine.
> 

Hi Mathias,

Yes. With a USB headset connected and no audio playing, if we resume the
device and then perform some operations such as volume +/-, we hit
xhci_handle_command_timeout.

It is a watchdog bark: xhci_handle_command_timeout started at 68346 and
had still not finished when the watchdog fired at 68356.

Logs:

[68346.109071][T25562] [2025:09:04 23:19:02](2)[25562:kworker/2:3]lwz debug: xhci_handle_command_timeout start
[68346.109081][T25562] [2025:09:04 23:19:02](2)[25562:kworker/2:3]lwz debug: xhci_handle_command_timeout xhci_abort_cmd_ring start
[68346.109083][T25562] [2025:09:04 23:19:02](2)[25562:kworker/2:3]lwz debug: xhci_abort_cmd_ring start
[68346.109084][T25562] [2025:09:04 23:19:02](2)[25562:kworker/2:3]lwz debug: xhci_abort_cmd_ring xhci_trb_virt_to_dma
[68346.109087][T25562] [2025:09:04 23:19:02](2)[25562:kworker/2:3]lwz debug: xhci_abort_cmd_ring xhci_handshake_check_state start
...
[68356.215787][    C1] [2025:09:04 23:19:10](1)msm_watchdog 17600000.qcom,wdt: Causing a QCOM Apps Watchdog bite!
[68356.215790][    C1] [2025:09:04 23:19:10](1)[RB/E]rb_sreason_str_set: sreason_str set wdog_bite

Call trace:
xhci_configure_endpoint hangs in the call stack below.

Stack:
__switch_to+0x1a8
schedule+0xacc
schedule_timeout+0x4c
wait_for_common+0xc0
wait_for_completion+0x18
xhci_configure_endpoint+0xe4
xhci_check_bandwidth+0x1a8
usb_hcd_alloc_bandwidth+0x37c
usb_disable_device_endpoints+0xb0
usb_disable_device+0x30
usb_disconnect+0xd0
hub_event+0xc4c
process_scheduled_works+0x208
worker_thread+0x1c0
kthread+0xfc
ret_from_fork+0x10
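
This blocking is expected given how xhci_configure_endpoint() is
structured: it queues a Configure Endpoint command on the command ring and
then sleeps in wait_for_completion() until the command completion (or
abort) event is handled. A trimmed sketch of that pattern (not the exact
mainline code; context setup, bandwidth checks and error handling are
omitted):

/*
 * Trimmed sketch of the configure-endpoint path; not the mainline code.
 */
static int xhci_configure_endpoint_sketch(struct xhci_hcd *xhci,
					  struct usb_device *udev,
					  struct xhci_command *command)
{
	unsigned long flags;
	int ret;

	spin_lock_irqsave(&xhci->lock, flags);
	/* Queue the Configure Endpoint TRB on the command ring. */
	ret = xhci_queue_configure_endpoint(xhci, command,
					    command->in_ctx->dma,
					    udev->slot_id, false);
	if (ret) {
		spin_unlock_irqrestore(&xhci->lock, flags);
		return -ENOMEM;
	}
	xhci_ring_cmd_db(xhci);
	spin_unlock_irqrestore(&xhci->lock, flags);

	/*
	 * Sleep until the command completes or is aborted. If the command
	 * ring is stuck, the hub worker stays blocked right here.
	 */
	wait_for_completion(command->completion);

	/* The real code translates command->status into an errno here. */
	return 0;
}

With the command ring stuck, that completion is never signalled, so
hub_event() makes no progress before the watchdog fires.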

Finally, the crash happens on the watchdog bite:

sdei_wdg_bite_cb[qcom_sdei]+0x110
sdei_event_handler+0x3c
do_sdei_event+0x88
__sdei_handler+0x48
__sdei_asm_handler+0x110
__const_udelay+0x148
xhci_handshake+0x58
xhci_handle_command_timeout+0x248
process_scheduled_works+0x208
worker_thread+0x1c0
kthread+0xfc
ret_from_fork+0x10
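
The __const_udelay frame comes from the busy-wait inside xhci_handshake(),
which polls the register with readl_poll_timeout_atomic() and udelay()s
between reads. A rough sketch of that pattern (close to, but not a
verbatim copy of, the upstream helper):

/* Sketch of the polling helper behind xhci_handshake(). */
static int xhci_handshake_sketch(void __iomem *ptr, u32 mask, u32 done,
				 u64 timeout_us)
{
	u32 result;
	int ret;

	/*
	 * Busy-wait: read the register and udelay() between reads (the
	 * __const_udelay frame above) until the masked value matches or
	 * the timeout, up to 5 seconds for a command ring abort, expires.
	 */
	ret = readl_poll_timeout_atomic(ptr, result,
					(result & mask) == done ||
					result == U32_MAX,
					1, timeout_us);
	if (result == U32_MAX)	/* card removed */
		return -ENODEV;

	return ret;
}

Since xhci_handle_command_timeout() calls into this with xhci->lock held
and interrupts disabled, the entire poll runs with IRQs off on that CPU,
which is what trips the watchdog.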

Based on the logs and crash stacks I have collected so far, including the
watchdog bark and the prolonged execution of xhci_handle_command_timeout,
it seems likely that there is an underlying issue causing the ring to hang
during xhci_configure_endpoint.
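
For reference, the locking change proposed in this RFC boils down to
roughly the pattern below in xhci_abort_cmd_ring() (a simplified sketch
rather than the actual diff; the real patch also has to address the race
concerns you raised, and the exact register accesses and error handling in
xhci-ring.c may differ):

/*
 * Sketch of the proposed change; 'flags' come from the caller
 * (xhci_handle_command_timeout) that took xhci->lock.
 */
static int xhci_abort_cmd_ring_sketch(struct xhci_hcd *xhci,
				      unsigned long flags)
{
	u64 crcr;
	int ret;

	/* Request the abort while still holding xhci->lock. */
	crcr = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
	xhci_write_64(xhci, crcr | CMD_RING_ABORT, &xhci->op_regs->cmd_ring);

	/* Drop the lock so the up-to-5s poll runs with IRQs enabled. */
	spin_unlock_irqrestore(&xhci->lock, flags);
	ret = xhci_handshake(&xhci->op_regs->cmd_ring, CMD_RING_RUNNING, 0,
			     5 * 1000 * 1000);
	spin_lock_irqsave(&xhci->lock, flags);

	if (ret < 0) {
		/* Ring never stopped: halt the controller, mark it dead. */
		xhci_err(xhci, "Abort failed to stop command ring: %d\n", ret);
		xhci_halt(xhci);
		xhci_hc_died(xhci);
	}
	return ret;
}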

Thanks,
Uttkarsh
