linux-kernel - Re: [RFC PATCH] usb: xhci: Skip configure EP for disabled slots during teardown

Message-ID: <7631bc7d-e3b2-45b2-9b85-f03ed1d6b3cd@linux.intel.com>
Date: Wed, 7 Jan 2026 18:50:40 +0200
From: Mathias Nyman <mathias.nyman@...ux.intel.com>
To: Udipto Goswami <udipto.goswami@....qualcomm.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Mathias Nyman <mathias.nyman@...el.com>, linux-usb@...r.kernel.org,
 linux-kernel@...r.kernel.org, Alan Stern <stern@...land.harvard.edu>
Subject: Re: [RFC PATCH] usb: xhci: Skip configure EP for disabled slots
 during teardown

On 1/6/26 12:22, Udipto Goswami wrote:
> On Mon, Jan 5, 2026 at 4:32 PM Mathias Nyman
> <mathias.nyman@...ux.intel.com> wrote:
>>
>> Hi
>>
>> On 1/5/26 10:48, Udipto Goswami wrote:
>>> Consider a scenario when a HS headset fails resume and the hub performs
>>> a logical disconnect, the USB core tears down endpoints and calls
>>> hcd->check_bandwidth() on the way out, which with xHCI translates to a
>>> drop-only Configure Endpoint command (add_flags == SLOT_FLAG, drop_flags
>>> != 0). If the slot is already disabled (slot_id == 0) or the virtual
>>> device has been freed, issuing this Configure Endpoint command is
>>> pointless and may appear stuck until event handling catches up,
>>> causing unnecessary delays during disconnect teardown.
>>>
>>> Fix this by adding a check in xhci_check_bandwidth(), return success
>>> immediately if slot_id == 0 or vdev is missing, preventing the
>>> Configure Endpoint command from being queued at all. Additionally,
>>> in xhci_configure_endpoint() for drop-only Configure Endpoint operations,
>>> return success early if slot_id == 0 or vdev is already freed,
>>> avoiding spurious command waits.
>>>
>>> Signed-off-by: Udipto Goswami <udipto.goswami@....qualcomm.com>
>>
>> Makes sense to prevent unnecessary 'configure endpoint' commands
>>
>> Could you share more details how we end up tearing down endpoints and
>> calling xhci_check_bandwidth() after vdev is freed and slot_id set to zero?
>>
>> Did the whole xHC controller fail to resume and was reinitialized in
>> xhci_resume() power_lost path?
>>
>> Or is this related to audio offload and xhci sideband usage?
>>
>> If we end up in this situation in normal headset resume failure then there
>> might be something else wrong.
>>
> 
> Apologies! My mailbox was configured with HTML.
> Re-sending in plain text.
> 
> Hi Mathias,
> 
> Yes, we are using offloaded audio in this case and xhci-sideband is involved.
> 
> Scenario:
> The headset is connected to the platform with no active playback, so
> it suspends. No physical disconnect occurs.
> 
> 1. Audio DSP sends a playback request while the USB headset (device
> 1-1) is suspended
> 2. Resume chain is triggered:
>     handle_uaudio_stream_req
>     → enable_audio_stream
>     → snd_usb_autoresume
>     → dwc3-parent_wrapper (Qualcomm) → xhci → roothub → USB headset (1-1)
> 3. Resume fails at device 1-1: The headset fails to resume from
> suspend. Note that the xHCI controller itself resumes
> successfully—only the headset device fails.
> 4. Hub performs logical disconnect as a recovery mechanism
> 5. Race condition occurs: The USB core begins to tear down (calling
> 'check_bandwidth()'), but the xHCI driver may have already started
> freeing the slot due to the failed resume.
> 
> Two parallel paths:
> PATH1: (slower usb core teardown)
> 
> hub_port_connect_change()
> └─ Device resume fails
>     └─ hub_port_logical_disconnect()
>        └─ usb_disconnect()
>           └─ usb_disable_device()
>              ├─ usb_disable_endpoint() [for each endpoint]
>              │  └─ usb_hcd_disable_endpoint()
>              └─ usb_hcd_alloc_bandwidth()
>                 └─ usb_hcd_check_bandwidth()
>                    └─ xhci_check_bandwidth() ← POINT OF FAILURE
>                       └─ Tries to issue Configure Endpoint
>                          └─ But slot_id == 0 or virt_dev == NULL!
> 
> PATH2: (faster - xhci slot cleanup)
> hub_port_logical_disconnect()
> └─ usb_disconnect()
>     └─ usb_release_dev()
>        └─ usb_hcd_free_dev()
>           └─ xhci_free_dev()
>              └─ xhci_disable_slot()
>                 ├─ Issues TRB_DISABLE_SLOT command
>                 ├─ Waits for completion
>                 └─ xhci_free_virt_device()
>                    ├─ Sets udev->slot_id = 0
>                    ├─ Frees virt_dev
>                    └─ Sets xhci->devs[slot_id] = NULL
> 
> RACE TIMELINE:
>
>       Path 2 (fast)                     Path 1 (slow)
> ─────────────────────────────────────────────────────────────────
> T1:   xhci_free_dev() starts
> T2:   xhci_disable_slot() issued
> T3:   slot_id = 0
> T4:   virt_dev freed                    usb_disable_endpoint()
> T5:   xhci->devs[slot_id] = NULL        (still processing...)
> T6:                                     xhci_check_bandwidth() ← RACE!
> T7:                                     Tries Configure Endpoint
> T8:                                     But slot is already freed!
> 
> Path 1 is slower because it must iterate through all endpoints,
> calling usb_disable_endpoint() for each one before reaching
> check_bandwidth().
> Path 2 completes faster with a single disable slot command. So if
> T3-T5 have already executed, meaning the slot has already been freed,
> then the Configure Endpoint commands at T6-T8 can be skipped.
> Please let me know if this makes sense.

Thanks, well explained and nicely laid out.

There is something still odd in this scenario.

There shouldn't be two racing paths as both cases should be handled by
the hub work 'thread' that only has one active work item.

If resume fails then hub_port_logical_disconnect() is called and marks the device
as "USB_STATE_NOTATTACHED", and adds a change_bit for the port.
hub work should take over from there.

hub work should then do:
hub_event()
   port_event(hub, i);    // because hub->change_bit is set for this port
     hub_port_connect_change()
       hub_port_connect()
         if (udev)
           usb_disconnect()
             usb_disable_device()  //children first
               usb_disable_device_endpoints()  // for each endpoint
                 usb_hcd_alloc_bandwidth(dev, NULL, NULL, NULL);
                   hcd->driver->check_bandwidth()  // does all the configure endpoint commands
             device_del(&udev->dev);
             hub_free_dev(udev)
               hcd->driver->free_dev(hcd, udev);  // clears virt_dev and slot_id here
             put_device(&udev->dev);

To me this looks like driver->check_bandwidth() is called before driver->free_dev().
  
Thanks
Mathias
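
For readers following the thread, the two orderings under discussion can be
compared with a small userspace model. This is only a sketch and not kernel
code: every name below (model_udev, model_check_bandwidth, model_free_dev and
the two model_disconnect_* helpers) is hypothetical, and the guard only mirrors
the condition described in the RFC (slot_id == 0 or a missing virtual device).
It assumes a single thread, as in the hub work item described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real kernel structures. */
struct model_virt_dev {
	int unused;
};

struct model_udev {
	int slot_id;                 /* 0 models a disabled slot */
	struct model_virt_dev *vdev; /* NULL models a freed virtual device */
	int cfg_ep_cmds;             /* Configure Endpoint commands queued */
};

/*
 * Models the guard proposed in the RFC: report success without queuing
 * a Configure Endpoint command when the slot is already gone.
 */
int model_check_bandwidth(struct model_udev *udev)
{
	if (udev->slot_id == 0 || udev->vdev == NULL)
		return 0;        /* nothing to configure, skip the command */

	udev->cfg_ep_cmds++;     /* stands in for queuing the command */
	return 0;
}

/* Models free_dev(): clears the virtual device and the slot id. */
void model_free_dev(struct model_udev *udev)
{
	udev->vdev = NULL;
	udev->slot_id = 0;
}

/* Order from the hub work trace: check_bandwidth(), then free_dev(). */
int model_disconnect_expected_order(void)
{
	struct model_virt_dev vdev = { 0 };
	struct model_udev udev = { .slot_id = 1, .vdev = &vdev, .cfg_ep_cmds = 0 };

	model_check_bandwidth(&udev);
	model_free_dev(&udev);
	return udev.cfg_ep_cmds;
}

/* Order from the reported scenario: free_dev() has already run. */
int model_disconnect_reported_order(void)
{
	struct model_virt_dev vdev = { 0 };
	struct model_udev udev = { .slot_id = 1, .vdev = &vdev, .cfg_ep_cmds = 0 };

	model_free_dev(&udev);
	model_check_bandwidth(&udev);
	return udev.cfg_ep_cmds;
}
```

In this model the expected order queues one command and the reported order
queues none, so the guard only changes behavior if free_dev() really does run
first. Whether and how that can happen in the actual driver is the open
question in this thread.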