linux-kernel - Re: [RFC PATCH] usb: xhci: Skip configure EP for disabled slots during teardown

Message-ID: <CAMTwNXDFM=csMEJ1ZhiTOeQ-dDH4eu4ze9XRFbSj0d-4Fxsp=g@mail.gmail.com>
Date: Tue, 6 Jan 2026 15:52:03 +0530
From: Udipto Goswami <udipto.goswami@....qualcomm.com>
To: Mathias Nyman <mathias.nyman@...ux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Mathias Nyman <mathias.nyman@...el.com>, linux-usb@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] usb: xhci: Skip configure EP for disabled slots
 during teardown

On Mon, Jan 5, 2026 at 4:32 PM Mathias Nyman
<mathias.nyman@...ux.intel.com> wrote:
>
> Hi
>
> On 1/5/26 10:48, Udipto Goswami wrote:
> > Consider a scenario when a HS headset fails resume and the hub performs
> > a logical disconnect, the USB core tears down endpoints and calls
> > hcd->check_bandwidth() on the way out, which with xHCI translates to a
> > drop-only Configure Endpoint command (add_flags == SLOT_FLAG, drop_flags
> > != 0). If the slot is already disabled (slot_id == 0) or the virtual
> > device has been freed, issuing this Configure Endpoint command is
> > pointless and may appear stuck until event handling catches up,
> > causing unnecessary delays during disconnect teardown.
> >
> > Fix this by adding a check in xhci_check_bandwidth(), return success
> > immediately if slot_id == 0 or vdev is missing, preventing the
> > Configure Endpoint command from being queued at all. Additionally,
> > in xhci_configure_endpoint() for drop-only Configure Endpoint operations,
> > return success early if slot_id == 0 or vdev is already freed,
> > avoiding spurious command waits.
> >
> > Signed-off-by: Udipto Goswami <udipto.goswami@....qualcomm.com>
>
> Makes sense to prevent unnecessary 'configure endpoint' commands
>
> Could you share more details how we end up tearing down endpoints and
> calling xhci_check_bandwidth() after vdev is freed and slot_id set to zero?
>
> Did the whole xHC controller fail to resume and was reinitialized in
> xhci_resume() power_lost path?
>
> Or is this related to audio offload and xhci sideband usage?
>
> If we end up in this situation in normal headset resume failure then there
> might be something else wrong.
>

Apologies! My mailbox was configured with HTML.
Re-sending in plain text.

Hi Mathias,

Yes, we are using offloaded audio in this case and xhci-sideband is involved.

Scenario:
The headset is connected to the platform with no active playback, so
it suspends. No physical disconnect occurs.

1. Audio DSP sends a playback request while the USB headset (device
1-1) is suspended
2. Resume chain is triggered:
   handle_uaudio_stream_req
   → enable_audio_stream
   → snd_usb_autoresume
   → dwc3-parent_wrapper (Qualcomm) → xhci → roothub → USB headset (1-1)
3. Resume fails at device 1-1: The headset fails to resume from
suspend. Note that the xHCI controller itself resumes
successfully—only the headset device fails.
4. Hub performs logical disconnect as a recovery mechanism
5. Race condition occurs: The USB core begins to tear down (calling
'check_bandwidth()'), but the xHCI driver may have already started
freeing the slot due to the failed resume.

Two parallel paths:
PATH1: (slower usb core teardown)

hub_port_connect_change()
└─ Device resume fails
   └─ hub_port_logical_disconnect()
      └─ usb_disconnect()
         └─ usb_disable_device()
            ├─ usb_disable_endpoint() [for each endpoint]
            │  └─ usb_hcd_disable_endpoint()
            └─ usb_hcd_alloc_bandwidth()
               └─ usb_hcd_check_bandwidth()
                  └─ xhci_check_bandwidth() ← POINT OF FAILURE
                     └─ Tries to issue Configure Endpoint
                        └─ But slot_id == 0 or virt_dev == NULL!

PATH2: (faster - xhci slot cleanup)
hub_port_logical_disconnect()
└─ usb_disconnect()
   └─ usb_release_dev()
      └─ usb_hcd_free_dev()
         └─ xhci_free_dev()
            └─ xhci_disable_slot()
               ├─ Issues TRB_DISABLE_SLOT command
               ├─ Waits for completion
               └─ xhci_free_virt_device()
                  ├─ Sets udev->slot_id = 0
                  ├─ Frees virt_dev
                  └─ Sets xhci->devs[slot_id] = NULL

RACE TIMELINE:

      Path 2 (fast)                      Path 1 (slow)
─────────────────────────────────────────────────────────────────
T1:   xhci_free_dev() starts
T2:   xhci_disable_slot() issued
T3:   slot_id = 0
T4:   virt_dev freed                     usb_disable_endpoint()
T5:   xhci->devs[slot_id] = NULL         (still processing...)
T6:                                      xhci_check_bandwidth() ← RACE!
T7:                                      Tries Configure Endpoint
T8:                                      But slot is already freed!

Path 1 is slower because it must iterate through all endpoints,
calling usb_disable_endpoint() for each one before reaching
check_bandwidth().
Path 2 completes faster with a single Disable Slot command. So if
T3-T5 have already executed, meaning the slot has already been freed,
then the Configure Endpoint command (T6-T8) can be skipped.
Please let me know if this makes sense.
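
To make the intended check concrete, here is a minimal user-space sketch of
the early return. It is only a model and not the actual patch: the toy_*
types and functions, MAX_SLOTS and the cmds_queued counter are hypothetical
stand-ins for struct usb_device, xhci->devs[] and the command ring. It is
single-threaded, so it shows only the state check at T6 and says nothing
about locking.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 8   /* hypothetical table size for this model */

/* Hypothetical stand-ins; none of these are the kernel's real types. */
struct toy_virt_dev { int unused; };
struct toy_udev { int slot_id; };            /* models udev->slot_id */

struct toy_xhci {
    struct toy_virt_dev *devs[MAX_SLOTS];    /* models xhci->devs[] */
    int cmds_queued;                         /* Configure Endpoint commands queued */
};

/* Models the end state of Path 2: devs[] entry cleared, slot_id zeroed. */
void toy_free_slot(struct toy_xhci *xhci, struct toy_udev *udev)
{
    int slot_id = udev->slot_id;

    if (slot_id <= 0 || slot_id >= MAX_SLOTS)
        return;
    xhci->devs[slot_id] = NULL;
    udev->slot_id = 0;
}

/* True when the slot is disabled or its virtual device is missing. */
bool toy_slot_is_gone(const struct toy_xhci *xhci, const struct toy_udev *udev)
{
    int slot_id = udev->slot_id;

    return slot_id <= 0 || slot_id >= MAX_SLOTS || xhci->devs[slot_id] == NULL;
}

/* Models the guarded check_bandwidth(): skip the command if the slot is gone. */
int toy_check_bandwidth(struct toy_xhci *xhci, const struct toy_udev *udev)
{
    if (toy_slot_is_gone(xhci, udev))
        return 0;               /* nothing left to configure, report success */

    xhci->cmds_queued++;        /* stands in for queuing Configure Endpoint */
    return 0;
}
```

In this model, calling toy_check_bandwidth() on a live slot queues one
command, while calling it again after toy_free_slot() returns 0 and leaves
cmds_queued unchanged, which is the T6-T8 skip described above.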

Thanks,
-Udipto