linux-kernel - Re: [PATCH] usb: xhci: Assume that endpoints halt as specified

Message-ID: <20251114133231.3f187b94.michal.pecio@gmail.com>
Date: Fri, 14 Nov 2025 13:32:31 +0100
From: Michal Pecio <michal.pecio@...il.com>
To: Mathias Nyman <mathias.nyman@...ux.intel.com>
Cc: Mathias Nyman <mathias.nyman@...el.com>, Greg Kroah-Hartman
 <gregkh@...uxfoundation.org>, linux-usb@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] usb: xhci: Assume that endpoints halt as specified

On Tue, 11 Nov 2025 14:13:05 +0200, Mathias Nyman wrote:
> Makes sense, I guess we can only trust hardware to update the state in
> the endpoint context on specific command completions, not transfer events.

Technically, 4.8.3 requires HW to update to Running before writing any
transfer event to the event ring. It says nothing about Halted, though
4.10.2.1 appears to imply similar ordering in the case of a Stall Error.

But then 4.8.3 explicitly says

  The update of EP State may also be delayed relative to a Doorbell
  ring or error condition (e.g. TRB Error, STALL, or USB Transaction
  Error) that causes an EP State change not generated by a command. 

so the spec is a self-contradictory mess as usual. My hope with this
patch is that other SW vendors may follow the 4.8.3 recommendation, so
that HW gets tested to work under such conditions.

The Promontory problem is not even a delay, it's a complete failure.
I added a loop which waits for GET_EP_CTX_STATE(READ_ONCE(ep_ctx)) to
become HALTED and it was still RUNNING after 1.5 seconds.
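
For illustration only, a userspace sketch of that kind of bounded poll.
This is not the debug code I actually ran: struct ep_ctx_sketch,
get_ep_state() and wait_for_halted() are hypothetical names, the poll
count stands in for a real time limit, and the volatile read only
approximates READ_ONCE(). The state encoding (EP State in bits 2:0,
Running = 1, Halted = 2) follows the xHCI endpoint context layout.

```c
#include <stdint.h>

/* EP State lives in bits 2:0 of the first endpoint context dword. */
#define EP_STATE_MASK    0x7u
#define EP_STATE_RUNNING 1u
#define EP_STATE_HALTED  2u

/* Hypothetical stand-in for the driver's endpoint context structure. */
struct ep_ctx_sketch {
	volatile uint32_t ep_info;	/* volatile forces a fresh read per poll */
};

/* Extract the EP State field, ignoring the other bits of the dword. */
static unsigned int get_ep_state(const struct ep_ctx_sketch *ctx)
{
	return ctx->ep_info & EP_STATE_MASK;
}

/*
 * Poll until the context reports Halted or the poll budget runs out.
 * Returns 0 if Halted was observed, -1 if the budget was exhausted.
 * A real driver would delay between polls and bound the wait in time.
 */
static int wait_for_halted(const struct ep_ctx_sketch *ctx,
			   unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (get_ep_state(ctx) == EP_STATE_HALTED)
			return 0;
	}
	return -1;
}
```

With a context that never leaves Running, wait_for_halted() returns -1
however large the budget is, which matches what I saw on Promontory.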

I guess it's some stinking internal race condition again. Maybe it
halts too quickly after restart and then a delayed update to Running
overwrites the Halted state update. Or it's something that only happens
if we restart too quickly after the previous error. IIRC, it never
happened the first time the endpoint halted after loss of connection,
only randomly later, after some resets.
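
To make the first guess concrete, here is a toy model of a lost
update. It is purely hypothetical and says nothing about how the
controller really works inside: struct ctx_write and replay_writes()
are invented names, and "ticks" are abstract time units. Each write
carries the tick at which it lands in the context, and the last one to
land wins.

```c
#include <stddef.h>

#define EP_STATE_RUNNING 1u
#define EP_STATE_HALTED  2u

/* One hypothetical context update and the tick at which it lands. */
struct ctx_write {
	unsigned int land_tick;
	unsigned int state;
};

/*
 * Replay the writes in landing order and return the final EP State.
 * Writes landing on the same tick are applied in array order.
 */
static unsigned int replay_writes(unsigned int initial,
				  const struct ctx_write *w, size_t n,
				  unsigned int last_tick)
{
	unsigned int state = initial;

	for (unsigned int t = 0; t <= last_tick; t++)
		for (size_t i = 0; i < n; i++)
			if (w[i].land_tick == t)
				state = w[i].state;
	return state;
}
```

If the Running write issued at restart lands at tick 5 and the Halted
write lands at tick 2, the context ends up Running even though the
endpoint has halted. With Running at tick 1 and Halted at tick 2, it
ends up Halted as expected.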

Regards,
Michal