Message-ID: <20251114133231.3f187b94.michal.pecio@gmail.com>
Date: Fri, 14 Nov 2025 13:32:31 +0100
From: Michal Pecio <michal.pecio@...il.com>
To: Mathias Nyman <mathias.nyman@...ux.intel.com>
Cc: Mathias Nyman <mathias.nyman@...el.com>, Greg Kroah-Hartman
 <gregkh@...uxfoundation.org>, linux-usb@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] usb: xhci: Assume that endpoints halt as specified

On Tue, 11 Nov 2025 14:13:05 +0200, Mathias Nyman wrote:
> Makes sense, I guess we can only trust hardware to update the state in
> the endpoint context on specific command completions, not transfer events.

Technically, 4.8.3 requires HW to update EP State to Running before
writing any transfer event to the event ring. It says nothing about
Halted, though 4.10.2.1 appears to imply similar ordering in case of
Stall Error.

But then 4.8.3 explicitly says

  The update of EP State may also be delayed relative to a Doorbell
  ring or error condition (e.g. TRB Error, STALL, or USB Transaction
  Error) that causes an EP State change not generated by a command. 

so the spec is a self-contradictory mess as usual. My hope with this
patch is that maybe other SW vendors follow the 4.8.3 recommendation
and HW gets tested to work under such conditions.

The Promontory problem is not even a delay, it's a complete failure.
I added a loop which waits for GET_EP_CTX_STATE(READ_ONCE(ep_ctx)) to
become HALTED and it was still RUNNING after 1.5 seconds.
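
For illustration only, a wait loop of that kind could look roughly like
the user-space sketch below. It is not the debugging code from the
experiment: struct fake_ep_ctx, get_ep_state() and wait_for_halted() are
hypothetical stand-ins for the driver's endpoint context and
GET_EP_CTX_STATE(), and the timeout is a poll count rather than 1.5
seconds of wall-clock time. Only the EP State encoding (low three bits,
Running = 1, Halted = 2) follows the xHCI endpoint context layout.

```c
#include <assert.h>
#include <stdint.h>

#define EP_STATE_MASK    0x7u   /* EP State lives in bits 2:0 of ep_info */
#define EP_STATE_RUNNING 1u
#define EP_STATE_HALTED  2u

/* Hypothetical stand-in for the endpoint context; volatile forces a fresh
 * read on every poll, which is the role READ_ONCE() plays in the driver. */
struct fake_ep_ctx {
	volatile uint32_t ep_info;
};

static unsigned int get_ep_state(const struct fake_ep_ctx *ctx)
{
	return ctx->ep_info & EP_STATE_MASK;
}

/*
 * Poll until the context reports Halted.
 * Returns the number of polls that failed before Halted was seen,
 * or -1 if the state never became Halted within max_polls.
 */
static int wait_for_halted(const struct fake_ep_ctx *ctx, int max_polls)
{
	assert(max_polls >= 0);

	for (int i = 0; i < max_polls; i++) {
		if (get_ep_state(ctx) == EP_STATE_HALTED)
			return i;
		/* a real driver would sleep or delay between polls here */
	}
	return -1;
}
```

With a context that stays at Running, as described above for Promontory,
wait_for_halted() runs out of polls and returns -1.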

I guess it's some stinking internal race condition again, maybe it
halts too quickly after restart and then a delayed update to Running
overwrites the Halted state update. Or something that only happens
if we restart too quickly after the previous error. IIRC, it never
happened the first time the endpoint halts after loss of connection,
only randomly later after some resets.
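
The first guess above (a late Running write clobbering the Halted write)
can be pictured with a toy model. This is purely a sketch of the
hypothesis, not a description of how the hardware works: struct
state_update, final_ep_state() and all timing values are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EP_STATE_RUNNING 1u
#define EP_STATE_HALTED  2u

/* One pending write to the EP State field; times are arbitrary units. */
struct state_update {
	uint32_t state;
	unsigned int issued_at; /* when the controller decides on the change */
	unsigned int latency;   /* how long the write takes to land */
};

/*
 * Returns the state left in the endpoint context once both writes
 * have landed. The write that lands last wins; on a tie the Halted
 * write, which was issued later, is assumed to win.
 */
static uint32_t final_ep_state(struct state_update running,
			       struct state_update halted)
{
	/* the restart (Running) is assumed to precede the halt */
	assert(running.issued_at <= halted.issued_at);

	unsigned int running_lands = running.issued_at + running.latency;
	unsigned int halted_lands = halted.issued_at + halted.latency;

	return running_lands > halted_lands ? running.state : halted.state;
}
```

In this model, an endpoint that halts shortly after restart while the
Running write is still in flight ends up reporting Running forever,
which would match a context that never shows Halted no matter how long
software waits.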

Regards,
Michal
