linux-kernel - Re: [PATCH v4 1/1] xhci: Correctly handle last TRB of isoc TD on Etron xHCI host

Message-ID: <b19218ab-5248-47ba-8111-157818415247@linux.intel.com>
Date: Fri, 7 Feb 2025 14:06:54 +0200
From: Mathias Nyman <mathias.nyman@...ux.intel.com>
To: Michał Pecio <michal.pecio@...il.com>
Cc: gregkh@...uxfoundation.org, ki.chiang65@...il.com,
 linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org,
 mathias.nyman@...el.com, stable@...r.kernel.org
Subject: Re: [PATCH v4 1/1] xhci: Correctly handle last TRB of isoc TD on
 Etron xHCI host

On 6.2.2025 0.42, Michał Pecio wrote:
>> Not giving back the TD when we get an event for the last TRB in the
>> TD sounds risky. With this change we assume all old and future ETRON
>> hosts will trigger this additional spurious success event.
> 
> error_mid_td can cope with hosts which don't produce the extra success
> event, it was done this way to deal with buggy NECs. The cost is one
> more ESIT of latency on TDs with error.

It makes giving back the TD depend on a future event we can't guarantee.

I still think this fits the spurious success case better.
It's not an error mid TD; it's a spurious success event sent by the host
after a completion (error) event for the last TRB in the TD.

Making this change to error_mid_td code also makes that code more
confusing and harder to follow.
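
To illustrate the handling argued for here, below is a minimal toy sketch. It is not
xhci driver code: the struct, enum and function names are hypothetical, and it only
models events that point at the last TRB of a TD. The TD is given back as soon as
an event for its last TRB arrives, and a later success event for that same TRB is
dropped silently instead of producing a warning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in the xhci driver. */
enum toy_cc { TOY_CC_SUCCESS, TOY_CC_ERROR };

struct toy_ring {
	uint64_t last_td_end_dma; /* last TRB of the most recently given-back TD */
	bool expect_spurious;     /* true if that TD completed with an error */
	unsigned int given_back;  /* TDs returned to the class driver */
	unsigned int warnings;    /* events that matched nothing */
};

/*
 * Handle one transfer event.
 * trb_dma:    TRB address reported by the event.
 * td_end_dma: last TRB of the TD currently at dequeue, or 0 if none is pending.
 */
static void toy_handle_event(struct toy_ring *r, uint64_t trb_dma,
			     enum toy_cc cc, uint64_t td_end_dma)
{
	/* Event for the last TRB of the pending TD: give it back right away. */
	if (td_end_dma && trb_dma == td_end_dma) {
		r->given_back++;
		r->last_td_end_dma = trb_dma;
		r->expect_spurious = (cc == TOY_CC_ERROR);
		return;
	}

	/* Extra success event for a TD already given back after an error. */
	if (r->expect_spurious && cc == TOY_CC_SUCCESS &&
	    trb_dma == r->last_td_end_dma) {
		r->expect_spurious = false;
		return; /* silently ignored */
	}

	/* Anything else is unexpected in this toy model. */
	r->warnings++;
}
```

In this sketch the giveback never waits for a later event, so a host that does not
send the extra success event behaves the same as one that does.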

>> I think we could handle this more like the XHCI_SPURIOUS_SUCCESS case
>> seen with short transfers, and just silence the error message.
> 
> That's a little dodgy because it frees the TD before the HC is
> completely done with it. *Probably* no problem with data buffers
> (no sensible reason to DMA into them after an earlier error), but
> we could overwrite the transfer ring in rare cases and IDK if it
> would or wouldn't cause problems in this particular case.

We did get an event for the last TRB in the TD, so in normal cases
this TD should be considered complete, and given back.

I don't think the controller has any reason to touch data buffers at
this stage either. I can't recall any IOMMU/DMA issues related to this.

> 
> Same applies to the "short packet" case existing today. I thought
> about fixing it, but IIRC I ran into some differences between HCs
> or out of spec behavior and it got tricky.

For the short transfer case this is a more valid concern. Here we give
back the TD after an event mid TD, and we know hardware might still
walk the rest of the TD. It shouldn't touch data buffers either, as a
short transfer indicates all data has been written.

> 
> Maybe it would make sense to separate giveback (and freeing of the
> data buffer by class drivers) from transfer ring inc_deq(). Do the
> former when we reasonably believe the HC won't touch the buffers
> anymore, do the latter when we are sure that it's in the next TD.

This sounds reasonable. It makes sense to keep the software dequeue
pointer where hardware last reported its position. Currently we
advance it to where we assume hardware will be next.

But this is a separate project.
Might need some work around in the driver.
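
A rough sketch of what such a split could look like, as a toy model only. All
names are hypothetical, the ring is counted in whole TDs instead of TRBs, and
nothing here reflects how the driver is structured today. Giveback and dequeue
advancement are tracked by two separate counters, and ring space is only
reclaimed once hardware has reported a position in a later TD.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_RING_TDS 4 /* hypothetical ring capacity, counted in whole TDs */

/* Hypothetical toy model; counters only ever grow. */
struct toy_ring {
	size_t enq;        /* TDs queued so far */
	size_t given_back; /* TDs returned to the class driver */
	size_t deq;        /* TDs whose ring space may be overwritten */
};

/* Queue one TD; fails while unreclaimed TDs still fill the ring. */
static bool toy_queue_td(struct toy_ring *r)
{
	if (r->enq - r->deq >= TOY_RING_TDS)
		return false;
	r->enq++;
	return true;
}

/* We believe the HC is done with the oldest TD's data buffers. */
static void toy_giveback(struct toy_ring *r)
{
	if (r->given_back < r->enq)
		r->given_back++;
}

/*
 * Hardware reported an event located in TD number td_index (0-based),
 * so every earlier TD that was already given back can be reclaimed.
 */
static void toy_hw_seen_in_td(struct toy_ring *r, size_t td_index)
{
	if (td_index > r->deq && td_index <= r->given_back)
		r->deq = td_index;
}
```

With this split, a giveback alone frees no ring space; toy_queue_td() keeps
failing on a full ring until toy_hw_seen_in_td() confirms hardware has moved on.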

Thanks
Mathias