lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 11 Apr 2024 21:55:04 +0200
From: Paul Menzel <pmenzel@...gen.mpg.de>
To: Mathias Nyman <mathias.nyman@...ux.intel.com>
Cc: Michał Pecio <michal.pecio@...il.com>,
 Mathias Nyman <mathias.nyman@...el.com>, LKML
 <linux-kernel@...r.kernel.org>, linux-usb@...r.kernel.org,
 Niklas Neronin <niklas.neronin@...ux.intel.com>
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part
 of current TD ep_index 1 comp_code 1

Dear Mathias,


Thank you for your reply.

Am 11.04.24 um 09:18 schrieb Mathias Nyman:
> On 10.4.2024 10.59, Paul Menzel wrote:

>> Am 09.04.24 um 13:22 schrieb Mathias Nyman:
>>> On 8.4.2024 22.05, Michał Pecio wrote:
>>>>> It's also possible this TD/TRB was cancelled due to the disconnect.
>>>>> Could be that even if driver removes the TD from the list and cleans
>>>>> out the TRB from the ring buffer (turns TRB to no-op) hardware may
>>>>> have read ahead and cached the TRB, and process it anyway.
>>>>
>>>> I thought about it, but my debug patch says that the missing TD was
>>>> freed by finish_td(), which is called on TDs considered completed by
>>>> hardware. A cancelled TD would show giveback_invalidated_tds().
>>>>
>>>> Anyway, we now have new information from the reporter. My v2 patch
>>>> keeps a log of the last five events processed on each transfer ring
>>>> and dumps the log on TRB mismatch errors.
>>>>
>>>> Unfortunately, it looks like the host controller is broken and signals
>>>> completion of those transfers twice. The log below shows two distinct
>>>> events for TRB 32959a1c0 and that the coresponding TD has just been
>>>> freed by finish_td().
>>>
>>> The trace confirms this, we get double completion events for several
>>> Isoc TRBs. These double completions are seen after a transaction
>>> error on the same device (different endpoint). >
>>> Transfer events for TRB ..a1c0 twice, with a transaction error in 
>>> between:
>>>   <idle>-0       [000] d.h2. 33819.709897: xhci_handle_event: EVENT: TRB 000000032959a1c0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>>   <idle>-0       [000] d.h2. 33819.709904: xhci_handle_event: EVENT: TRB 000000041547d010 status 'USB Transaction Error' len 4 slot 6 ep 15 type 'Transfer Event' flags e:c
>>>   systemd-journal-395     [000] d.H1. 33819.711886: xhci_handle_event: EVENT: TRB 000000032959a1c0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>>
>>> Transfer events for TRB ..a1d0 twice (the next TRB)
>>>   systemd-journal-395     [000] d.H1. 33819.712001: xhci_handle_event: EVENT: TRB 000000032959a1d0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>>   systemd-journal-395     [000] d.H1. 33819.712059: xhci_handle_event: EVENT: TRB 000000032959a1d0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>>
>>> Transfer events for TRB ..a1e0 twice
>>>   systemd-journal-395     [000] d.H1. 33819.712139: xhci_handle_event: EVENT: TRB 000000032959a1e0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>>   systemd-journal-395     [000] d.h1. 33819.712871: xhci_handle_event: EVENT: TRB 000000032959a1e0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>>
>>> etc..
>>>
>>> Driver can cope with these extra events, but if this is common we should
>>> probably handle it silently and not concern users with that ERROR 
>>> message.
>>
>> Thank you for the detailed analysis. Excuse my ignorance, but do you 
>> have an idea, what this Sennheiser USB headset does differently than 
>> other USB devices? Additionally, is this a known problem with this 
>> Intel xHCI controller, meaning, is there an errata about this problem?
> 
> There are a few related erratas in older 9 series chipsets that possibly
> could explain this, but those issues are no longer listed for newer 
> chipsets.
> 
> The Sennheiser headset is a full-speed (FS) device that use 192 byte 
> Isoch transfers.
> 
> Series 9 chipset xHC has issues with exactly those FS Isoch transfers 
> over 189 bytes, see
> " 1. USB Isoch In Transfer Error Issue"
> 
> There are some issues related to FS device removal:
> " 13. USB Full-/low-speed Device Removal Issue"
> 
> And some related to resending transfer events for "cached" TRBs after
> FS device disconnect/reconnect.
> "25. USB xHCI may Execute a Stale Transfer Request Block (TRB)"
> 
> https://www.intel.co.jp/content/dam/www/public/us/en/documents/specification-updates/9-series-chipset-pch-spec-update.pdf

Thank you for digging this up. Judging from the document, these erratas 
were not addressed in any firmware update.

As another data point, I was able to reproduce this issue with the 
Sennheiser USB headset and a Dell XPS 15 7590.

     $ lspci -nn | grep USB
     00:14.0 USB controller [0c03]: Intel Corporation Cannon Lake PCH 
USB 3.1 xHCI Host Controller [8086:a36d] (rev 10)
     3a:00.0 USB controller [0c03]: Intel Corporation JHL6340 
Thunderbolt 3 USB 3.1 Controller (C step) [Alpine Ridge 2C 2016] 
[8086:15db] (rev 02)

I uploaded the logs to *Hardware for Linux* [1].


Kind regards,

Paul


[1]: https://linux-hardware.org/?probe=904c918345

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ