lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <049c4850-fdb5-78fb-1d5e-0850dcd062aa@linux.intel.com>
Date: Thu, 11 Apr 2024 10:18:07 +0300
From: Mathias Nyman <mathias.nyman@...ux.intel.com>
To: Paul Menzel <pmenzel@...gen.mpg.de>, Michał Pecio
 <michal.pecio@...il.com>
Cc: Mathias Nyman <mathias.nyman@...el.com>,
 LKML <linux-kernel@...r.kernel.org>, linux-usb@...r.kernel.org,
 Niklas Neronin <niklas.neronin@...ux.intel.com>
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part
 of current TD ep_index 1 comp_code 1

On 10.4.2024 10.59, Paul Menzel wrote:
> Dear Mathias, dear Michał,
> 
> 
> Am 09.04.24 um 13:22 schrieb Mathias Nyman:
>> On 8.4.2024 22.05, Michał Pecio wrote:
>>>> It's also possible this TD/TRB was cancelled due to the disconnect.
>>>> Could be that even if driver removes the TD from the list and cleans
>>>> out the TRB from the ring buffer (turns TRB to no-op) hardware may
>>>> have read ahead and cached the TRB, and process it anyway.
>>>
>>> I thought about it, but my debug patch says that the missing TD was
>>> freed by finish_td(), which is called on TDs considered completed by
>>> hardware. A cancelled TD would show giveback_invalidated_tds().
>>>
>>> Anyway, we now have new information from the reporter. My v2 patch
>>> keeps a log of the last five events processed on each transfer ring
>>> and dumps the log on TRB mismatch errors.
>>>
>>> Unfortunately, it looks like the host controller is broken and signals
>>> completion of those transfers twice. The log below shows two distinct
>>> events for TRB 32959a1c0 and that the coresponding TD has just been
>>> freed by finish_td().
>>
>> The trace confirms this, we get double completion events for several
>> Isoc TRBs. These double completions are seen after a transaction
>> error on the same device (different endpoint). >
>> Transfer events for TRB ..a1c0 twice, with a transaction error in between:
>>   <idle>-0       [000] d.h2. 33819.709897: xhci_handle_event: EVENT: TRB 000000032959a1c0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>   <idle>-0       [000] d.h2. 33819.709904: xhci_handle_event: EVENT: TRB 000000041547d010 status 'USB Transaction Error' len 4 slot 6 ep 15 type 'Transfer Event' flags e:c
>>   systemd-journal-395     [000] d.H1. 33819.711886: xhci_handle_event: EVENT: TRB 000000032959a1c0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>
>> Transfer events for TRB ..a1d0 twice (the next TRB)
>>   systemd-journal-395     [000] d.H1. 33819.712001: xhci_handle_event: EVENT: TRB 000000032959a1d0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>   systemd-journal-395     [000] d.H1. 33819.712059: xhci_handle_event: EVENT: TRB 000000032959a1d0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>
>> Transfer events for TRB ..a1e0 twice
>>   systemd-journal-395     [000] d.H1. 33819.712139: xhci_handle_event: EVENT: TRB 000000032959a1e0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>   systemd-journal-395     [000] d.h1. 33819.712871: xhci_handle_event: EVENT: TRB 000000032959a1e0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>
>> etc..
>>
>> Driver can cope with these extra events, but if this is common we should
>> probably handle it silently and not concern users with that ERROR message.
> 
> Thank you for the detailed analysis. Excuse my ignorance, but do you have an idea, what this Sennheiser USB headset does differently than other USB devices? Additionally, is this a known problem with this Intel xHCI controller, meaning, is there an errata about this problem?

There are a few related erratas in older 9 series chipsets that possibly
could explain this, but those issues are no longer listed for newer chipsets.

The Sennheiser headset is a full-speed (FS) device that use 192 byte Isoch transfers.

Series 9 chipset xHC has issues with exactly those FS Isoch transfers over 189 bytes, see
" 1. USB Isoch In Transfer Error Issue"

There are some issues related to FS device removal:
" 13. USB Full-/low-speed Device Removal Issue"

And some related to resending transfer events for "chached" TRBs after FS device
disconnect/reconnect.
"25. USB xHCI may Execute a Stale Transfer Request Block (TRB)"

https://www.intel.co.jp/content/dam/www/public/us/en/documents/specification-updates/9-series-chipset-pch-spec-update.pdf

Thanks
Mathias


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ