[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bf5e1667-6052-4207-a2b3-e784f9b49d44@molgen.mpg.de>
Date: Thu, 11 Apr 2024 21:55:04 +0200
From: Paul Menzel <pmenzel@...gen.mpg.de>
To: Mathias Nyman <mathias.nyman@...ux.intel.com>
Cc: Michał Pecio <michal.pecio@...il.com>,
Mathias Nyman <mathias.nyman@...el.com>, LKML
<linux-kernel@...r.kernel.org>, linux-usb@...r.kernel.org,
Niklas Neronin <niklas.neronin@...ux.intel.com>
Subject: Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part
of current TD ep_index 1 comp_code 1
Dear Mathias,
Thank you for your reply.
Am 11.04.24 um 09:18 schrieb Mathias Nyman:
> On 10.4.2024 10.59, Paul Menzel wrote:
>> Am 09.04.24 um 13:22 schrieb Mathias Nyman:
>>> On 8.4.2024 22.05, Michał Pecio wrote:
>>>>> It's also possible this TD/TRB was cancelled due to the disconnect.
>>>>> Could be that even if driver removes the TD from the list and cleans
>>>>> out the TRB from the ring buffer (turns TRB to no-op) hardware may
>>>>> have read ahead and cached the TRB, and process it anyway.
>>>>
>>>> I thought about it, but my debug patch says that the missing TD was
>>>> freed by finish_td(), which is called on TDs considered completed by
>>>> hardware. A cancelled TD would show giveback_invalidated_tds().
>>>>
>>>> Anyway, we now have new information from the reporter. My v2 patch
>>>> keeps a log of the last five events processed on each transfer ring
>>>> and dumps the log on TRB mismatch errors.
>>>>
>>>> Unfortunately, it looks like the host controller is broken and signals
>>>> completion of those transfers twice. The log below shows two distinct
>>>> events for TRB 32959a1c0 and that the coresponding TD has just been
>>>> freed by finish_td().
>>>
>>> The trace confirms this, we get double completion events for several
>>> Isoc TRBs. These double completions are seen after a transaction
>>> error on the same device (different endpoint). >
>>> Transfer events for TRB ..a1c0 twice, with a transaction error in
>>> between:
>>> <idle>-0 [000] d.h2. 33819.709897: xhci_handle_event: EVENT: TRB 000000032959a1c0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>> <idle>-0 [000] d.h2. 33819.709904: xhci_handle_event: EVENT: TRB 000000041547d010 status 'USB Transaction Error' len 4 slot 6 ep 15 type 'Transfer Event' flags e:c
>>> systemd-journal-395 [000] d.H1. 33819.711886: xhci_handle_event: EVENT: TRB 000000032959a1c0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>>
>>> Transfer events for TRB ..a1d0 twice (the next TRB)
>>> systemd-journal-395 [000] d.H1. 33819.712001: xhci_handle_event: EVENT: TRB 000000032959a1d0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>> systemd-journal-395 [000] d.H1. 33819.712059: xhci_handle_event: EVENT: TRB 000000032959a1d0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>>
>>> Transfer events for TRB ..a1e0 twice
>>> systemd-journal-395 [000] d.H1. 33819.712139: xhci_handle_event: EVENT: TRB 000000032959a1e0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>> systemd-journal-395 [000] d.h1. 33819.712871: xhci_handle_event: EVENT: TRB 000000032959a1e0 status 'Success' len 0 slot 6 ep 2 type 'Transfer Event' flags e:c
>>>
>>> etc..
>>>
>>> Driver can cope with these extra events, but if this is common we should
>>> probably handle it silently and not concern users with that ERROR
>>> message.
>>
>> Thank you for the detailed analysis. Excuse my ignorance, but do you
>> have an idea, what this Sennheiser USB headset does differently than
>> other USB devices? Additionally, is this a known problem with this
>> Intel xHCI controller, meaning, is there an errata about this problem?
>
> There are a few related erratas in older 9 series chipsets that possibly
> could explain this, but those issues are no longer listed for newer
> chipsets.
>
> The Sennheiser headset is a full-speed (FS) device that use 192 byte
> Isoch transfers.
>
> Series 9 chipset xHC has issues with exactly those FS Isoch transfers
> over 189 bytes, see
> " 1. USB Isoch In Transfer Error Issue"
>
> There are some issues related to FS device removal:
> " 13. USB Full-/low-speed Device Removal Issue"
>
> And some related to resending transfer events for "cached" TRBs after
> FS device disconnect/reconnect.
> "25. USB xHCI may Execute a Stale Transfer Request Block (TRB)"
>
> https://www.intel.co.jp/content/dam/www/public/us/en/documents/specification-updates/9-series-chipset-pch-spec-update.pdf
Thank you for digging this up. Judging from the document, these erratas
were not addressed in any firmware update.
As another data point, I was able to reproduce this issue with the
Sennheiser USB headset and a Dell XPS 15 7590.
$ lspci -nn | grep USB
00:14.0 USB controller [0c03]: Intel Corporation Cannon Lake PCH
USB 3.1 xHCI Host Controller [8086:a36d] (rev 10)
3a:00.0 USB controller [0c03]: Intel Corporation JHL6340
Thunderbolt 3 USB 3.1 Controller (C step) [Alpine Ridge 2C 2016]
[8086:15db] (rev 02)
I uploaded the logs to *Hardware for Linux* [1].
Kind regards,
Paul
[1]: https://linux-hardware.org/?probe=904c918345
Powered by blists - more mailing lists