Message-ID: <3e47698b-6e6a-45e6-a312-b7db8f473359@kernel.org>
Date: Fri, 9 Jan 2026 09:38:02 -0600
From: Mario Limonciello <superm1@...nel.org>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: "open list:THUNDERBOLT DRIVER" <linux-usb@...r.kernel.org>,
linux-kernel@...r.kernel.org, Andreas Noever <andreas.noever@...il.com>,
Yehezkel Bernat <YehezkelShB@...il.com>,
Pooja Katiyar <pooja.katiyar@...el.com>,
Rene Sapiens <rene.sapiens@...ux.intel.com>, "S, Sanath" <Sanath.S@....com>
Subject: Re: [PATCH v2 0/2] thunderbolt: Fix S4 resume incongruities
On 1/9/26 1:23 AM, Mika Westerberg wrote:
> On Thu, Jan 08, 2026 at 01:18:58PM -0600, Mario Limonciello wrote:
>> On 1/8/26 5:42 AM, Mika Westerberg wrote:
>>> On Wed, Jan 07, 2026 at 02:50:54PM -0600, Mario Limonciello wrote:
>>>> On 1/7/26 3:33 AM, Mika Westerberg wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, Jan 05, 2026 at 11:37:47PM -0600, Mario Limonciello (AMD) wrote:
>>>>>> When a machine is restored from S4 if the firmware CM has created
>>>>>> tunnels there can be an incongruity of expectation from the kernel
>>>>>> when compared to booting from S5. This series addresses those.
>>>>>
>>>>> I suspect there is no Firmware CM in AMD platforms so this actually means
>>>>> the BIOS CM, correct?
>>>>
>>>> That's correct.
>>>>
>>>>>
>>>>> However, on S4 we actually do reset host router when the "boot kernel" is
>>>>> started before loading and jumping to the hibernation image.
>>>>
>>>> That's only if thunderbolt.ko is built into the kernel or is included in the
>>>> initramfs before it does the pivot to the hibernation image.
>>>
>>> Ah good point.
>>>
>>>> At least in the tests we were doing it's not part of the boot kernel.
>>>>
>>>>> It might be
>>>>> that this boot kernel tunnel configuration is causing the issues you are
>>>>> seeing (can you elaborate on those?)
>>>>
>>>> The issues manifest "downstream" in the GPU driver. There are a bunch of
>>>> AUX failures and a non-functional display. Tracing it back, the GPU driver
>>>> isn't alive yet at the time the tunnels are being reconstructed, so the CM
>>>> tears the DP tunnel down, and then when the GPU driver does come up it is
>>>> not functional.
>>>>
>>>> DP tunnel constructed at:
>>>>
>>>> [ 486.007194] thunderbolt 0000:c6:00.6: AUX RX path activation complete
>>>>
>>>> First DPRx timeout at:
>>>>
>>>> [ 486.135483] thunderbolt 0000:c6:00.6: 0:6 <-> 2:13 (DP): DPRX read timeout
>>>>
>>>> DP tunnel deactivating at:
>>>>
>>>> [ 486.331856] thunderbolt 0000:c6:00.6: 0:6 <-> 2:13 (DP): deactivating
>>>
>>> Hmm, we have dprx_timeout by default 12 seconds. How come it tears down the
>>> tunnel already?
>>
>> *I believe* it's because of a hot unplug event that occurs from it not
>> working.
>>
>>>
>>>>
>>>> First DPRx DPCD reading starts at:
>>>>
>>>> [ 486.351765] amdgpu 0000:c4:00.0: amdgpu: [drm] DPIA AUX failed on 0xf0000(10), error 7
>>>
>>> This would have made it within the 12s if I read the timestamps right.
>>
>> Let me just share the whole log so you can see the full context.
>>
>> https://gist.github.com/superm1/6798fff44d0875b4ed0fe43d0794f81e
>
> Thanks!
>
> [Side note, you seem to have the link trained at Gen2 (20G) instead of Gen3
> (40G).]
>
> Looking at the dmesg I recalled that there is an internal report about
> similar issue by Pooja and Rene (Cc'd) and it all boils down to this log
> entry:
>
> [ 489.339148] thunderbolt 0000:c6:00.6: 2:13: could not allocate DP tunnel
>
> They made a hack patch that works around it, see below. I wonder if you
> could try that too? If that's the issue (not releasing HopIDs) then we need
> to figure out a way to fix it properly.
Thanks! I shared it with our internal team that reproduced it, will come
back next week with their results.
> One suggestion is to release DP
> resources earlier, and of course doing full reset as done here. I would
> prefer "smallest" possible change.
Well FWIW the v1 of my patch for the reset was a lot smaller :P
https://lore.kernel.org/linux-usb/20251023050354.115015-1-superm1@kernel.org/#t
>
> @Pooja, any updates on your side to this?
>
> diff --git a/drivers/thunderbolt/tunnel.c b/drivers/thunderbolt/tunnel.c
> index 28c1e5c062f3..45f7ee940f10 100644
> --- a/drivers/thunderbolt/tunnel.c
> +++ b/drivers/thunderbolt/tunnel.c
> @@ -1084,6 +1084,9 @@ static void tb_dp_dprx_work(struct work_struct *work)
>
> static int tb_dp_dprx_start(struct tb_tunnel *tunnel)
> {
> + if (tunnel->dprx_started)
> + return 0;
> +
> /*
> * Bump up the reference to keep the tunnel around. It will be
> * dropped in tb_dp_dprx_stop() once the tunnel is deactivated.
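
For anyone following along, the guard in the hack patch amounts to making
the start path idempotent. Going only by the comment in the hunk above
(start bumps a reference, tb_dp_dprx_stop() drops it), a second start
without the guard would presumably take a reference that the single stop
never returns. Below is a minimal user-space sketch of that pattern. The
names struct fake_tunnel, fake_dprx_start() and fake_dprx_stop() are
hypothetical stand-ins, not the kernel code, and the stop logic is an
assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the tunnel state; NOT the kernel's struct tb_tunnel. */
struct fake_tunnel {
	int refcount;      /* references currently held on the tunnel */
	bool dprx_started; /* set once DPRX polling has been started */
};

/* Idempotent start: a repeated call must not take a second reference. */
int fake_dprx_start(struct fake_tunnel *t)
{
	if (t->dprx_started)
		return 0;

	t->refcount++; /* keep the tunnel around while polling is active */
	t->dprx_started = true;
	return 0;
}

/* Stop drops the reference only if start actually took one (assumed behavior). */
void fake_dprx_stop(struct fake_tunnel *t)
{
	if (!t->dprx_started)
		return;

	t->dprx_started = false;
	t->refcount--;
}
```

With the guard in place, calling fake_dprx_start() twice followed by one
fake_dprx_stop() leaves the count where it began. Without it, the count
would end up one too high.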