Message-ID: <3e47698b-6e6a-45e6-a312-b7db8f473359@kernel.org>
Date: Fri, 9 Jan 2026 09:38:02 -0600
From: Mario Limonciello <superm1@...nel.org>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: "open list:THUNDERBOLT DRIVER" <linux-usb@...r.kernel.org>,
 linux-kernel@...r.kernel.org, Andreas Noever <andreas.noever@...il.com>,
 Yehezkel Bernat <YehezkelShB@...il.com>,
 Pooja Katiyar <pooja.katiyar@...el.com>,
 Rene Sapiens <rene.sapiens@...ux.intel.com>, "S, Sanath" <Sanath.S@....com>
Subject: Re: [PATCH v2 0/2] thunderbolt: Fix S4 resume incongruities

On 1/9/26 1:23 AM, Mika Westerberg wrote:
> On Thu, Jan 08, 2026 at 01:18:58PM -0600, Mario Limonciello wrote:
>> On 1/8/26 5:42 AM, Mika Westerberg wrote:
>>> On Wed, Jan 07, 2026 at 02:50:54PM -0600, Mario Limonciello wrote:
>>>> On 1/7/26 3:33 AM, Mika Westerberg wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, Jan 05, 2026 at 11:37:47PM -0600, Mario Limonciello (AMD) wrote:
>>>>>> When a machine is restored from S4 if the firmware CM has created
>>>>>> tunnels there can be an incongruity of expectation from the kernel
>>>>>> when compared to booting from S5.  This series addresses those.
>>>>>
>>>>> I suspect there is no Firmware CM in AMD platforms so this actually means
>>>>> the BIOS CM, correct?
>>>>
>>>> That's correct.
>>>>
>>>>>
>>>>> However, on S4 we actually do reset host router when the "boot kernel" is
>>>>> started before loading and jumping to the hibernation image.
>>>>
>>>> That's only if thunderbolt.ko is built into the kernel or is included in the
>>>> initramfs before it does the pivot to the hibernation image.
>>>
>>> Ah good point.
>>>
>>>> At least in the tests we were doing it's not part of the boot kernel.
>>>>
>>>>> It might be
>>>>> that this boot kernel tunnel configuration is causing the issues you are
>>>>> seeing (can you elaborate on those?)
>>>>
>>>> The issues manifest "downstream" in the GPU driver.  There are a bunch of
>>>> AUX failures and a non-functional display.  Tracing it back, the GPU driver
>>>> isn't alive at the time the tunnels are being reconstructed, so the CM
>>>> tears the DP tunnel down, and then when the GPU driver does come up it is
>>>> not functional.
>>>>
>>>> DP tunnel constructed at:
>>>>
>>>> [  486.007194] thunderbolt 0000:c6:00.6: AUX RX path activation complete
>>>>
>>>> First DPRx timeout at:
>>>>
>>>> [  486.135483] thunderbolt 0000:c6:00.6: 0:6 <-> 2:13 (DP): DPRX read timeout
>>>>
>>>> DP tunnel deactivating at:
>>>>
>>>>    [  486.331856] thunderbolt 0000:c6:00.6: 0:6 <-> 2:13 (DP): deactivating
>>>
>>> Hmm, we have dprx_timeout by default 12 seconds. How come it tears down the
>>> tunnel already?
>>
>> *I believe* it's because of a hot unplug event that occurs from it not
>> working.
>>
>>>
>>>>
>>>> First DPRx DPCD reading starts at:
>>>>
>>>> [  486.351765] amdgpu 0000:c4:00.0: amdgpu: [drm] DPIA AUX failed on 0xf0000(10), error 7
>>>
>>> This would have made it within the 12s if I read the timestamps right.
>>
>> Let me just share the whole log so you can see the full context.
>>
>> https://gist.github.com/superm1/6798fff44d0875b4ed0fe43d0794f81e
> 
> Thanks!
> 
> [Side note, you seem to have the link trained at Gen2 (20G) instead of Gen3
> (40G).]
> 
> Looking at the dmesg I recalled that there is an internal report about
> similar issue by Pooja and Rene (Cc'd) and it all boils down to this log
> entry:
> 
> [  489.339148] thunderbolt 0000:c6:00.6: 2:13: could not allocate DP tunnel
> 
> They made a hack patch that works around it, see below. I wonder if you
> could try that too? If that's the issue (not releasing HopIDs) then we need
> to figure out a way to fix it properly.

Thanks! I shared it with our internal team that reproduced the issue; I'll
come back next week with their results.

> One suggestion is to release DP
> resources earlier, and of course doing full reset as done here. I would
> prefer "smallest" possible change.

Well FWIW the v1 of my patch for the reset was a lot smaller :P

https://lore.kernel.org/linux-usb/20251023050354.115015-1-superm1@kernel.org/#t

> 
> @Pooja, any updates on your side to this?
> 
> diff --git a/drivers/thunderbolt/tunnel.c b/drivers/thunderbolt/tunnel.c
> index 28c1e5c062f3..45f7ee940f10 100644
> --- a/drivers/thunderbolt/tunnel.c
> +++ b/drivers/thunderbolt/tunnel.c
> @@ -1084,6 +1084,9 @@ static void tb_dp_dprx_work(struct work_struct *work)
>   
>   static int tb_dp_dprx_start(struct tb_tunnel *tunnel)
>   {
> +	if (tunnel->dprx_started)
> +		return 0;
> +
>   	/*
>   	 * Bump up the reference to keep the tunnel around. It will be
>   	 * dropped in tb_dp_dprx_stop() once the tunnel is deactivated.

