Message-ID: <8cf57879-5fa7-4d23-afb4-6ef99f0ce97a@kernel.org>
Date: Tue, 13 Jan 2026 12:44:18 -0600
From: "Mario Limonciello (AMD) (kernel.org)" <superm1@...nel.org>
To: "Katiyar, Pooja" <pooja.katiyar@...ux.intel.com>,
 Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: "open list:THUNDERBOLT DRIVER" <linux-usb@...r.kernel.org>,
 linux-kernel@...r.kernel.org, Andreas Noever <andreas.noever@...il.com>,
 Yehezkel Bernat <YehezkelShB@...il.com>,
 Pooja Katiyar <pooja.katiyar@...el.com>,
 Rene Sapiens <rene.sapiens@...ux.intel.com>
Subject: Re: [PATCH v2 0/2] thunderbolt: Fix S4 resume incongruities



On 1/9/2026 6:42 PM, Katiyar, Pooja wrote:
> Hi,
> 
> On Thu, Jan 8, 2026 at 11:23:18PM -0800, Mika Westerberg wrote:
>> On Thu, Jan 08, 2026 at 01:18:58PM -0600, Mario Limonciello wrote:
>>> On 1/8/26 5:42 AM, Mika Westerberg wrote:
>>>
>>> Let me just share the whole log so you can see the full context.
>>>
>>> https://gist.github.com/superm1/6798fff44d0875b4ed0fe43d0794f81e
>>
>> Thanks!
>>
>> [Side note, you seem to have the link trained at Gen2 (20G) instead of Gen3
>> (40G).]
>>
>> Looking at the dmesg I recalled that there is an internal report about
>> a similar issue by Pooja and Rene (Cc'd) and it all boils down to this log
>> entry:
>>
>> [  489.339148] thunderbolt 0000:c6:00.6: 2:13: could not allocate DP tunnel
>>
>> They made a hack patch that works around it, see below. I wonder if you
>> could try that too? If that's the issue (not releasing HopIDs) then we need
>> to figure out a way to fix it properly. One suggestion is to release DP
>> resources earlier, and of course doing a full reset as done here. I would
>> prefer the "smallest" possible change.
>>
>> @Pooja, any updates on your side to this?
> 
> Looking at the log "could not allocate DP tunnel", this appears to be
> similar to the kref synchronization issue during S4 resume that we are
> facing. The problem we have identified is that during S4 entry, the
> hibernation image is created first, and only then are the DP tunnels
> freed. This means the hibernation image still contains the tunnels in
> their active state. When resuming from S4, the tunnels are restored from
> the hibernation image (as active) and then the resume flow activates
> them again, causing a kref count mismatch. This leads to HopID allocation
> conflicts and the "could not allocate DP tunnel" error on the next
> connect/tunnel activation.
> 
> The hack patch works around this by preventing double activation via
> the dprx_started flag. You could try this hack to confirm whether it's
> the same issue we're dealing with.
> 
> For a proper fix, we are working on a patch which releases the DP resources
> before saving the hibernation image and creates them again during resume,
> managing the resources properly. The patch is currently under review and
> testing, and we will send it shortly.
> 
> 

I have confirmation that the hack patch helps with the issue for us too.

If your patch doesn't work, another logical solution could be to destroy
all the tunnels as part of the PM freeze callback (not just the DP
resources).  Maybe even unify the suspend and freeze code paths for more
opportunities for code reuse?
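
To make the refcount imbalance concrete, here is a small userspace toy
model of the sequence Pooja described. It is only a sketch and not kernel
code: every toy_* name is hypothetical, the "hibernation image" is just a
struct copy, and it assumes the stop path drops exactly one reference when
dprx_started is set. It compares three cases: no fix, the dprx_started
guard from the hack patch, and tearing the tunnel state down before the
image is taken.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct tb_tunnel; only what the model needs. */
struct toy_tunnel {
	int refs;		/* models the tunnel kref count */
	bool dprx_started;	/* models tunnel->dprx_started */
};

/* Models the start path: takes one reference unless the guard skips it. */
void toy_dprx_start(struct toy_tunnel *t, bool guard)
{
	if (guard && t->dprx_started)
		return;		/* the early return added by the hack patch */
	t->refs++;
	t->dprx_started = true;
}

/* Models the stop path (assumption: drops one reference if started). */
void toy_dprx_stop(struct toy_tunnel *t)
{
	if (t->dprx_started) {
		t->dprx_started = false;
		t->refs--;
	}
}

/*
 * Runs one S4 cycle and returns the references left after the tunnel is
 * finally deactivated and its owner drops its own reference. A result of
 * 0 means the tunnel (and its HopIDs) would be released; anything above
 * 0 is a leaked reference.
 */
int toy_s4_cycle(bool guard, bool teardown_in_freeze)
{
	struct toy_tunnel live = { .refs = 1, .dprx_started = false };
	struct toy_tunnel image;

	toy_dprx_start(&live, guard);	/* tunnel is active before S4 */

	if (teardown_in_freeze)
		toy_dprx_stop(&live);	/* release before the snapshot */
	image = live;			/* hibernation image is created */

	live = image;			/* S4 resume restores the image */
	toy_dprx_start(&live, guard);	/* resume flow activates again */

	toy_dprx_stop(&live);		/* later unplug or deactivation */
	live.refs--;			/* owner drops its reference */
	return live.refs;
}
```

In this model toy_s4_cycle(false, false) leaves one reference behind,
while both toy_s4_cycle(true, false) and toy_s4_cycle(false, true) end at
zero, which is consistent with the hack patch helping and with the idea
of releasing things before the image is saved.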

>>
>> diff --git a/drivers/thunderbolt/tunnel.c b/drivers/thunderbolt/tunnel.c
>> index 28c1e5c062f3..45f7ee940f10 100644
>> --- a/drivers/thunderbolt/tunnel.c
>> +++ b/drivers/thunderbolt/tunnel.c
>> @@ -1084,6 +1084,9 @@ static void tb_dp_dprx_work(struct work_struct *work)
>>   
>>   static int tb_dp_dprx_start(struct tb_tunnel *tunnel)
>>   {
>> +	if (tunnel->dprx_started)
>> +		return 0;
>> +
>>   	/*
>>   	 * Bump up the reference to keep the tunnel around. It will be
>>   	 * dropped in tb_dp_dprx_stop() once the tunnel is deactivated.
> 
> Thanks,
> Pooja

