Message-ID: <20260109072318.GU2275908@black.igk.intel.com>
Date: Fri, 9 Jan 2026 08:23:18 +0100
From: Mika Westerberg <mika.westerberg@...ux.intel.com>
To: Mario Limonciello <superm1@...nel.org>
Cc: "open list:THUNDERBOLT DRIVER" <linux-usb@...r.kernel.org>,
linux-kernel@...r.kernel.org,
Andreas Noever <andreas.noever@...il.com>,
Yehezkel Bernat <YehezkelShB@...il.com>,
Pooja Katiyar <pooja.katiyar@...el.com>,
Rene Sapiens <rene.sapiens@...ux.intel.com>
Subject: Re: [PATCH v2 0/2] thunderbolt: Fix S4 resume incongruities

On Thu, Jan 08, 2026 at 01:18:58PM -0600, Mario Limonciello wrote:
> On 1/8/26 5:42 AM, Mika Westerberg wrote:
> > On Wed, Jan 07, 2026 at 02:50:54PM -0600, Mario Limonciello wrote:
> > > On 1/7/26 3:33 AM, Mika Westerberg wrote:
> > > > Hi,
> > > >
> > > > On Mon, Jan 05, 2026 at 11:37:47PM -0600, Mario Limonciello (AMD) wrote:
> > > > > When a machine is restored from S4 if the firmware CM has created
> > > > > tunnels there can be an incongruity of expectation from the kernel
> > > > > when compared to booting from S5. This series addresses those.
> > > >
> > > > I suspect there is no Firmware CM in AMD platforms so this actually means
> > > > the BIOS CM, correct?
> > >
> > > That's correct.
> > >
> > > >
> > > > However, on S4 we actually do reset host router when the "boot kernel" is
> > > > started before loading and jumping to the hibernation image.
> > >
> > > That's only if thunderbolt.ko is built into the kernel or is included in the
> > > initramfs before it does the pivot to the hibernation image.
> >
> > Ah good point.
> >
> > > At least in the tests we were doing it's not part of the boot kernel.
> > >
> > > > It might be
> > > > that this boot kernel tunnel configuration is causing the issues you are
> > > > seeing (can you elaborate on those?)
> > >
> > > The issues manifest "downstream" in the GPU driver. There are a bunch of
> > > aux failures and a non functional display. Tracing it back the GPU driver
> > > isn't alive at the time that the tunnels are attempted to be reconstructed
> > > at the moment and so CM tears DP tunnel down and then when GPU driver does
> > > come up it is not functional.
> > >
> > > DP tunnel constructed at:
> > >
> > > [ 486.007194] thunderbolt 0000:c6:00.6: AUX RX path activation complete
> > >
> > > First DPRx timeout at:
> > >
> > > [ 486.135483] thunderbolt 0000:c6:00.6: 0:6 <-> 2:13 (DP): DPRX read
> > > timeout
> > >
> > > DP tunnel deactivating at:
> > >
> > > [ 486.331856] thunderbolt 0000:c6:00.6: 0:6 <-> 2:13 (DP): deactivating
> >
> > Hmm, we have dprx_timeout by default 12 seconds. How come it tears down the
> > tunnel already?
>
> *I believe* it's because of a hot unplug event that occurs from it not
> working.
>
> >
> > >
> > > First DPRx DPCD reading starts at:
> > >
> > > [ 486.351765] amdgpu 0000:c4:00.0: amdgpu: [drm] DPIA AUX failed on
> > > 0xf0000(10), error 7
> >
> > This would have made it within the 12s if I read the timestamps right.
>
> Let me just share the whole log so you can see the full context.
>
> https://gist.github.com/superm1/6798fff44d0875b4ed0fe43d0794f81e

Thanks!

[Side note, you seem to have the link trained at Gen2 (20G) instead of Gen3
(40G).]
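
For reference, the timestamps quoted above can be checked with a few lines
of C. This is only a sketch: the four values are copied from the log
excerpts in this thread, the 12 second figure is the dprx_timeout default
mentioned above, and the macro and function names are made up for
illustration.

```c
#include <assert.h>

/* Timestamps in seconds, copied from the dmesg excerpts quoted in this thread. */
#define T_AUX_RX_ACTIVE 486.007194 /* "AUX RX path activation complete" */
#define T_DPRX_TIMEOUT  486.135483 /* first "DPRX read timeout" */
#define T_DEACTIVATING  486.331856 /* DP tunnel "deactivating" */
#define T_AMDGPU_AUX    486.351765 /* first amdgpu "DPIA AUX failed" */

/* Default DPRX timeout in seconds, as stated in the thread. */
#define DPRX_TIMEOUT_S 12.0

/* Seconds elapsed between two log timestamps (hypothetical helper). */
static double elapsed(double earlier, double later)
{
	assert(later >= earlier);
	return later - earlier;
}

/*
 * Returns 1 if the tunnel was deactivated before the DPRX timeout,
 * counted from AUX RX path activation, could have expired.
 */
static int deactivated_before_timeout(void)
{
	return elapsed(T_AUX_RX_ACTIVE, T_DEACTIVATING) < DPRX_TIMEOUT_S;
}
```

The deactivation comes roughly 0.32 s after the AUX RX path activation, and
the first amdgpu AUX read roughly 20 ms after that. Both are far below 12 s,
which is consistent with the tear down being triggered by something other
than the timeout expiring.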

Looking at the dmesg I recalled that there is an internal report about a
similar issue by Pooja and Rene (Cc'd) and it all boils down to this log
entry:

[ 489.339148] thunderbolt 0000:c6:00.6: 2:13: could not allocate DP tunnel

They made a hack patch that works around it, see below. I wonder if you
could try that too? If that's the issue (not releasing HopIDs) then we need
to figure out a way to fix it properly. One suggestion is to release DP
resources earlier, and of course doing a full reset as done here. I would
prefer the "smallest" possible change.

@Pooja, any updates on your side to this?

diff --git a/drivers/thunderbolt/tunnel.c b/drivers/thunderbolt/tunnel.c
index 28c1e5c062f3..45f7ee940f10 100644
--- a/drivers/thunderbolt/tunnel.c
+++ b/drivers/thunderbolt/tunnel.c
@@ -1084,6 +1084,9 @@ static void tb_dp_dprx_work(struct work_struct *work)
 static int tb_dp_dprx_start(struct tb_tunnel *tunnel)
 {
+	if (tunnel->dprx_started)
+		return 0;
+
 	/*
 	 * Bump up the reference to keep the tunnel around. It will be
 	 * dropped in tb_dp_dprx_stop() once the tunnel is deactivated.
--
2.43.0
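
To illustrate why such a guard could matter, here is a small userspace model
in C. It is not the driver code. The toy_* names are hypothetical, and the
model simply assumes that start takes one reference each time it is called
and that stop drops a single reference only if DPRX was started. Under those
assumptions a second start without the guard leaves one reference behind,
which would keep the tunnel (and whatever it holds) around.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a tunnel: only a reference count and the started flag. */
struct toy_tunnel {
	int refcount;
	bool dprx_started;
};

static void toy_get(struct toy_tunnel *t)
{
	t->refcount++;
}

static void toy_put(struct toy_tunnel *t)
{
	assert(t->refcount > 0);
	t->refcount--;
}

/* Start without the guard: every call takes another reference. */
static int toy_dprx_start_unguarded(struct toy_tunnel *t)
{
	toy_get(t);
	t->dprx_started = true;
	return 0;
}

/* Start with the guard from the hack patch: a repeated call is a no-op. */
static int toy_dprx_start_guarded(struct toy_tunnel *t)
{
	if (t->dprx_started)
		return 0;
	return toy_dprx_start_unguarded(t);
}

/* Stop drops one reference, and only if DPRX was started (assumption). */
static void toy_dprx_stop(struct toy_tunnel *t)
{
	if (t->dprx_started) {
		t->dprx_started = false;
		toy_put(t);
	}
}

/* References left over after `starts` start calls followed by one stop. */
static int toy_leaked_refs(int starts, bool guarded)
{
	struct toy_tunnel t = { .refcount = 1, .dprx_started = false };

	for (int i = 0; i < starts; i++) {
		if (guarded)
			toy_dprx_start_guarded(&t);
		else
			toy_dprx_start_unguarded(&t);
	}
	toy_dprx_stop(&t);

	return t.refcount - 1; /* subtract the owner's initial reference */
}
```

In this model toy_leaked_refs(2, false) reports one leftover reference while
toy_leaked_refs(2, true) reports none. Whether the real driver hits the
double start on S4 resume is exactly what testing the hack patch should
tell us.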