Message-ID: <20251209054141.1975982-1-acelan.kao@canonical.com>
Date: Tue, 9 Dec 2025 13:41:41 +0800
From: "Chia-Lin Kao (AceLan)" <acelan.kao@...onical.com>
To: Andreas Noever <andreas.noever@...il.com>,
Mika Westerberg <westeri@...nel.org>,
Yehezkel Bernat <YehezkelShB@...il.com>,
linux-usb@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [PATCH] [RFC] thunderbolt: Add delay for Dell U2725QE link width
When plugging in a Dell U2725QE Thunderbolt monitor, the kernel produces
a call trace during initial enumeration. The device automatically
disconnects and reconnects ~3 seconds later, and works correctly on the
second attempt.
Issue Description:
==================
The Dell U2725QE (USB4 device 8087:b26) requires additional time during
link width negotiation from single lane to dual lane. On first plug, the
following sequence occurs:
1. Port state reaches TB_PORT_UP (link established, single lane)
2. Path activation begins immediately
3. tb_path_activate() -> tb_port_write() returns -ENOTCONN (error -107)
4. Call trace is generated at tb_path_activate()
5. Device disconnects/reconnects automatically after ~3 seconds
6. Second attempt succeeds with full dual-lane bandwidth
First attempt dmesg (failure):
-------------------------------
[ 36.030347] thunderbolt 0000:c7:00.6: 2:16: available bandwidth for new USB3 tunnel 9000/9000 Mb/s
[ 36.030613] thunderbolt 0000:c7:00.6: 2: USB3 tunnel creation failed
[ 36.031530] thunderbolt 0000:c7:00.6: PCIe Down path activation failed
[ 36.031531] WARNING: drivers/thunderbolt/path.c:589 at 0x0, CPU#12: pool-/usr/libex/3145
Second attempt dmesg (success):
--------------------------------
[ 40.440012] thunderbolt 0000:c7:00.6: 2:16: available bandwidth for new USB3 tunnel 36000/36000 Mb/s
[ 40.440261] thunderbolt 0000:c7:00.6: 2:16: maximum required bandwidth for USB3 tunnel 9000 Mb/s
[ 40.440269] thunderbolt 0000:c7:00.6: 0:4 <-> 2:16 (USB3): activating
[ 40.440271] thunderbolt 0000:c7:00.6: 0:4 <-> 2:16 (USB3): allocating initial bandwidth 9000/9000 Mb/s
The bandwidth difference (9000 vs 36000 Mb/s) indicates the first attempt
occurs while the link is still in single-lane mode.
Root Cause Analysis:
====================
The error originates from the Thunderbolt/USB4 device hardware itself:
1. Port config space read/write returns TB_CFG_ERROR_PORT_NOT_CONNECTED
2. This gets translated to -ENOTCONN in tb_cfg_get_error()
3. The port's control channel is temporarily unavailable during state
transition from single lane to dual lane (lane bonding)
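The translation in step 2 can be pictured with the small userspace model
below. It is a simplified sketch, not the code in drivers/thunderbolt/ctl.c:
the enum toy_cfg_error and the function toy_cfg_get_error() are hypothetical
stand-ins, and mapping every other error to -EIO is an assumption of the
model.

```c
#include <errno.h>

/* Hypothetical stand-ins for the router's config-space error codes;
 * the real names and values live in the thunderbolt driver. */
enum toy_cfg_error {
	TOY_CFG_OK = 0,
	TOY_CFG_ERROR_PORT_NOT_CONNECTED,
	TOY_CFG_ERROR_OTHER,
};

/*
 * Simplified model of the translation described above: a
 * "port not connected" response becomes -ENOTCONN, reported in this
 * failure as error -107. Treating every other error as -EIO is an
 * assumption of this sketch, not the driver's full mapping.
 */
static int toy_cfg_get_error(enum toy_cfg_error err)
{
	switch (err) {
	case TOY_CFG_OK:
		return 0;
	case TOY_CFG_ERROR_PORT_NOT_CONNECTED:
		return -ENOTCONN;
	default:
		return -EIO;
	}
}
```

In the failing sequence, this -ENOTCONN is what tb_port_write() hands back
to tb_path_activate(), where the call trace is then generated.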
The comment in drivers/thunderbolt/ctl.c explains this is expected:
"Port is not connected. This can happen during surprise removal.
Do not warn."
Attempted Solutions:
====================
1. Retry logic on -ENOTCONN in tb_path_activate():
Result: Caused host port (0:0) lockup with hundreds of "downstream
port is locked" errors. Rejected by user.
2. Increased tb_port_wait_for_link_width() timeout from 100ms to 3000ms:
Result: Did not resolve the issue. The timeout increase alone is
insufficient because the port state hasn't reached TB_PORT_UP when
lane bonding is attempted.
3. Added msleep(2000) at various points in enumeration flow:
Locations tested:
- Before tb_switch_configure(): Works ✓
- Before tb_switch_add(): Works ✓
- Before usb4_port_hotplug_enable(): Works ✓
- After tb_switch_add(): Doesn't work ✗
- In tb_configure_link(): Doesn't work ✗
- In tb_switch_lane_bonding_enable(): Doesn't work ✗
- In tb_port_wait_for_link_width(): Doesn't work ✗
The pattern shows the delay must occur BEFORE hotplug enable, which
happens early in tb_switch_port_hotplug_enable() -> usb4_port_hotplug_enable().
Current Workaround:
===================
Add a 2-second delay in tb_wait_for_port() when the port state reaches
TB_PORT_UP. This is the earliest point where we know:
- The link is physically established
- The device is responsive
- But lane width negotiation may still be in progress
This location is chosen because:
1. It's called during port enumeration before any tunnel creation
2. The port has just transitioned to TB_PORT_UP state
3. Allows sufficient time for lane bonding to complete
4. Avoids affecting other code paths
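To make the control flow of the patch concrete, here is a userspace model
of the state switch with the delay added. The names (toy_port_state,
toy_classify_port(), the sleep callback and the test double
toy_record_sleep()) are hypothetical; only the structure, where TB_PORT_UP
sleeps and then falls through to the shared link-up handling, mirrors the
diff. Returning 1 for a link that is up and 0 for a disabled or unplugged
port is an assumed convention, and injecting the sleep as a callback lets
the model run without actually waiting two seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of port states, named after those in the diff. */
enum toy_port_state {
	TOY_PORT_DISABLED,
	TOY_PORT_UNPLUGGED,
	TOY_PORT_UP,
	TOY_PORT_TX_CL0S,
	TOY_PORT_RX_CL0S,
	TOY_PORT_CL1,
};

/* Delay hook; the patch calls msleep(2000) at this point. */
typedef void (*toy_sleep_fn)(unsigned int ms);

#define TOY_UP_DELAY_MS 2000u

/*
 * Model of the patched switch: TB_PORT_UP sleeps first and then falls
 * through to the shared "link is up" handling used by the low-power
 * states, which do not get the extra delay.
 * Returns 1 if the link is up, 0 otherwise (assumed convention).
 */
static int toy_classify_port(enum toy_port_state state, toy_sleep_fn sleep_ms)
{
	assert(sleep_ms != NULL);

	switch (state) {
	case TOY_PORT_DISABLED:
	case TOY_PORT_UNPLUGGED:
		return 0;
	case TOY_PORT_UP:
		sleep_ms(TOY_UP_DELAY_MS);
		/* fall through */
	case TOY_PORT_TX_CL0S:
	case TOY_PORT_RX_CL0S:
	case TOY_PORT_CL1:
		return 1;
	}
	return 0;
}

/* Test double (hypothetical): records requested delays instead of sleeping. */
static unsigned int toy_slept_ms;

static void toy_record_sleep(unsigned int ms)
{
	toy_slept_ms += ms;
}
```

Because the sleep sits before the fallthrough, it runs whenever the port is
observed in TB_PORT_UP, for every connected device and not only the
U2725QE; that scope is what discussion point 4 below asks about.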
Testing Results:
================
With this patch:
- No call trace on first plug
- Device enumerates correctly on first attempt
- Full bandwidth (36000 Mb/s) available immediately
- No disconnect/reconnect cycle
- USB and PCIe tunnels create successfully
Without this patch:
- Call trace on every first plug
- Only 9000 Mb/s bandwidth (single lane) on first attempt
- Automatic disconnect/reconnect after ~3 seconds
- Second attempt works with 36000 Mb/s
Discussion Points for RFC:
===========================
1. Is a fixed 2-second delay acceptable, or should we poll for a
specific hardware state?
2. Should we check PORT_CS_18_TIP (Transition In Progress) bit instead
of using a fixed delay?
3. Is there a better location for this delay in the enumeration flow?
4. Should this be device-specific (based on vendor/device ID) or apply
to all USB4 devices?
5. The 100ms timeout in tb_switch_lane_bonding_enable() may be too
short for other devices as well. Should we increase it universally?
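As a sketch of the alternative raised in point 2, the fixed delay could
become a bounded poll that returns as soon as a transition-in-progress bit
clears. Everything here is hypothetical: the read callback stands in for a
config-space read of PORT_CS_18, TOY_PORT_CS_18_TIP is a placeholder mask
whose real bit position should be taken from drivers/thunderbolt/tb_regs.h,
and the timeout and poll interval are illustrative values, not measured
ones.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder mask (hypothetical); take the real TIP bit from tb_regs.h. */
#define TOY_PORT_CS_18_TIP (UINT32_C(1) << 24)

/* Hypothetical PORT_CS_18 read: fills *val and returns 0, or a negative errno. */
typedef int (*toy_read_fn)(void *ctx, uint32_t *val);
/* Hypothetical stand-in for msleep(). */
typedef void (*toy_sleep_fn)(unsigned int ms);

/*
 * Poll until the transition-in-progress bit clears or timeout_ms has
 * elapsed. Returns 0 when the transition is done, -ETIMEDOUT if the bit
 * never clears, or the read error (e.g. -ENOTCONN) unchanged.
 */
static int toy_wait_for_transition(void *ctx, toy_read_fn read_cs18,
				   toy_sleep_fn sleep_ms,
				   unsigned int timeout_ms,
				   unsigned int interval_ms)
{
	unsigned int waited = 0;

	assert(read_cs18 != NULL && sleep_ms != NULL);
	if (interval_ms == 0)
		interval_ms = 1; /* avoid spinning without advancing time */

	for (;;) {
		uint32_t val;
		int ret = read_cs18(ctx, &val);

		if (ret)
			return ret;
		if (!(val & TOY_PORT_CS_18_TIP))
			return 0;
		if (waited >= timeout_ms)
			return -ETIMEDOUT;
		sleep_ms(interval_ms);
		waited += interval_ms;
	}
}

/* Test double (hypothetical): TIP reads as set for the first busy_reads reads. */
struct toy_fake_port {
	unsigned int busy_reads;
	unsigned int reads;
};

static int toy_fake_read(void *ctx, uint32_t *val)
{
	struct toy_fake_port *p = ctx;

	*val = p->reads++ < p->busy_reads ? TOY_PORT_CS_18_TIP : 0;
	return 0;
}

/* Test double: pretends to sleep so the model runs instantly. */
static void toy_no_sleep(unsigned int ms)
{
	(void)ms;
}
```

One question the sketch leaves open is whether the PORT_CS_18 read itself
succeeds while the control channel is unavailable during lane bonding; here
a read error is simply propagated to the caller rather than retried.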
Hardware Details:
=================
Device: Dell U2725QE Thunderbolt Monitor
USB4 Router: 8087:b26 (Intel USB4 controller)
Host: AMD Thunderbolt 4 controller (0000:c7:00.6)
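Point 4 asks whether the delay should be device-specific. One way to scope
it would be a lookup keyed on the router IDs listed above. The struct,
table and function below are hypothetical and only illustrate the idea;
they are not the driver's existing quirk mechanism, and the 2000 ms value
simply mirrors the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-router entry: extra settle time after TB_PORT_UP. */
struct toy_link_quirk {
	uint16_t vendor;
	uint16_t device;
	unsigned int up_delay_ms;
};

/* Hypothetical table holding only the router from this report (8087:b26). */
static const struct toy_link_quirk toy_link_quirks[] = {
	{ .vendor = 0x8087, .device = 0xb26, .up_delay_ms = 2000 },
};

/* Returns the extra delay for a matching router, or 0 when none applies. */
static unsigned int toy_up_delay_ms(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(toy_link_quirks) / sizeof(toy_link_quirks[0]);

	for (size_t i = 0; i < n; i++) {
		const struct toy_link_quirk *q = &toy_link_quirks[i];

		if (q->vendor == vendor && q->device == device)
			return q->up_delay_ms;
	}
	return 0;
}
```

Scoping the delay this way leaves other USB4 devices unaffected, at the
cost of maintaining the list by hand; polling a hardware state, as in
point 2, would avoid that list entirely.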
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@...onical.com>
---
Full dmesg log available at: https://paste.ubuntu.com/p/CXs2T4XzZ3/
---
drivers/thunderbolt/switch.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index b3948aad0b955..e0c65e5fb0dca 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -530,6 +530,8 @@ int tb_wait_for_port(struct tb_port *port, bool wait_if_unplugged)
 			return 0;
 
 		case TB_PORT_UP:
+			msleep(2000);
+			fallthrough;
 		case TB_PORT_TX_CL0S:
 		case TB_PORT_RX_CL0S:
 		case TB_PORT_CL1:
--
2.43.0