lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <64roqxsxgv7mtmftbn77hlviz45he2kf5juhk6e6oh5ol7hmhh@r56njiqyghqk>
Date: Wed, 10 Dec 2025 13:33:20 +0800
From: "Chia-Lin Kao (AceLan)" <acelan.kao@...onical.com>
To: Mario Limonciello <mario.limonciello@....com>
Cc: Mika Westerberg <mika.westerberg@...ux.intel.com>, Sanath.S@....com, 
	Andreas Noever <andreas.noever@...il.com>, Mika Westerberg <westeri@...nel.org>, 
	Yehezkel Bernat <YehezkelShB@...il.com>, linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] [RFC] thunderbolt: Add delay for Dell U2725QE link width

On Tue, Dec 09, 2025 at 10:49:46AM -0600, Mario Limonciello wrote:
> +Sanath too
> 
> On 12/9/2025 1:06 AM, Mika Westerberg wrote:
> > +Mario since this is AMD related.
> > 
> > [Also keeping all the context].
> > 
> 
> Thanks for adding me.  A few other thoughts I have:
> 
> 1) Is it possible that the USB4 controller in the monitor is powering up or
> exiting a low power state during the first hotplug?
That's a good point. We currently only have one TB4 monitor.
I'll check with out QA to see if we have other TB4 devices to try.

The issue can't be reproduced with TB3 monitor or TB3 storage.
> 
> 2) Are you sure this only happens on AMD host?  What if you cold boot the
> monitor with Intel host?
The issue could be reproduced with Intel host, too.
I addressed this in reply to Mika's email.

> 
> 
> > On Tue, Dec 09, 2025 at 01:41:41PM +0800, Chia-Lin Kao (AceLan) wrote:
> > > When plugging in a Dell U2725QE Thunderbolt monitor, the kernel produces
> > > a call trace during initial enumeration. The device automatically
> > > disconnects and reconnects ~3 seconds later, and works correctly on the
> > > second attempt.
> > > 
> > > Issue Description:
> > > ==================
> > > The Dell U2725QE (USB4 device 8087:b26) requires additional time during
> > > link width negotiation from single lane to dual lane. On first plug, the
> > > following sequence occurs:
> > > 
> > > 1. Port state reaches TB_PORT_UP (link established, single lane)
> > > 2. Path activation begins immediately
> > > 3. tb_path_activate() - > tb_port_write() returns -ENOTCONN (error -107)
> > > 4. Call trace is generated at tb_path_activate()
> > > 5. Device disconnects/reconnects automatically after ~3 seconds
> > > 6. Second attempt succeeds with full dual-lane bandwidth
> > > 
> > > First attempt dmesg (failure):
> > > -------------------------------
> > > [   36.030347] thunderbolt 0000:c7:00.6: 2:16: available bandwidth for new USB3 tunnel 9000/9000 Mb/s
> > > [   36.030613] thunderbolt 0000:c7:00.6: 2: USB3 tunnel creation failed
> > > [   36.031530] thunderbolt 0000:c7:00.6: PCIe Down path activation failed
> > > [   36.031531] WARNING: drivers/thunderbolt/path.c:589 at 0x0, CPU#12: pool-/usr/libex/3145
> > > 
> > > Second attempt dmesg (success):
> > > --------------------------------
> > > [   40.440012] thunderbolt 0000:c7:00.6: 2:16: available bandwidth for new USB3 tunnel 36000/36000 Mb/s
> > > [   40.440261] thunderbolt 0000:c7:00.6: 2:16: maximum required bandwidth for USB3 tunnel 9000 Mb/s
> > > [   40.440269] thunderbolt 0000:c7:00.6: 0:4 <-> 2:16 (USB3): activating
> > > [   40.440271] thunderbolt 0000:c7:00.6: 0:4 <-> 2:16 (USB3): allocating initial bandwidth 9000/9000 Mb/s
> > > 
> > > The bandwidth difference (9000 vs 36000 Mb/s) indicates the first attempt
> > > occurs while the link is still in single-lane mode.
> > > 
> > > Root Cause Analysis:
> > > ====================
> > > The error originates from the Thunderbolt/USB4 device hardware itself:
> > > 
> > > 1. Port config space read/write returns TB_CFG_ERROR_PORT_NOT_CONNECTED
> > > 2. This gets translated to -ENOTCONN in tb_cfg_get_error()
> > > 3. The port's control channel is temporarily unavailable during state
> > >     transition from single lane to dual lane (lane bonding)
> > > 
> > > The comment in drivers/thunderbolt/ctl.c explains this is expected:
> > >    "Port is not connected. This can happen during surprise removal.
> > >     Do not warn."
> > > 
> > > Attempted Solutions:
> > > ====================
> > > 1. Retry logic on -ENOTCONN in tb_path_activate():
> > >     Result: Caused host port (0:0) lockup with hundreds of "downstream
> > >     port is locked" errors. Rejected by user.
> > > 
> > > 2. Increased tb_port_wait_for_link_width() timeout from 100ms to 3000ms:
> > >     Result: Did not resolve the issue. The timeout increase alone is
> > >     insufficient because the port state hasn't reached TB_PORT_UP when
> > >     lane bonding is attempted.
> > > 
> > > 3. Added msleep(2000) at various points in enumeration flow:
> > >     Locations tested:
> > >     - Before tb_switch_configure(): Works ✓
> > >     - Before tb_switch_add(): Works ✓
> > >     - Before usb4_port_hotplug_enable(): Works ✓
> > >     - After tb_switch_add(): Doesn't work ✗
> > >     - In tb_configure_link(): Doesn't work ✗
> > >     - In tb_switch_lane_bonding_enable(): Doesn't work ✗
> > >     - In tb_port_wait_for_link_width(): Doesn't work ✗
> > > 
> > >     The pattern shows the delay must occur BEFORE hotplug enable, which
> > >     happens early in tb_switch_port_hotplug_enable() -> usb4_port_hotplug_enable().
> > > 
> > > Current Workaround:
> > > ===================
> > > Add a 2-second delay in tb_wait_for_port() when the port state reaches
> > > TB_PORT_UP. This is the earliest point where we know:
> > > - The link is physically established
> > > - The device is responsive
> > > - But lane width negotiation may still be in progress
> > > 
> > > This location is chosen because:
> > > 1. It's called during port enumeration before any tunnel creation
> > > 2. The port has just transitioned to TB_PORT_UP state
> > > 3. Allows sufficient time for lane bonding to complete
> > > 4. Avoids affecting other code paths
> > > 
> > > Testing Results:
> > > ================
> > > With this patch:
> > > - No call trace on first plug
> > > - Device enumerates correctly on first attempt
> > > - Full bandwidth (36000 Mb/s) available immediately
> > > - No disconnect/reconnect cycle
> > > - USB and PCIe tunnels create successfully
> > > 
> > > Without this patch:
> > > - Call trace on every first plug
> > > - Only 9000 Mb/s bandwidth (single lane) on first attempt
> > > - Automatic disconnect/reconnect after ~3 seconds
> > > - Second attempt works with 36000 Mb/s
> > > 
> > > Discussion Points for RFC:
> > > ===========================
> > > 1. Is a fixed 2-second delay acceptable, or should we poll for a
> > >     specific hardware state?
> > > 
> > > 2. Should we check PORT_CS_18_TIP (Transition In Progress) bit instead
> > >     of using a fixed delay?
> > > 
> > > 3. Is there a better location for this delay in the enumeration flow?
> > > 
> > > 4. Should this be device-specific (based on vendor/device ID) or apply
> > >     to all USB4 devices?
> > > 
> > > 5. The 100ms timeout in tb_switch_lane_bonding_enable() may be too
> > >     short for other devices as well. Should we increase it universally?
> > 
> > We should understand the issue better. This is Intel Goshen Ridge based
> > monitor which I'm pretty sure does not require additional quirks, at least
> > I have not heard any issues like this. I suspect this is combination of the
> > AMD and Intel hardware that is causing the issue.
> > 
> > Looking at your dmesg, even before your issue there is suspicious log
> > entry:
> > 
> > [    5.852476] localhost kernel: [31] thunderbolt 0000:c7:00.5: acking hot unplug event on 0:6
> > [    5.852492] localhost kernel: [12] thunderbolt 0000:c7:00.5: 0:6: DP IN resource unavailable: adapter unplug
> > 
> > This causes tearing down the DP tunnel. It is unexpected for the host
> > router to send this unless you plugged monitor directly to some of the
> > Type-C ports at this time?
> > 
> > I wonder if you could take trace logs too from the issue? Instructions:
> > 
> > https://github.com/intel/tbtools?tab=readme-ov-file#tracing
> > https://github.com/intel/tbtools/wiki/Useful-Commands#tracing
> > 
> > Please provide both full dmesg and the trace.out or the merged one. That
> > would allow us to look what is going on (hopefully).
> 
> We need to be careful trusting the LLM conclusions.
> 
> Hopefully the traces requested by Mika show what's going on.
> 
> If they don't, then I think the next step will be a USB4 analyzer.
> 
> > 
> > > Hardware Details:
> > > =================
> > > Device: Dell U2725QE Thunderbolt Monitor
> > > USB4 Router: 8087:b26 (Intel USB4 controller)
> > > Host: AMD Thunderbolt 4 controller (0000:c7:00.6)
> 
> What sort of hardware is the AMD host?  PCI BDF is meaningless.
It's a Dell's ongoing project.

ubuntu@...alhost:~$ lscpu
Architecture:                x86_64
  CPU op-mode(s):            32-bit, 64-bit
  Address sizes:             48 bits physical, 48 bits virtual
  Byte Order:                Little Endian
CPU(s):                      24
  On-line CPU(s) list:       0-23
Vendor ID:                   AuthenticAMD
  Model name:                AMD Ryzen AI 9 HX PRO 370 w/ Radeon 890M
    CPU family:              26
    Model:                   36
    Thread(s) per core:      2
    Core(s) per socket:      12
    Socket(s):               1
    Stepping:                0
    Frequency boost:         enabled
    CPU(s) scaling MHz:      24%
    CPU max MHz:             5157.8950
    CPU min MHz:             605.2640
    BogoMIPS:                3992.19
    Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxe
                             xt fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmpe
                             rf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy
                              svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb 
                             bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced v
                             mmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq adx smap avx512ifma clflushop
                             t clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_
                             local user_shstk avx_vnni avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save 
                             tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl 
                             vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid bu
                             s_lock_detect movdiri movdir64b overflow_recov succor smca fsrm avx512_vp2intersect flush_l1d amd_lbr_pmc_freeze
Virtualization features:     
  Virtualization:            AMD-V
Caches (sum of all):         
  L1d:                       576 KiB (12 instances)
  L1i:                       384 KiB (12 instances)
  L2:                        12 MiB (12 instances)
  L3:                        24 MiB (2 instances)
NUMA:                        
  NUMA node(s):              1
  NUMA node0 CPU(s):         0-23
Vulnerabilities:             
  Gather data sampling:      Not affected
  Ghostwrite:                Not affected
  Indirect target selection: Not affected
  Itlb multihit:             Not affected
  L1tf:                      Not affected
  Mds:                       Not affected
  Meltdown:                  Not affected
  Mmio stale data:           Not affected
  Old microcode:             Not affected
  Reg file data sampling:    Not affected
  Retbleed:                  Not affected
  Spec rstack overflow:      Mitigation; IBPB on VMEXIT only
  Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:                Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                     Not affected
  Tsa:                       Not affected
  Tsx async abort:           Not affected
  Vmscape:                   Mitigation; IBPB on VMEXIT

ubuntu@...alhost:~$ sudo lspci -vvnns c7:00.6
c7:00.6 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:151d] (prog-if 40 [USB4 Host Interface])
        Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:151d]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 47
        IOMMU group: 32
        Region 0: Memory at b0c80000 (64-bit, non-prefetchable) [size=512K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [64] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 16GT/s, Width x16
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [a0] MSI: Enable- Count=1/16 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
                Vector table: BAR=0 offset=0007e000
                PBA: BAR=0 offset=0007f000
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [2a0 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Kernel driver in use: thunderbolt
        Kernel modules: thunderbolt

> 
> > > 
> > > Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@...onical.com>
> > > ---
> > > Full dmesg log available at: https://paste.ubuntu.com/p/CXs2T4XzZ3/
> > > ---
> > >   drivers/thunderbolt/switch.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
> > > index b3948aad0b955..e0c65e5fb0dca 100644
> > > --- a/drivers/thunderbolt/switch.c
> > > +++ b/drivers/thunderbolt/switch.c
> > > @@ -530,6 +530,8 @@ int tb_wait_for_port(struct tb_port *port, bool wait_if_unplugged)
> > >   			return 0;
> > >   		case TB_PORT_UP:
> > > +			msleep(2000);
> > > +			fallthrough;
> > >   		case TB_PORT_TX_CL0S:
> > >   		case TB_PORT_RX_CL0S:
> > >   		case TB_PORT_CL1:
> > > -- 
> > > 2.43.0
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ