Message-ID: <aXLM5vk0gDNjLPyY@acelan-Precision-5480>
Date: Fri, 23 Jan 2026 10:04:11 +0800
From: "Chia-Lin Kao (AceLan)" <acelan.kao@...onical.com>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: Andreas Noever <andreas.noever@...il.com>, 
	Mika Westerberg <westeri@...nel.org>, Yehezkel Bernat <YehezkelShB@...il.com>, 
	linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] thunderbolt: Fix PCIe device enumeration with delayed
 rescan

On Wed, Jan 21, 2026 at 07:01:42AM +0100, Mika Westerberg wrote:
> Hi,
>
> On Wed, Jan 21, 2026 at 01:27:44PM +0800, Chia-Lin Kao (AceLan) wrote:
> > PCIe devices behind Thunderbolt tunnels may fail to enumerate when
> > spurious hotplug events prevent pciehp from detecting link-up.
> >
> > Root cause:
> >
> > Spurious unplug events occur immediately after tunnel activation:
> >
> >   [  932.438] thunderbolt: acking hot unplug event on 702:2
> >   [  932.852] thunderbolt: PCIe Up path activation complete
> >   [  932.855] thunderbolt: hotplug event for upstream port 702:2
> >             (unplug: 0)
> >   [  932.855] thunderbolt: hotplug event for upstream port 702:2
> >             (unplug: 1)
> >
> > These events disrupt pciehp timing, causing device enumeration to fail
> > ~70% of the time on affected hardware. Manual PCI rescan succeeds,
> > proving devices are present and functional on the bus.
>
> Thanks for the report!
>
> They are likely TB3 xHCI "plug" events or so but they should not affect
> anything really.
>
> It may be that there is something in the TB3 compatibility side that we are
> not doing which needs to be investigated.
>
> From your merged.out.bad:
>
> CM does xHCI connect here:
>
> [  152.905840] [182] thunderbolt 0000:c7:00.6: 702: xHCI connect request
> [  152.906865] [182] thunderbolt 0000:c7:00.6: hotplug event for upstream port 702:2 (unplug: 0)
> [  152.906869] [182] thunderbolt 0000:c7:00.6: 2:8: got plug event for connected port, ignoring
> [  152.906872] [182] thunderbolt 0000:c7:00.6: hotplug event for upstream port 702:2 (unplug: 1)
> [  152.906875] [182] thunderbolt 0000:c7:00.6: 2:8: got unplug event for disconnected port, ignoring
>
> [  192.931373] [49] thunderbolt 0000:c7:00.6: acking hot unplug event on 2:7
>
> Can you comment out call to tb_switch_xhci_connect() and see if that
> changes anything?
I commented it out (diff below), and with that change the picture gets
a bit more complicated.

I did the following steps (1-5) and captured the tbtrace log at step 5:
https://people.canonical.com/~acelan/bugs/tbt_storage/merged.out.remove_tb_switch_xhci_connect.out
1. Plugged one tbt storage into the dock and connected the dock to the
   machine.
2. Booted the machine; it recognized the tbt storage.
3. Plugged the second tbt storage into the dock; it was recognized as
   well (this step always failed before).
4. Unplugged both tbt storages from the dock, then re-plugged the first
   one; it was recognized.
5. Re-plugged the second tbt storage into the dock; it failed.

I then continued with the following tests:
a. Once the issue happens, re-plugging the second tbt storage doesn't
   help.
b. With both tbt storages plugged into the dock, re-plugging the dock to
   the machine gets both tbt storages recognized.
	b.1 In this state, re-plugging either the first or the second tbt
	    storage works (one tbt storage always stays connected to the
	    dock).
	b.2 After removing both tbt storages from the dock and re-plugging
	    them one by one, the second tbt storage isn't recognized.
c. With one tbt storage plugged into the dock, re-connecting the dock to
   the machine gets the tbt storage recognized.
	c.1 Plugging the second tbt storage into the dock then works; it
	    is recognized.
	c.2 Re-plugging either the first or the second tbt storage works;
	    both tbt storages are recognized.
	c.3 After removing both tbt storages from the dock and re-plugging
	    them one by one, the second tbt storage isn't recognized
	    (same as b.2).

In short, the issue reproduces when the second tbt storage is connected
to the dock:
1. Connect the dock to the machine with any tbt storage
2. Or, if the dock is already connected, remove all tbt storages from it
3. Then plug the tbt storages in one by one; the second one won't be
   recognized.

A rescan finds the missing tbt storage, but it only helps once; another
rescan is needed after re-plugging the first or the second tbt storage:
   echo 1 | sudo tee /sys/bus/pci/rescan
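
The manual check around that rescan can be sketched as a small script.
This is only an illustration, not something taken from the thread: the
names SYSFS_ROOT, count_pci_devices and rescan_and_report are made up
here, and the script assumes the usual sysfs layout
(/sys/bus/pci/devices and /sys/bus/pci/rescan). It counts the PCI
devices, triggers a rescan, and prints how many devices appeared.

```shell
#!/bin/sh
# Sketch only: SYSFS_ROOT, count_pci_devices and rescan_and_report are
# illustrative names, not part of any existing tool.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the number of entries under <root>/bus/pci/devices.
count_pci_devices() {
    root="${1:-$SYSFS_ROOT}"
    n=0
    for dev in "$root"/bus/pci/devices/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$dev" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}

# Count devices, write 1 to <root>/bus/pci/rescan (needs root on a real
# system), count again and print the difference.
rescan_and_report() {
    root="${1:-$SYSFS_ROOT}"
    before=$(count_pci_devices "$root")
    echo 1 > "$root/bus/pci/rescan" || return 1
    after=$(count_pci_devices "$root")
    echo "before=$before after=$after new=$((after - before))"
}
```

After sourcing the file as root, running rescan_and_report right after
re-plugging a tbt storage shows whether the rescan picked up anything
that pciehp missed (new greater than 0).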

BTW, when the second tbt storage isn't recognized, unplugging the first
tbt storage from the dock gets the second one recognized. If the first
tbt storage is then re-plugged into the dock, it isn't recognized; it
now behaves just like the second tbt storage did.

diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
index 4f5f1dfc0fbf..be7ff82a3846 100644
--- a/drivers/thunderbolt/tb.c
+++ b/drivers/thunderbolt/tb.c
@@ -2503,8 +2503,8 @@ static void tb_handle_hotplug(struct work_struct *work)
        } else if (port->remote) {
                tb_port_dbg(port, "got plug event for connected port, ignoring\n");
        } else if (!port->port && sw->authorized) {
-               tb_sw_dbg(sw, "xHCI connect request\n");
-               tb_switch_xhci_connect(sw);
+               tb_sw_dbg(sw, "DEBUG: Comment out xHCI connect request\n");
+               //tb_switch_xhci_connect(sw);
        } else {
                if (tb_port_is_null(port)) {
                        tb_port_dbg(port, "hotplug: scanning\n");

>
> > Solution:
> >
> > Schedule delayed work (300ms) after tunnel activation to:
> > 1. Check if pciehp successfully enumerated devices (device count increased)
> > 2. If not, trigger pci_rescan_bus() to discover devices manually
> > 3. Log results for observability
> >
> > The delayed work approach is non-blocking and only rescans when actually
> > needed, avoiding overhead on systems where pciehp works correctly.
>
> There is no way we are going to call PCI functions from the tb.c.
