Message-ID: <cd006977-513f-43d6-9238-1b9f39313976@kylinos.cn>
Date: Tue, 3 Feb 2026 17:04:53 +0800
From: Jayi Li <lijiayi@...inos.cn>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>,
 "Chia-Lin Kao (AceLan)" <acelan.kao@...onical.com>,
 Andreas Noever <andreas.noever@...il.com>,
 Mika Westerberg <westeri@...nel.org>, Yehezkel Bernat
 <YehezkelShB@...il.com>, linux-usb@...r.kernel.org,
 linux-kernel@...r.kernel.org, Gil Fine <gil.fine@...ux.intel.com>
Subject: Re: [PATCH] thunderbolt: Fix PCIe device enumeration with delayed
 rescan

Hi,

On 2026/1/29 14:50, Mika Westerberg wrote:
> On Thu, Jan 29, 2026 at 01:45:51PM +0800, Chia-Lin Kao (AceLan) wrote:
>> On Tue, Jan 27, 2026 at 11:17:01AM +0100, Mika Westerberg wrote:
>>> On Tue, Jan 27, 2026 at 09:45:13AM +0100, Mika Westerberg wrote:
>>>> On Tue, Jan 27, 2026 at 01:04:20PM +0800, Chia-Lin Kao (AceLan) wrote:
>>>>> On Mon, Jan 26, 2026 at 12:56:54PM +0100, Mika Westerberg wrote:
>>>>>> On Mon, Jan 26, 2026 at 03:48:48PM +0800, Chia-Lin Kao (AceLan) wrote:
>>>>>>> On Mon, Jan 26, 2026 at 06:42:31AM +0100, Mika Westerberg wrote:
>>>>>>>> On Mon, Jan 26, 2026 at 11:30:47AM +0800, Chia-Lin Kao (AceLan) wrote:
>>>>>>>>> Hi,
>>>>>>>>> On Fri, Jan 23, 2026 at 01:01:12PM +0100, Mika Westerberg wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 23, 2026 at 10:04:11AM +0800, Chia-Lin Kao (AceLan) wrote:
>>>>>>>>>>>> Can you comment out call to tb_switch_xhci_connect() and see if that
>>>>>>>>>>>> changes anything?
>>>>>>>>>>> Here is what I modified, and the problem becomes a little bit complicated.
>>>>>>>>>> Okay I see it did not change anything (well this is kind of what I
>>>>>>>>>> expected). Thanks for trying.
>>>>>>>>>>
>>>>>>>>>> I see in your log that the PCIe tunnel is established just fine. It's just
>>>>>>>>>> that there is no PCIe hotplug happening or it is happening but the PCIe
>>>>>>>>>> Downstream Port is not waking up.
>>>>>>>>>>
>>>>>>>>>> I figured you have the following USB4/TB topology, right?
>>>>>>>>>>
>>>>>>>>>>    AMD Host <-> GR Hub <-> TB3 Hub
>>>>>>>>>>                    ^
>>>>>>>>>>                    |
>>>>>>>>>>                  TB3 Hub
>>>>>>>>> Should be more like this
>>>>>>>>>    AMD Host <-> Dell TB4 Dock <-> OWC Envoy Express (1-502)
>>>>>>>>>                               \
>>>>>>>>>                                <-> OWC Envoy Express (1-702)
>>>>>>>>> or
>>>>>>>>>    AMD Host (1-0, domain1)
>>>>>>>>>        |
>>>>>>>>>        └─ Port 2 ──→ Dell Thunderbolt 4 Dock (1-2)
>>>>>>>>>                        ├─ Port 5 ──→ OWC Envoy Express (1-502)
>>>>>>>>>                        └─ Port 7 ──→ OWC Envoy Express (1-702)
>>>>>>>> Okay so the same ;-)
>>>>>>>>
>>>>>>>>>> What if you run 'lspci' after the issue reproduces? Does that bring back
>>>>>>>>>> the missing PCIe devices? I suspect this is because older TB3 devices may
>>>>>>>>>> need a bit more time to get the PCIe link (going over the tunnel) up and
>>>>>>>>>> running.
>>>>>>>>> lspci doesn't bring back the missing tbt storage.
>>>>>>>> Forgot to mention: let it (the whole topology) enter runtime suspend
>>>>>>>> before you run lspci.
>>>>>>> https://people.canonical.com/~acelan/bugs/tbt_storage/dmesg_lspci.log
>>>>>>>
>>>>>>> The behavior is strange: the following 3 devices keep entering D3cold and then coming back
>>>>>>> to D0 quickly, so I'm not sure if the lspci does the actions you want.
>>>>>> Yes. I should have mentioned that the lspci is there exactly to trigger
>>>>>> runtime resume of the topology. I was hoping the PCIe links would get
>>>>>> re-established properly then.
>>>>>>
>>>>>> Can you do so that you:
>>>>>>
>>>>>> 1. Plug in the dock.
>>>>>> 2. Plug in the other storage to the dock.
>>>>>> 3. Block runtime PM from the PCIe Downstream Port that should lead to the
>>>>>>     second storage device's PCIe Upstream Port:
>>>>>>
>>>>>>   # echo on > /sys/bus/pci/devices/DEVICE/power/control
>>>>>>
>>>>>> 4. Connect the second storage device and enable PCIe tunnel.
>>>>>>
>>>>>> Does that make it work each time?
>>>>> Yes, following the steps makes it work.
>>>>>
>>>>>     echo on | sudo tee /sys/bus/pci/devices/*/*/power/control
>>>>>
>>>>> After re-plugging the dock, I need to disable runtime PM again.
>>>> But can you block it just from the PCIe Downstream Port that leads to the
>>>> "non-working" storage before you enable the PCIe tunnel? Not for all the
>>>> devices.
>>>>
>>>> (let me know if you want help locating the correct device).
>>>>
>>>> Does it still work?
>> Here's the full PCI device chain graph:
>>
>>      0000:00:01.2 - AMD PCI Root Port
>>          |
>>          └─ 0000:61:00.0 - Intel Thunderbolt 4 Bridge [Goshen Ridge 2020]
>>                 |
>>                 └─ 0000:62:02.0 - Intel Thunderbolt 4 Bridge [Goshen Ridge 2020]
>>                        |
>>                        └─ 0000:83:00.0 - Intel TBT3 Bridge (Upstream Port) [Alpine Ridge LP]
>>                               |
>>                               └─ 0000:84:01.0 - Intel TBT3 Bridge (Downstream Port) [Alpine Ridge LP]
>>                                      |
>>                                      └─ 0000:85:00.0 - Sandisk PC SN740 NVMe SSD (nvme2)
>>
>> When the tbt storage is not recognized, we don't have 83:00.0 and its
>> downstream port 84:01.0.
>>
>> $ ls /sys/bus/pci/devices
>> 0000:00:00.0  0000:00:02.1  0000:00:08.1  0000:00:18.1  0000:00:18.7  0000:62:04.0  0000:c3:00.0  0000:c5:00.5  0000:c7:00.4
>> 0000:00:00.2  0000:00:02.3  0000:00:08.2  0000:00:18.2  0000:61:00.0  0000:a2:00.0  0000:c4:00.0  0000:c5:00.7  0000:c7:00.5
>> 0000:00:01.0  0000:00:02.4  0000:00:08.3  0000:00:18.3  0000:62:00.0  0000:a3:01.0  0000:c5:00.0  0000:c6:00.0  0000:c7:00.6
>> 0000:00:01.1  0000:00:02.5  0000:00:14.0  0000:00:18.4  0000:62:01.0  0000:a4:00.0  0000:c5:00.1  0000:c6:00.1
>> 0000:00:01.2  0000:00:03.0  0000:00:14.3  0000:00:18.5  0000:62:02.0  0000:c1:00.0  0000:c5:00.2  0000:c7:00.0
>> 0000:00:02.0  0000:00:08.0  0000:00:18.0  0000:00:18.6  0000:62:03.0  0000:c2:00.0  0000:c5:00.4  0000:c7:00.3
>>
>> After disabling runtime PM on 62:02.0, we have 83:00.0, its downstream port
>> 84:01.0, and 85:00.0, and then the tbt storage is recognized.
> Okay, that means there is nothing wrong with the PCIe tunnel itself; it's
> just that the PCIe side either does not get the PME or does not see that
> the PCIe link becomes active (e.g. the PCIe Downstream Port runtime suspends
> itself before the link status changes).
>
> PME works so that there is a wake first (on Intel it's a GPE) that wakes up
> the root port, then the PCIe stack wakes up the devices, and then the PME
> message is sent to the root complex.
>
> If you do this on an Intel host, do you see the same?

I also encountered a similar issue where the PCIe hotplug IRQ is not
received after path setup completes. I observed this specifically during
Thunderbolt 3 device hotplug testing.

To investigate, I applied a debug patch (attached below) to dump
ADP_PCIE_CS_0. When the issue occurs, the PCIe upstream port's LTSSM is
not in the DETECT state, yet the PE (Port Enable) bit remains set to 1.

My workaround is to check the LTSSM state before the path setup. If this
specific anomaly is detected, I explicitly set PE to 0 to reset the link
state. With this change, the link returns to the correct state, and after
the path setup completes, the PCIe hotplug IRQ is received correctly.
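
In case it is useful, here is a minimal sketch of that workaround (not my
actual patch). ADP_PCIE_CS_0 and ADP_PCIE_CS_0_PE are the existing
definitions in tb_regs.h; the LTSSM field mask, its DETECT encoding, and
the helper name below are assumptions for illustration and would need to
be taken from the USB4 spec:

#include <linux/bitfield.h>	/* for FIELD_GET() */

/* Assumed LTSSM field layout; check the USB4 spec for the real one. */
#define ADP_PCIE_CS_0_LTSSM_MASK	GENMASK(28, 25)
#define ADP_PCIE_CS_0_LTSSM_DETECT	0x0

/* Hypothetical helper, for illustration only */
static int tb_pcie_adapter_reset_if_stuck(struct tb_port *port)
{
	u32 val;
	int ret;

	ret = tb_port_read(port, &val, TB_CFG_PORT,
			   port->cap_adap + ADP_PCIE_CS_0, 1);
	if (ret)
		return ret;

	/* The anomaly: link is not in DETECT but the adapter is enabled */
	if (FIELD_GET(ADP_PCIE_CS_0_LTSSM_MASK, val) !=
	    ADP_PCIE_CS_0_LTSSM_DETECT && (val & ADP_PCIE_CS_0_PE)) {
		/* Clear PE to force the link back to a known state */
		val &= ~ADP_PCIE_CS_0_PE;
		ret = tb_port_write(port, &val, TB_CFG_PORT,
				    port->cap_adap + ADP_PCIE_CS_0, 1);
	}

	return ret;
}

The idea is simply to run a check like this on the PCIe adapters right
before the path setup and clear PE when the anomaly is seen.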

I'm not sure if this is relevant to this issue, but I'm sharing it just in
case.

Here is the debug patch I used to observe the ADP_PCIE_CS_0 state:

diff --git a/drivers/thunderbolt/path.c b/drivers/thunderbolt/path.c
index d5d1f520571b..d8808cb614a4 100644
--- a/drivers/thunderbolt/path.c
+++ b/drivers/thunderbolt/path.c
@@ -491,6 +491,25 @@ void tb_path_deactivate(struct tb_path *path)
         path->activated = false;
  }

+void print_adp_pcie_cs_0(struct tb_port *port)
+{
+       u32 val;
+       int ret;
+
+       if (!port || !port->cap_adap ||
+           (!tb_port_is_pcie_down(port) && !tb_port_is_pcie_up(port)))
+               return;
+
+       ret = tb_port_read(port, &val, TB_CFG_PORT,
+                          port->cap_adap + ADP_PCIE_CS_0, 1);
+
+       if (ret)
+               tb_port_warn(port, "failed to read ADP_PCIE_CS_0: %d\n", ret);
+       else
+               tb_port_info(port, "ADP_PCIE_CS_0 = 0x%08x\n", val);
+}
+EXPORT_SYMBOL_GPL(print_adp_pcie_cs_0);
+
  /**
   * tb_path_activate() - activate a path
   * @path: Path to activate
@@ -582,6 +601,17 @@ int tb_path_activate(struct tb_path *path)
         }
         path->activated = true;
         tb_dbg(path->tb, "path activation complete\n");
+
+       if (path) {
+               pr_info("tb_path_activated: Path %s activated, length: %d\n",
+                       path->name, path->path_length);
+
+               for (i = 0; i < path->path_length; i++) {
+                       print_adp_pcie_cs_0(path->hops[i].in_port);
+                       print_adp_pcie_cs_0(path->hops[i].out_port);
+               }
+       }
+
         return 0;
  err:
         tb_WARN(path->tb, "path activation failed\n");
diff --git a/drivers/thunderbolt/tunnel.c b/drivers/thunderbolt/tunnel.c
index b1458b741b7d..22c70f18f0ff 100644
--- a/drivers/thunderbolt/tunnel.c
+++ b/drivers/thunderbolt/tunnel.c
@@ -208,6 +208,9 @@ static int tb_pci_activate(struct tb_tunnel *tunnel, bool activate)
                         return res;
         }

+       print_adp_pcie_cs_0(tunnel->src_port);
+       print_adp_pcie_cs_0(tunnel->dst_port);
+
         return activate ? 0 : tb_pci_set_ext_encapsulation(tunnel, activate);
  }

@@ -2191,6 +2194,9 @@ int tb_tunnel_restart(struct tb_tunnel *tunnel)
                 }
         }

+       print_adp_pcie_cs_0(tunnel->src_port);
+       print_adp_pcie_cs_0(tunnel->dst_port);
+
         if (tunnel->init) {
                 res = tunnel->init(tunnel);
                 if (res)

