Message-ID: <aQ840q5BxNS1eIai@ryzen>
Date: Sat, 8 Nov 2025 13:34:26 +0100
From: Niklas Cassel <cassel@...nel.org>
To: Shawn Lin <shawn.lin@...k-chips.com>
Cc: FUKAUMI Naoki <naoki@...xa.com>, Damien Le Moal <dlemoal@...nel.org>,
	Anand Moon <linux.amoon@...il.com>, linux-pci@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org,
	linux-rockchip@...ts.infradead.org, linux-kernel@...r.kernel.org,
	Dragan Simic <dsimic@...jaro.org>,
	Lorenzo Pieralisi <lpieralisi@...nel.org>,
	Krzysztof Wilczyński <kw@...ux.com>,
	Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
	Rob Herring <robh@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
	Heiko Stuebner <heiko@...ech.de>
Subject: Re: [PATCH] PCI: dw-rockchip: Skip waiting for link up

Hello Shawn,

On Tue, Oct 21, 2025 at 03:10:13PM +0800, Shawn Lin wrote:
> On Tue, 2025/10/21 at 12:26, FUKAUMI Naoki wrote:
> > Hi Niklas, Bjorn,
> > 
> > I noticed an issue on the Rockchip RK3588S SoC using the ASMedia ASM2806
> > PCIe bridge where devices behind the bridge fail to probe since v6.14.
> > Specifically, this started happening after commit
> > 647d69605c70368d54fc012fce8a43e8e5955b04.
> > dmesg logs from before and after this commit are available at:
> >   https://gist.github.com/RadxaNaoki/fca2bfca2ee80fefee7b00c7967d2e3d
> > 
> > I have confirmed that reverting the following commits fixes the issue:
> >   commit ec9fd499b9c6 ("PCI: dw-rockchip: Don't wait for link since we
> > can detect Link Up")
> >   commit 0e0b45ab5d77 ("PCI: dw-rockchip: Enumerate endpoints based on
> > dll_link_up IRQ")
> > 
> 
> These two commits rely on the link up IRQ instead of the fixed delay in
> the dwc framework. Here is a rough, not very precise timeline:
> 
> time(ms) | dw_pcie_wait_for_link()        | sys irq_thread()  | Hot reset
> ----------------------------------------------------------------------------
> 0:       | dw_pcie_link_up returns false  | link up irq       |
> 1x:      | Physical link up happened      |                   |
> 90:      | dw_pcie_link_up returns true   |                   |
> 100:     |                                | msleep(100) done  |
> 10x:     |                                | pci_rescan_bus    |
> 1xx:     |                                |                   | <== occurs
> 190:     | msleep(90) done                |                   |
> 19x:     | pci_host_probe                 |                   |
> 
> What if the hot reset happens just as pci_rescan_bus() starts? I think
> the device scan can fail when a cfg read returns 0xffffffff. But the
> 90ms delay in dw_pcie_wait_for_link() avoids this, because by the time
> the delay completes, the link is actually in an accessible state.

The pcie-dw-rockchip.c driver is modelled after the qcom driver.
So if this is a problem when an ASM2806 switch is connected, I would
expect qcom platforms to have the same problem.


Do we have a PCI trace that can tell us exactly what goes wrong?

FUKAUMI-san tells us that the enumeration does not detect any devices,
but also that there is no crash.

If we assume the scenario from your timeline above, where a hot reset
happens just after pci_rescan_bus(), then the LTSSM should re-enter link
training after that hot reset.

I verified this:
# bc=$(setpci -s 0000:00:00.0 BRIDGE_CONTROL)
# setpci -s 0000:00:00.0 BRIDGE_CONTROL=$(printf "%04x" $((0x$bc | 0x40))) && sleep 0.01 && setpci -s 0000:00:00.0 BRIDGE_CONTROL=$bc
[   65.723990] rockchip-dw-pcie a40000000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 0x7
[   65.724701] rockchip-dw-pcie a40000000.pcie: LTSSM_STATUS: 0x30011
[   65.825787] rockchip-dw-pcie a40000000.pcie: Received Link up event. Starting enumeration!

So we get another link up IRQ after the hot reset.
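
For anyone reading the LTSSM_STATUS value in the log above, here is a rough
decoding sketch. The bit layout is an assumption taken from my reading of
pcie-dw-rockchip.c (LTSSM state in bits 5:0, SMLH link up in bit 16, RDLH
link up in bit 17), and decode_ltssm is just a hypothetical helper name:

```shell
# Sketch only: bit positions are assumed from pcie-dw-rockchip.c
# (LTSSM state in bits 5:0, SMLH link up in bit 16, RDLH link up in bit 17).
decode_ltssm() {
    val=$(( $1 ))
    printf 'ltssm=0x%02x smlh_link_up=%d rdlh_link_up=%d\n' \
        $(( val & 0x3f )) $(( (val >> 16) & 1 )) $(( (val >> 17) & 1 ))
}

# Value taken from the log line above.
decode_ltssm 0x30011
```

With that layout, 0x30011 has both link up bits set and an LTSSM state of
0x11, which as far as I can tell is L0 in the DWC LTSSM encoding.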

The IRQ handler for this IRQ will once again call pci_rescan_bus().
So I would expect that this second pci_rescan_bus() call would actually
be able to find the device behind the switch.
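
In case someone wants to reproduce the hot reset, the setpci pair I used
only toggles the Secondary Bus Reset bit (bit 6, 0x40) of the Bridge
Control register. The arithmetic can be sketched like this; sbr_assert and
sbr_deassert are hypothetical helper names, and the input is the 4-digit hex
string that setpci prints:

```shell
# Bit 6 (0x40) of the PCI Bridge Control register is Secondary Bus Reset.
SBR=0x40

# Print the 16-bit register value (hex, no 0x prefix) with SBR set.
sbr_assert()   { printf '%04x\n' $(( 0x$1 | $SBR )); }
# Print the 16-bit register value (hex, no 0x prefix) with SBR cleared.
sbr_deassert() { printf '%04x\n' $(( 0x$1 & ~$SBR & 0xffff )); }

# Example values only; on real hardware the input would come from
# "setpci -s <bdf> BRIDGE_CONTROL".
sbr_assert 0002
sbr_deassert 0042
```

My command above simply restores the saved value instead of clearing the bit
explicitly, which is equivalent as long as the bit was not already set.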


Mani, Bjorn, thoughts?



Kind regards,
Niklas
