Message-ID: <aQ840q5BxNS1eIai@ryzen>
Date: Sat, 8 Nov 2025 13:34:26 +0100
From: Niklas Cassel <cassel@...nel.org>
To: Shawn Lin <shawn.lin@...k-chips.com>
Cc: FUKAUMI Naoki <naoki@...xa.com>, Damien Le Moal <dlemoal@...nel.org>,
Anand Moon <linux.amoon@...il.com>, linux-pci@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
linux-rockchip@...ts.infradead.org, linux-kernel@...r.kernel.org,
Dragan Simic <dsimic@...jaro.org>,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
Krzysztof Wilczyński <kw@...ux.com>,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
Rob Herring <robh@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Heiko Stuebner <heiko@...ech.de>
Subject: Re: [PATCH] PCI: dw-rockchip: Skip waiting for link up
Hello Shawn,
On Tue, Oct 21, 2025 at 03:10:13PM +0800, Shawn Lin wrote:
> On Tue, 2025/10/21 12:26, FUKAUMI Naoki wrote:
> > Hi Niklas, Bjorn,
> >
> > I noticed an issue on the Rockchip RK3588S SoC using the ASMedia ASM2806
> > PCIe bridge where devices behind the bridge fail to probe since v6.14.
> > Specifically, this started happening after commit
> > 647d69605c70368d54fc012fce8a43e8e5955b04.
> > dmesg logs from before and after this commit are available at:
> > https://gist.github.com/RadxaNaoki/fca2bfca2ee80fefee7b00c7967d2e3d
> >
> > I have confirmed that reverting the following commits fixes the issue:
> > commit ec9fd499b9c6 ("PCI: dw-rockchip: Don't wait for link since we
> > can detect Link Up")
> > commit 0e0b45ab5d77 ("PCI: dw-rockchip: Enumerate endpoints based on
> > dll_link_up IRQ")
> >
>
> These two commits rely on the link up IRQ instead of the fixed delay
> in the dwc framework. Here is a rough (not very precise) timeline
> description.
>
> time(ms) | dw_pcie_wait_for_link()      | sys irq_thread() | Hot reset
> ---------+------------------------------+------------------+----------
> 0:       | dw_pcie_link_up return false | link up irq      |
> 1x       | Physical link up happened    |                  |
> 90:      | dw_pcie_link_up return true  |                  |
> 100:     |                              | msleep(100) done |
> 10x:     |                              | pci_rescan_bus   |
> 1xx:     |                              |                  | <==occur
> 190:     | msleep(90) done              |                  |
> 19x:     | pci_host_probe               |                  |
>
> What if the hot reset happens when pci_rescan_bus() starts? I think
> scanning for devices can fail when a cfg read returns 0xffffffff. But
> the 90ms delay in dw_pcie_wait_for_link() perfectly avoids this, and by
> the time the 90ms delay is completed, the link is actually in an
> accessible state.
The pcie-dw-rockchip.c driver is modelled after the qcom driver.
So if this is a problem when an ASM2806 switch is connected, I would
expect qcom platforms to have the same problem.
Do we have a PCI trace that can tell us exactly what goes wrong?
FUKAUMI-san tells us that the enumeration does not detect any devices,
but also that there is no crash.
If we assume the scenario from your timeline above, where a hot reset
happens just after pci_rescan_bus(), then LTSSM should re-enter link
training after the hot reset.
I verified this:
# bc=$(setpci -s 0000:00:00.0 BRIDGE_CONTROL)
# setpci -s 0000:00:00.0 BRIDGE_CONTROL=$(printf "%04x" $((0x$bc | 0x40))) && sleep 0.01 && setpci -s 0000:00:00.0 BRIDGE_CONTROL=$bc
[ 65.723990] rockchip-dw-pcie a40000000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 0x7
[ 65.724701] rockchip-dw-pcie a40000000.pcie: LTSSM_STATUS: 0x30011
[ 65.825787] rockchip-dw-pcie a40000000.pcie: Received Link up event. Starting enumeration!
So we get another link up IRQ after the hot reset.
The IRQ handler for this IRQ will once again call pci_rescan_bus().
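For reference, the 0x40 ORed into BRIDGE_CONTROL in the setpci command
above is the Secondary Bus Reset bit. Below is a minimal sketch of the same
sequence, split up so that the bit arithmetic can be checked without
touching hardware. The helper names sbr_assert_value and hot_reset are
hypothetical, and the 10 ms hold time is simply the value from the command
above, not a recommendation:

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of pciutils.

# Print the Bridge Control value (4 hex digits) with the Secondary Bus
# Reset bit (0x40) set. $1 is the current value in hex, in the form
# printed by "setpci -s <bdf> BRIDGE_CONTROL".
sbr_assert_value() {
    printf '%04x' $((0x$1 | 0x40))
}

# Assert Secondary Bus Reset on bridge $1 (e.g. 0000:00:00.0), hold it
# for 10 ms, then restore the original Bridge Control value.
# Assumes a sleep(1) that accepts fractional seconds.
hot_reset() {
    bc=$(setpci -s "$1" BRIDGE_CONTROL) || return 1
    setpci -s "$1" BRIDGE_CONTROL="$(sbr_assert_value "$bc")" &&
        sleep 0.01 &&
        setpci -s "$1" BRIDGE_CONTROL="$bc"
}
```

With these sourced, "hot_reset 0000:00:00.0" as root should be equivalent
to the two setpci lines above.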
So I would expect that this second pci_rescan_bus() call would actually
be able to find the device behind the switch.
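One possible way to check that expectation is to compare the number of
PCI functions visible in sysfs before the hot reset and again after the
second "Received Link up event" message. A minimal sketch follows;
count_pci_devices is a hypothetical helper, and the optional sysfs root
argument exists only so that it can be dry-run against a fake tree:

```shell
#!/bin/sh
# Sketch only: count_pci_devices is a hypothetical helper.

# Print the number of PCI functions listed under <root>/bus/pci/devices.
# $1 optionally overrides the sysfs root (default: /sys).
count_pci_devices() {
    dir="${1:-/sys}/bus/pci/devices"
    n=0
    for d in "$dir"/*; do
        # Skip the unexpanded glob pattern when the directory is empty.
        [ -e "$d" ] && n=$((n + 1))
    done
    echo "$n"
}
```

If the count after the second rescan is still the same as with nothing
enumerated behind the switch, that would suggest the second
pci_rescan_bus() call did not find the devices either.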
Mani, Bjorn, thoughts?
Kind regards,
Niklas