Message-ID: <mrm7yif2tg7trvsof3jiqbevfldkf7ckkfswtabrnkc4dlgmae@qyp4s23utlid>
Date: Mon, 5 Jan 2026 17:11:42 +0530
From: Manivannan Sadhasivam <mani@...nel.org>
To: Niklas Cassel <cassel@...nel.org>
Cc: manivannan.sadhasivam@....qualcomm.com,
Jingoo Han <jingoohan1@...il.com>, Lorenzo Pieralisi <lpieralisi@...nel.org>,
Krzysztof Wilczyński <kwilczynski@...nel.org>, Rob Herring <robh@...nel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
vincent.guittot@...aro.org, zhangsenchuan@...incomputing.com,
Shawn Lin <shawn.lin@...k-chips.com>, dlemoal@...nel.org
Subject: Re: [PATCH v3 0/4] PCI: dwc: Rework the error handling of
dw_pcie_wait_for_link() API
On Fri, Jan 02, 2026 at 01:01:02PM +0100, Niklas Cassel wrote:
> On Tue, Dec 30, 2025 at 08:37:31PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > Hi,
> >
> > This series reworks the dw_pcie_wait_for_link() API to allow the callers to
> > detect the absence of the device on the bus and skip the failure.
> >
> > Compared to v2, I've reworked patch 2 to improve the API further and
> > dropped patch 1, which got applied (hence the changed subject). I've also
> > modified the error code based on the feedback in v2 to return -ENODEV if the
> > device is not detected on the bus and -ETIMEDOUT otherwise. This allows the
> > callers to skip the failure if the device is not detected and to handle the
> > error for other failures.
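The return-code convention described above can be sketched in userspace C as follows. This is a minimal sketch, not the series' code: sketch_wait_for_link(), sketch_host_init() and the outcome enum are hypothetical stand-ins for dw_pcie_wait_for_link() and its caller, and only the -ENODEV/-ETIMEDOUT split is taken from the cover letter.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical summary of what a link poll observed; not a DWC type. */
enum sketch_link_outcome {
	SKETCH_LINK_UP,		/* link trained */
	SKETCH_NO_RECEIVER,	/* nothing detected on the bus */
	SKETCH_LINK_STUCK,	/* device present, link never came up */
};

/*
 * Hypothetical stand-in for dw_pcie_wait_for_link(): maps the observed
 * outcome onto the return codes described in the cover letter.
 */
static int sketch_wait_for_link(enum sketch_link_outcome outcome)
{
	switch (outcome) {
	case SKETCH_LINK_UP:
		return 0;
	case SKETCH_NO_RECEIVER:
		return -ENODEV;
	default:
		return -ETIMEDOUT;
	}
}

/* Caller side: an absent device is not treated as a probe failure. */
static int sketch_host_init(enum sketch_link_outcome outcome)
{
	int ret = sketch_wait_for_link(outcome);

	if (ret == -ENODEV)
		return 0;	/* skip: a later rescan may still find a device */
	return ret;		/* 0 on success; -ETIMEDOUT is propagated */
}
```

Under this convention, the failing boots reported further down correspond to the SKETCH_LINK_STUCK case: the probe error -110 in those logs is -ETIMEDOUT on Linux.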
> >
> > Testing
> > =======
> >
> > Tested this series on an Rb3Gen2 board without powering on the PCIe switch.
> > Now the dw_pcie_wait_for_link() API prints:
> >
> > qcom-pcie 1c08000.pcie: Device not found
> >
> > Instead of the previous log:
> >
> > qcom-pcie 1c08000.pcie: Phy link never came up
>
> Hello Mani,
>
> I really like this series.
>
> However, when testing my usual setup with two Rock 5Bs, one in EP mode and
> one in RC mode, I power on both boards at the same time. Only after both
> boards have booted do I do the configfs write to enable link training on the
> EP, and then do a rescan on the RC.
>
> Even with this series, this workflow still works in 8 out of 10 boots.
>
>
> However, in 2 out of 10 boots I instead got:
> [ 2.285827] rockchip-dw-pcie a40000000.pcie: Link failed to come up. LTSSM: POLL_COMPLIANCE
> [ 2.286584] rockchip-dw-pcie a40000000.pcie: probe with driver rockchip-dw-pcie failed with error -110
>
> In both cases LTSSM was in POLL_COMPLIANCE.
>
>
> The fact that things work in 8 out of 10 boots means that, in those boots,
> the LTSSM was in Detect.Quiet or Detect.Active.
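Niklas's reasoning implies that the choice between -ENODEV and -ETIMEDOUT is presumably made from the LTSSM state seen when link polling times out. The hedged sketch below shows that classification; the enum values are assumed to follow the DWC LTSSM encoding (only 0x03 for POLL_COMPLIANCE and 0x11 for L0 are confirmed by the debugfs output in this thread), and sketch_timeout_errno() is a hypothetical helper, not the series' implementation.

```c
#include <assert.h>
#include <errno.h>

/*
 * Assumed DWC LTSSM encodings. POLL_COMPLIANCE (0x03) and L0 (0x11) match
 * the debugfs output in this thread; the other values are assumptions.
 */
enum sketch_ltssm {
	SKETCH_DETECT_QUIET	= 0x00,
	SKETCH_DETECT_ACT	= 0x01,
	SKETCH_POLL_ACTIVE	= 0x02,
	SKETCH_POLL_COMPLIANCE	= 0x03,
	SKETCH_L0		= 0x11,
};

/*
 * Hypothetical: error code to return once link polling has timed out.
 * Still sitting in Detect means no receiver was found, so report -ENODEV;
 * any later state means a device is there but the link did not train.
 */
static int sketch_timeout_errno(unsigned int ltssm)
{
	if (ltssm == SKETCH_DETECT_QUIET || ltssm == SKETCH_DETECT_ACT)
		return -ENODEV;
	return -ETIMEDOUT;
}
```

With this split, the 2 out of 10 boots that end in POLL_COMPLIANCE land on the -ETIMEDOUT side and fail the probe, while boots that stay in Detect are skipped.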
>
> I commented out the goto err_stop_link taken when dw_pcie_wait_for_link()
> fails, so that I can dump the LTSSM state afterwards when this happens.
>
> [ 2.293785] rockchip-dw-pcie a40000000.pcie: Link failed to come up. LTSSM: POLL_COMPLIANCE
>
> Then I do:
>
> # cat /sys/kernel/debug/dwc_pcie_a40000000.pcie/ltssm_status
> POLL_COMPLIANCE (0x03)
>
> So LTSSM is still in Poll.Compliance.
>
> However, as soon as I do the configfs writes on the EP board:
>
>
> # cat /sys/kernel/debug/dwc_pcie_a40000000.pcie/ltssm_status
> L0 (0x11)
> # cat /sys/kernel/debug/dwc_pcie_a40000000.pcie/ltssm_status
> L0 (0x11)
>
> LTSSM transitions out of compliance, and rescan will find my device:
>
> # echo 1 > /sys/bus/pci/devices/0000:00:00.0/rescan
> [ 246.777867] pci 0000:01:00.0: [1d87:3588] type 00 class 0xff0000 PCIe Endpoint
> [ 246.778627] pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff]
> [ 246.779151] pci 0000:01:00.0: BAR 1 [mem 0x00000000-0x000fffff]
> [ 246.779672] pci 0000:01:00.0: BAR 2 [mem 0x00000000-0x000fffff]
> [ 246.780192] pci 0000:01:00.0: BAR 3 [mem 0x00000000-0x000fffff]
> [ 246.780716] pci 0000:01:00.0: BAR 5 [mem 0x00000000-0x000fffff]
> [ 246.781236] pci 0000:01:00.0: ROM [mem 0x00000000-0x0000ffff pref]
>
>
>
> I understand that in most normal situations, the endpoint is powered on
> before the host side (or there is no EP connected at all).
> However, for us PCIe endpoint developers, it would be nice to keep the
> ability to rescan the bus, even when the EP is not powered on before the
> host side.
>
What could be happening here is that, since the endpoint is physically
connected to the bus, the receiver gets detected in the Detect.Active state and
the LTSSM enters the Polling state. I think it ended up staying in
Polling.Compliance because of one of these conditions (as per the spec):
a. Not all Lanes from the predetermined set of Lanes from above have
   detected an exit from Electrical Idle since entering Polling.Active.
b. Any Lane that detected a Receiver during Detect received eight consecutive
   TS1 Ordered Sets (or their complement) with the Lane and Link numbers set to
   PAD, the Compliance Receive bit (bit 4 of Symbol 5) is 1b, and the Loopback
   bit (bit 2 of Symbol 5) is 0b.
So this behavior is perfectly legal from the endpoint's perspective.
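The two quoted conditions can be modeled per lane as below. This is a simplified sketch under stated assumptions: the struct and function names are hypothetical, the "predetermined set of Lanes" is approximated as the lanes that detected a Receiver, condition (a) is treated as if the Polling.Active timeout has already expired, and the spec's other Polling.Active transitions are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified per-lane view; field names are hypothetical. */
struct sketch_lane {
	bool receiver_detected;		/* found a Receiver during Detect */
	bool eidle_exit_seen;		/* left Electrical Idle since Polling.Active */
	unsigned int compliance_ts1;	/* consecutive TS1s with Lane/Link = PAD,
					   Compliance Receive = 1b, Loopback = 0b */
};

/*
 * Returns true if quoted condition (a) or (b) holds on any lane.
 * Assumption: the "predetermined set" is the lanes with a detected Receiver.
 */
static bool sketch_enters_poll_compliance(const struct sketch_lane *lanes,
					  size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!lanes[i].receiver_detected)
			continue;
		/* (a): lane saw a Receiver but never exited Electrical Idle */
		if (!lanes[i].eidle_exit_seen)
			return true;
		/* (b): eight consecutive compliance-request TS1s */
		if (lanes[i].compliance_ts1 >= 8)
			return true;
	}
	return false;
}
```

In Niklas's setup, an EP board that is powered and cabled but has not yet started link training would plausibly present a Receiver without ever leaving Electrical Idle, which is case (a) in this model.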
> Perhaps a Kconfig or module param? Suggestions?
>
There is a DIRECT_POLCOMP_TO_DETECT bit (bit 9) in the DBI SD_CONTROL2
register. Setting this bit ensures that the LTSSM does not get stuck in
Polling.Compliance and instead returns to the Detect state. Could you set it on
the EP before starting the LTSSM and see if it helps?
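The suggested experiment amounts to a read-modify-write of bit 9 in SD_CONTROL2. The sketch below models the register as a plain 32-bit value because the mail does not give its offset; in a DWC driver the access would presumably go through dw_pcie_readl_dbi()/dw_pcie_writel_dbi(). The state-transition helper is a simplified, hypothetical model of the effect described above, not behavior taken from the controller databook.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit 9 of SD_CONTROL2, as quoted in the mail. The register offset is not
 * given there, so the register is modeled as a plain 32-bit value.
 */
#define SKETCH_DIRECT_POLCOMP_TO_DETECT	(UINT32_C(1) << 9)

/* Assumed LTSSM encodings; POLL_COMPLIANCE (0x03) matches debugfs above. */
#define SKETCH_LTSSM_DETECT_QUIET	0x00u
#define SKETCH_LTSSM_POLL_COMPLIANCE	0x03u

/* Read-modify-write that sets only DIRECT_POLCOMP_TO_DETECT. */
static uint32_t sketch_set_polcomp_to_detect(uint32_t sd_control2)
{
	return sd_control2 | SKETCH_DIRECT_POLCOMP_TO_DETECT;
}

/*
 * Simplified model: with the bit set, an LTSSM in Polling.Compliance falls
 * back to Detect (modeled as Detect.Quiet) instead of staying there.
 */
static unsigned int sketch_next_ltssm(unsigned int ltssm, uint32_t sd_control2)
{
	if (ltssm == SKETCH_LTSSM_POLL_COMPLIANCE &&
	    (sd_control2 & SKETCH_DIRECT_POLCOMP_TO_DETECT))
		return SKETCH_LTSSM_DETECT_QUIET;
	return ltssm;
}
```

Keeping the other bits intact matters because SD_CONTROL2 presumably holds unrelated settings; a plain write of the bit alone could clear them.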
- Mani
--
மணிவண்ணன் சதாசிவம்