[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <djbhz7qfyzrn7mdqmvqhyh6yjsjyigjly7py4f7aj5f4qbabou@67gk3qdnvzws>
Date: Thu, 17 Jul 2025 17:59:32 +0530
From: Manivannan Sadhasivam <mani@...nel.org>
To: Geraldo Nascimento <geraldogabriel@...il.com>,
Shawn Lin <shawn.lin@...k-chips.com>
Cc: linux-rockchip@...ts.infradead.org,
Hugh Cole-Baker <sigmaris@...il.com>, Lorenzo Pieralisi <lpieralisi@...nel.org>,
Krzysztof Wilczyński <kw@...ux.com>, Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
Rob Herring <robh@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Heiko Stuebner <heiko@...ech.de>, linux-pci@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on
failure without PERST#
On Mon, Jun 23, 2025 at 08:44:49AM GMT, Geraldo Nascimento wrote:
> On Mon, Jun 23, 2025 at 05:29:46AM -0600, Manivannan Sadhasivam wrote:
> > On Tue, Jun 10, 2025 at 04:05:40PM -0300, Geraldo Nascimento wrote:
> > > After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi
> > > N10 through trial-and-error debugging, I finally got positive results
> > > with enumeration on the PCI bus for both a Realtek 8111E NIC and a
> > > Samsung PM981a SSD.
> > >
> > > The NIC was connected to a M.2->PCIe x4 riser card and it would get
> > > stuck on Polling.Compliance, without breaking electrical idle on the
> > > Host RX side. The Samsung PM981a SSD is directly connected to M.2
> > > connector and that SSD is known to be quirky (OEM... no support)
> > > and non-functional on the RK3399 platform.
> > >
> > > The Samsung SSD was even worse than the NIC - it would get stuck on
> > > Detect.Active like a bricked card, even though it was fully functional
> > > via USB adapter.
> > >
> > > It seems both devices benefit from retrying Link Training if - big if
> > > here - PERST# is not toggled during retry.
> > >
> > > For retry to work, flow must be exactly as handled by present patch,
> > > that is, we must cut power, disable the clocks, then re-enable
> > > both clocks and power regulators and go through initialization
> > > without touching PERST#. Then quirky devices are able to sucessfully
> > > enumerate.
> > >
> >
> > This sounds weird. PERST# is just an indication to the device that the power and
> > refclk are applied or going to be removed. The devices uses PERST# to prepare
> > for the power removal during assert and start functioning after deassert.
>
> Hi Mani! Thank you for looking into this.
>
> Yeah, tell me about it, it is beyond weird. I posted RFC Patch in the
> hopes someone with access to PCIe Analyzer could have deeper look
> at what the heck is going on here - because it does work, but I don't
> claim to understand how.
>
I was hoping that the Rockchip folks would chime in, but no reply from them so
far.
@Shawn: Could you please shed some light here?
> >
> > It looks like the PERST# polarity is inverted in your case. Could you please
> > change the 'ep-gpios' polarity to GPIO_ACTIVE_LOW and see if it fixes the issue
> > without this patch?
> >
> > If that didn't work, could you please drop the 'ep-gpios' property and check?
>
> Sorry to decline your request, but I assure you I have tried many
> other combinations before reaching present patch, including your
> suggestion. It will do nothing. It won't work, won't make SSD that
> refuse to work with RK3399, working. Note that this isn't specific
> to my board - RK3399 is infamous for being picky about devices.
>
> >
> > > No functional change intended for already working devices.
> > >
> > > Signed-off-by: Geraldo Nascimento <geraldogabriel@...il.com>
> > > ---
> > > drivers/pci/controller/pcie-rockchip-host.c | 47 ++++++++++++++++++---
> > > 1 file changed, 40 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
> > > index 2a1071cd3241..67b3b379d277 100644
> > > --- a/drivers/pci/controller/pcie-rockchip-host.c
> > > +++ b/drivers/pci/controller/pcie-rockchip-host.c
> > > @@ -338,11 +338,14 @@ static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip)
> > > static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip)
> > > {
> > > struct device *dev = rockchip->dev;
> > > - int err, i = MAX_LANE_NUM;
> > > + int err, i = MAX_LANE_NUM, is_reinit = 0;
> > > u32 status;
> > >
> > > - gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> > > + if (!is_reinit) {
> > > + gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> > > + }
> > >
> > > +reinit:
> >
> > So this reinit part only skips the PERST# assert, but calls
> > rockchip_pcie_init_port() which resets the Root Port including PHY. I don't
> > think it is safe to do it if PERST# is wired.
>
> I don't understand, could you be a bit more verbose on why do you
> think this is dangerous?
>
When the Root Port and PHY gets reset, there is a good chance that the refclk
would also be cutoff. So if that happens without PERST# assert, then the device
has no chance to clean its state machine. If the device gets its own refclk,
then it is a different story, but we should not make assumptions.
- Mani
--
மணிவண்ணன் சதாசிவம்
Powered by blists - more mailing lists