[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aFk-MeIWFcBiGBPr@geday>
Date: Mon, 23 Jun 2025 08:44:49 -0300
From: Geraldo Nascimento <geraldogabriel@...il.com>
To: Manivannan Sadhasivam <mani@...nel.org>
Cc: linux-rockchip@...ts.infradead.org,
Hugh Cole-Baker <sigmaris@...il.com>,
Shawn Lin <shawn.lin@...k-chips.com>,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
Krzysztof Wilczyński <kw@...ux.com>,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
Rob Herring <robh@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Heiko Stuebner <heiko@...ech.de>, linux-pci@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on
failure without PERST#
On Mon, Jun 23, 2025 at 05:29:46AM -0600, Manivannan Sadhasivam wrote:
> On Tue, Jun 10, 2025 at 04:05:40PM -0300, Geraldo Nascimento wrote:
> > After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi
> > N10 through trial-and-error debugging, I finally got positive results
> > with enumeration on the PCI bus for both a Realtek 8111E NIC and a
> > Samsung PM981a SSD.
> >
> > The NIC was connected to a M.2->PCIe x4 riser card and it would get
> > stuck on Polling.Compliance, without breaking electrical idle on the
> > Host RX side. The Samsung PM981a SSD is directly connected to M.2
> > connector and that SSD is known to be quirky (OEM... no support)
> > and non-functional on the RK3399 platform.
> >
> > The Samsung SSD was even worse than the NIC - it would get stuck on
> > Detect.Active like a bricked card, even though it was fully functional
> > via USB adapter.
> >
> > It seems both devices benefit from retrying Link Training if - big if
> > here - PERST# is not toggled during retry.
> >
> > For retry to work, flow must be exactly as handled by present patch,
> > that is, we must cut power, disable the clocks, then re-enable
> > both clocks and power regulators and go through initialization
> > without touching PERST#. Then quirky devices are able to sucessfully
> > enumerate.
> >
>
> This sounds weird. PERST# is just an indication to the device that the power and
> refclk are applied or going to be removed. The devices uses PERST# to prepare
> for the power removal during assert and start functioning after deassert.
Hi Mani! Thank you for looking into this.
Yeah, tell me about it, it is beyond weird. I posted RFC Patch in the
hopes someone with access to PCIe Analyzer could have deeper look
at what the heck is going on here - because it does work, but I don't
claim to understand how.
>
> It looks like the PERST# polarity is inverted in your case. Could you please
> change the 'ep-gpios' polarity to GPIO_ACTIVE_LOW and see if it fixes the issue
> without this patch?
>
> If that didn't work, could you please drop the 'ep-gpios' property and check?
Sorry to decline your request, but I assure you I have tried many
other combinations before reaching present patch, including your
suggestion. It will do nothing. It won't work, won't make SSD that
refuse to work with RK3399, working. Note that this isn't specific
to my board - RK3399 is infamous for being picky about devices.
>
> > No functional change intended for already working devices.
> >
> > Signed-off-by: Geraldo Nascimento <geraldogabriel@...il.com>
> > ---
> > drivers/pci/controller/pcie-rockchip-host.c | 47 ++++++++++++++++++---
> > 1 file changed, 40 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
> > index 2a1071cd3241..67b3b379d277 100644
> > --- a/drivers/pci/controller/pcie-rockchip-host.c
> > +++ b/drivers/pci/controller/pcie-rockchip-host.c
> > @@ -338,11 +338,14 @@ static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip)
> > static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip)
> > {
> > struct device *dev = rockchip->dev;
> > - int err, i = MAX_LANE_NUM;
> > + int err, i = MAX_LANE_NUM, is_reinit = 0;
> > u32 status;
> >
> > - gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> > + if (!is_reinit) {
> > + gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> > + }
> >
> > +reinit:
>
> So this reinit part only skips the PERST# assert, but calls
> rockchip_pcie_init_port() which resets the Root Port including PHY. I don't
> think it is safe to do it if PERST# is wired.
I don't understand, could you be a bit more verbose on why do you
think this is dangerous?
Thanks,
Geraldo Nascimento
>
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்
Powered by blists - more mailing lists