linux-kernel - Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST#

Message-ID: <zuiq3b2rsixymtjr3xzrb26clikvlja62wgj65umnse4kuk75c@x5qan73ispxe>
Date: Mon, 23 Jun 2025 05:29:46 -0600
From: Manivannan Sadhasivam <mani@...nel.org>
To: Geraldo Nascimento <geraldogabriel@...il.com>
Cc: linux-rockchip@...ts.infradead.org, 
	Hugh Cole-Baker <sigmaris@...il.com>, Shawn Lin <shawn.lin@...k-chips.com>, 
	Lorenzo Pieralisi <lpieralisi@...nel.org>, Krzysztof Wilczyński <kw@...ux.com>, 
	Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>, Rob Herring <robh@...nel.org>, 
	Bjorn Helgaas <bhelgaas@...gle.com>, Heiko Stuebner <heiko@...ech.de>, linux-pci@...r.kernel.org, 
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on
 failure without PERST#

On Tue, Jun 10, 2025 at 04:05:40PM -0300, Geraldo Nascimento wrote:
> After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi
> N10 through trial-and-error debugging, I finally got positive results
> with enumeration on the PCI bus for both a Realtek 8111E NIC and a
> Samsung PM981a SSD.
> 
> The NIC was connected to a M.2->PCIe x4 riser card and it would get
> stuck on Polling.Compliance, without breaking electrical idle on the
> Host RX side. The Samsung PM981a SSD is directly connected to M.2
> connector and that SSD is known to be quirky (OEM... no support)
> and non-functional on the RK3399 platform.
> 
> The Samsung SSD was even worse than the NIC - it would get stuck on
> Detect.Active like a bricked card, even though it was fully functional
> via USB adapter.
> 
> It seems both devices benefit from retrying Link Training if - big if
> here - PERST# is not toggled during retry.
> 
> For retry to work, flow must be exactly as handled by present patch,
> that is, we must cut power, disable the clocks, then re-enable
> both clocks and power regulators and go through initialization
> without touching PERST#. Then quirky devices are able to successfully
> enumerate.
> 

This sounds weird. PERST# is just an indication to the device that power and
refclk have been applied or are about to be removed. The device uses PERST# to
prepare for power removal on assert and to start functioning after deassert.

It looks like the PERST# polarity is inverted in your case. Could you please
change the 'ep-gpios' polarity to GPIO_ACTIVE_LOW and see if it fixes the issue
without this patch?

If that doesn't work, could you please drop the 'ep-gpios' property and check?
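
To illustrate why the polarity flag matters, here is a minimal user-space sketch
of how a gpiod logical value maps to the physical pin level. It is only a model:
struct model_gpio and the model_*/perst_* helpers are hypothetical names, not
kernel or driver code. The one thing it relies on is that gpiod_set_value()
takes a logical value, which is inverted on the pin for an active-low line.

```c
#include <stdbool.h>

/* Hypothetical model of a GPIO descriptor; not the kernel's struct gpio_desc. */
struct model_gpio {
	bool active_low;	/* mirrors GPIO_ACTIVE_LOW in the DT flags cell */
	int physical;		/* level actually driven on the pin */
};

/* Models gpiod_set_value(): 'logical' is inverted for active-low lines. */
static void model_gpiod_set_value(struct model_gpio *gpio, int logical)
{
	int level = logical ? 1 : 0;

	gpio->physical = gpio->active_low ? !level : level;
}

/* PERST# is an active-low signal: asserted while the pin is physically low. */
static bool perst_asserted(const struct model_gpio *gpio)
{
	return gpio->physical == 0;
}

/* Returns whether PERST# ends up asserted after writing 'logical'. */
static bool perst_asserted_after_write(bool active_low, int logical)
{
	struct model_gpio gpio = { .active_low = active_low, .physical = 1 };

	model_gpiod_set_value(&gpio, logical);
	return perst_asserted(&gpio);
}
```

In this model, the same write of 0 that the quoted hunk does asserts PERST#
when the line is described as active-high and deasserts it when the line is
described as active-low, so flipping the flag in 'ep-gpios' swaps what every
such call in the driver does on the wire.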

> No functional change intended for already working devices.
> 
> Signed-off-by: Geraldo Nascimento <geraldogabriel@...il.com>
> ---
>  drivers/pci/controller/pcie-rockchip-host.c | 47 ++++++++++++++++++---
>  1 file changed, 40 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
> index 2a1071cd3241..67b3b379d277 100644
> --- a/drivers/pci/controller/pcie-rockchip-host.c
> +++ b/drivers/pci/controller/pcie-rockchip-host.c
> @@ -338,11 +338,14 @@ static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip)
>  static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip)
>  {
>  	struct device *dev = rockchip->dev;
> -	int err, i = MAX_LANE_NUM;
> +	int err, i = MAX_LANE_NUM, is_reinit = 0;
>  	u32 status;
>  
> -	gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> +	if (!is_reinit) {
> +		gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> +	}
>  
> +reinit:

So this reinit part only skips the PERST# assert, but still calls
rockchip_pcie_init_port(), which resets the Root Port including the PHY. I don't
think it is safe to do that if PERST# is wired.
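
The concern can be sketched with a small user-space model. Everything here is an
assumption made for illustration: struct model_port, model_init_port() and
model_train() are hypothetical, and the model assumes the sequence is "assert
PERST#, reset the port, deassert PERST#" on each attempt. It only counts how
often the Root Port/PHY reset happens while the endpoint is out of reset.

```c
#include <stdbool.h>

/* Hypothetical model of the port state; none of these names are from the driver. */
struct model_port {
	bool perst_asserted;		/* true while PERST# holds the endpoint in reset */
	int resets_with_endpoint_active;/* port resets done while PERST# was deasserted */
};

/* Stand-in for rockchip_pcie_init_port(): resets the Root Port including the PHY. */
static void model_init_port(struct model_port *port)
{
	if (!port->perst_asserted)
		port->resets_with_endpoint_active++;
}

/* One initial attempt plus 'retries' retries; skip_perst_on_retry models the patch. */
static int model_train(int retries, bool skip_perst_on_retry)
{
	struct model_port port = { .perst_asserted = false,
				   .resets_with_endpoint_active = 0 };
	int attempt;

	for (attempt = 0; attempt <= retries; attempt++) {
		if (attempt == 0 || !skip_perst_on_retry)
			port.perst_asserted = true;	/* assert PERST# first */
		model_init_port(&port);
		port.perst_asserted = false;		/* deassert to start training */
	}
	return port.resets_with_endpoint_active;
}
```

Under these assumptions, every retry that skips the PERST# assert resets the
Root Port and PHY underneath an endpoint that was never told a reset is coming,
whereas the unpatched ordering never does.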

- Mani

-- 
மணிவண்ணன் சதாசிவம்