linux-kernel - Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST#

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <djbhz7qfyzrn7mdqmvqhyh6yjsjyigjly7py4f7aj5f4qbabou@67gk3qdnvzws>
Date: Thu, 17 Jul 2025 17:59:32 +0530
From: Manivannan Sadhasivam <mani@...nel.org>
To: Geraldo Nascimento <geraldogabriel@...il.com>, 
	Shawn Lin <shawn.lin@...k-chips.com>
Cc: linux-rockchip@...ts.infradead.org, 
	Hugh Cole-Baker <sigmaris@...il.com>, Lorenzo Pieralisi <lpieralisi@...nel.org>, 
	Krzysztof Wilczyński <kw@...ux.com>, Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>, 
	Rob Herring <robh@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>, 
	Heiko Stuebner <heiko@...ech.de>, linux-pci@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on
 failure without PERST#

On Mon, Jun 23, 2025 at 08:44:49AM GMT, Geraldo Nascimento wrote:
> On Mon, Jun 23, 2025 at 05:29:46AM -0600, Manivannan Sadhasivam wrote:
> > On Tue, Jun 10, 2025 at 04:05:40PM -0300, Geraldo Nascimento wrote:
> > > After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi
> > > N10 through trial-and-error debugging, I finally got positive results
> > > with enumeration on the PCI bus for both a Realtek 8111E NIC and a
> > > Samsung PM981a SSD.
> > > 
> > > The NIC was connected to a M.2->PCIe x4 riser card and it would get
> > > stuck on Polling.Compliance, without breaking electrical idle on the
> > > Host RX side. The Samsung PM981a SSD is directly connected to M.2
> > > connector and that SSD is known to be quirky (OEM... no support)
> > > and non-functional on the RK3399 platform.
> > > 
> > > The Samsung SSD was even worse than the NIC - it would get stuck on
> > > Detect.Active like a bricked card, even though it was fully functional
> > > via USB adapter.
> > > 
> > > It seems both devices benefit from retrying Link Training if - big if
> > > here - PERST# is not toggled during retry.
> > > 
> > > For retry to work, flow must be exactly as handled by present patch,
> > > that is, we must cut power, disable the clocks, then re-enable
> > > both clocks and power regulators and go through initialization
> > > without touching PERST#. Then quirky devices are able to sucessfully
> > > enumerate.
> > > 
> > 
> > This sounds weird. PERST# is just an indication to the device that the power and
> > refclk are applied or going to be removed. The devices uses PERST# to prepare
> > for the power removal during assert and start functioning after deassert.
> 
> Hi Mani! Thank you for looking into this.
> 
> Yeah, tell me about it, it is beyond weird. I posted RFC Patch in the
> hopes someone with access to PCIe Analyzer could have deeper look
> at what the heck is going on here - because it does work, but I don't
> claim to understand how.
> 

I was hoping that the Rockchip folks would chime in, but no reply from them so
far.

@Shawn: Could you please shed some light here?

> > 
> > It looks like the PERST# polarity is inverted in your case. Could you please
> > change the 'ep-gpios' polarity to GPIO_ACTIVE_LOW and see if it fixes the issue
> > without this patch?
> > 
> > If that didn't work, could you please drop the 'ep-gpios' property and check?
> 
> Sorry to decline your request, but I assure you I have tried many
> other combinations before reaching present patch, including your
> suggestion. It will do nothing. It won't work, won't make SSD that
> refuse to work with RK3399, working. Note that this isn't specific
> to my board - RK3399 is infamous for being picky about devices.
> 
> > 
> > > No functional change intended for already working devices.
> > > 
> > > Signed-off-by: Geraldo Nascimento <geraldogabriel@...il.com>
> > > ---
> > >  drivers/pci/controller/pcie-rockchip-host.c | 47 ++++++++++++++++++---
> > >  1 file changed, 40 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
> > > index 2a1071cd3241..67b3b379d277 100644
> > > --- a/drivers/pci/controller/pcie-rockchip-host.c
> > > +++ b/drivers/pci/controller/pcie-rockchip-host.c
> > > @@ -338,11 +338,14 @@ static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip)
> > >  static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip)
> > >  {
> > >  	struct device *dev = rockchip->dev;
> > > -	int err, i = MAX_LANE_NUM;
> > > +	int err, i = MAX_LANE_NUM, is_reinit = 0;
> > >  	u32 status;
> > >  
> > > -	gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> > > +	if (!is_reinit) {
> > > +		gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> > > +	}
> > >  
> > > +reinit:
> > 
> > So this reinit part only skips the PERST# assert, but calls
> > rockchip_pcie_init_port() which resets the Root Port including PHY. I don't
> > think it is safe to do it if PERST# is wired.
> 
> I don't understand, could you be a bit more verbose on why do you
> think this is dangerous?
> 

When the Root Port and PHY gets reset, there is a good chance that the refclk
would also be cutoff. So if that happens without PERST# assert, then the device
has no chance to clean its state machine. If the device gets its own refclk,
then it is a different story, but we should not make assumptions.

- Mani

-- 
மணிவண்ணன் சதாசிவம்