lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <011ac1d065e782258c4bf7a1b5309142b4501d0d.camel@wdc.com>
Date: Wed, 30 Apr 2025 10:14:51 +0000
From: Wilfred Mallawa <wilfred.mallawa@....com>
To: "cassel@...nel.org" <cassel@...nel.org>
CC: "dlemoal@...nel.org" <dlemoal@...nel.org>, "kw@...ux.com" <kw@...ux.com>,
	"alistair@...stair23.me" <alistair@...stair23.me>, "robh@...nel.org"
	<robh@...nel.org>, "manivannan.sadhasivam@...aro.org"
	<manivannan.sadhasivam@...aro.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "lpieralisi@...nel.org"
	<lpieralisi@...nel.org>, "p.zabel@...gutronix.de" <p.zabel@...gutronix.de>,
	"heiko@...ech.de" <heiko@...ech.de>, "bhelgaas@...gle.com"
	<bhelgaas@...gle.com>, "linux-pci@...r.kernel.org"
	<linux-pci@...r.kernel.org>, "linux-rockchip@...ts.infradead.org"
	<linux-rockchip@...ts.infradead.org>, "linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH] PCI: dwc: Add support for slot reset on link down event

Hey Niklas,

On Wed, 2025-04-30 at 10:03 +0200, Niklas Cassel wrote:
> Hello Wilfred,
> 
> Nice to see this feature :)
> 
> On Wed, Apr 30, 2025 at 05:43:51PM +1000, Wilfred Mallawa wrote:
> > From: Wilfred Mallawa <wilfred.mallawa@....com>
> > 
> > The PCIe link may go down in cases like firmware crashes or
> > unstable
> > connections. When this occurs, the PCIe slot must be reset to
> > restore
> > functionality. However, the current driver lacks link down
> > handling,
> > forcing users to reboot the system to recover.
> > 
> > This patch implements the `reset_slot` callback for link down
> > handling
> > for DWC PCIe host controller. In which, the RC is reset,
> > reconfigured
> > and link training initiated to recover from the link down event.
> > 
> > This patch by extension fixes issues with sysfs initiated bus
> > resets.
> > In that, currently, when a sysfs initiated bus reset is issued, the
> > endpoint device is non-functional after (may link up with
> > downgraded link
> > status). With this patch adding support for link down recovery, a
> > sysfs
> > initiated bus reset works as intended. Testing conducted on a
> > ROCK5B board
> > with an M.2 NVMe drive.
> > 
> > Signed-off-by: Wilfred Mallawa <wilfred.mallawa@....com>
> > ---
> > Hey all,
> > 
> > This patch builds ontop of [1] to extend the reset slot support for
> > the
> > DWC PCIe host controller. Which implements link down recovery
> > support
> > for the DesignWare PCIe host controller by adding a `reset_slot`
> > callback.
> > This allows the system to recover from PCIe link failures without
> > requiring a reboot.
> > 
> > This patch by extension improves the behavior of sysfs-initiated
> > bus resets.
> > Previously, a `reset` issued via sysfs could leave the endpoint in
> > a
> > non-functional state or with downgraded link parameters. With the
> > added
> > link down recovery logic, sysfs resets now result in a properly
> > reinitialized
> > and fully functional endpoint device. This issue was discovered on
> > a
> > Rock5B board, and thus testing was also conducted on the same
> > platform
> > with a known good M.2 NVMe drive.
> > 
> > Thanks!
> > 
> > [1]
> > https://lore.kernel.org/all/20250417-pcie-reset-slot-v3-0-59a10811c962@linaro.org/
> > ---
> >  drivers/pci/controller/dwc/Kconfig            |  1 +
> >  drivers/pci/controller/dwc/pcie-dw-rockchip.c | 89
> > ++++++++++++++++++++++++++-
> >  2 files changed, 88 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/Kconfig
> > b/drivers/pci/controller/dwc/Kconfig
> > index
> > d9f0386396edf66ad0e514a0f545ed24d89fcb6c..878c52de0842e32ca50dfcc4b
> > 66231a73ef436c4 100644
> > --- a/drivers/pci/controller/dwc/Kconfig
> > +++ b/drivers/pci/controller/dwc/Kconfig
> > @@ -347,6 +347,7 @@ config PCIE_ROCKCHIP_DW_HOST
> >  	depends on OF
> >  	select PCIE_DW_HOST
> >  	select PCIE_ROCKCHIP_DW
> > +	select PCI_HOST_COMMON
> >  	help
> >  	  Enables support for the DesignWare PCIe controller in
> > the
> >  	  Rockchip SoC (except RK3399) to work in host mode.
> > diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > index
> > 3c6ab71c996ec1246954f52a9454c8ae67956a54..4c2c683d170f19266e1dfe0ef
> > de76d6feb23bf7a 100644
> > --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > @@ -23,6 +23,8 @@
> >  #include <linux/reset.h>
> >  
> >  #include "pcie-designware.h"
> > +#include "../../pci.h"
> > +#include "../pci-host-common.h"
> >  
> >  /*
> >   * The upper 16 bits of PCIE_CLIENT_CONFIG are a write
> > @@ -83,6 +85,9 @@ struct rockchip_pcie_of_data {
> >  	const struct pci_epc_features *epc_features;
> >  };
> >  
> > +static int rockchip_pcie_rc_reset_slot(struct pci_host_bridge
> > *bridge,
> > +				       struct pci_dev *pdev);
> > +
> >  static int rockchip_pcie_readl_apb(struct rockchip_pcie *rockchip,
> > u32 reg)
> >  {
> >  	return readl_relaxed(rockchip->apb_base + reg);
> > @@ -256,6 +261,7 @@ static int rockchip_pcie_host_init(struct
> > dw_pcie_rp *pp)
> >  					 rockchip);
> >  
> >  	rockchip_pcie_enable_l0s(pci);
> > +	pp->bridge->reset_slot = rockchip_pcie_rc_reset_slot;
> >  
> >  	return 0;
> >  }
> > @@ -455,6 +461,11 @@ static irqreturn_t
> > rockchip_pcie_rc_sys_irq_thread(int irq, void *arg)
> >  	dev_dbg(dev, "PCIE_CLIENT_INTR_STATUS_MISC: %#x\n", reg);
> >  	dev_dbg(dev, "LTSSM_STATUS: %#x\n",
> > rockchip_pcie_get_ltssm(rockchip));
> >  
> > +	if (reg & PCIE_LINK_REQ_RST_NOT_INT) {
> > +		dev_dbg(dev, "hot reset or link-down reset\n");
> > +		pci_host_handle_link_down(pp->bridge);
> > +	}
> > +
> >  	if (reg & PCIE_RDLH_LINK_UP_CHGED) {
> >  		if (rockchip_pcie_link_up(pci)) {
> >  			dev_dbg(dev, "Received Link up event.
> > Starting enumeration!\n");
> > @@ -536,8 +547,8 @@ static int rockchip_pcie_configure_rc(struct
> > platform_device *pdev,
> >  		return ret;
> >  	}
> >  
> > -	/* unmask DLL up/down indicator */
> > -	val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED, 0);
> > +	/* unmask DLL up/down indicator and hot reset/link-down
> > reset irq */
> > +	val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED |
> > PCIE_LINK_REQ_RST_NOT_INT, 0);
> >  	rockchip_pcie_writel_apb(rockchip, val,
> > PCIE_CLIENT_INTR_MASK_MISC);
> >  
> >  	return ret;
> > @@ -688,6 +699,80 @@ static int rockchip_pcie_probe(struct
> > platform_device *pdev)
> >  	return ret;
> >  }
> >  
> > +static int rockchip_pcie_rc_reset_slot(struct pci_host_bridge
> > *bridge,
> > +				  struct pci_dev *pdev)
> > +{
> > +	struct pci_bus *bus = bridge->bus;
> > +	struct dw_pcie_rp *pp = bus->sysdata;
> > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > +	struct rockchip_pcie *rockchip = to_rockchip_pcie(pci);
> > +	struct device *dev = rockchip->pci.dev;
> > +	u32 val;
> > +	int ret;
> > +
> > +	dw_pcie_stop_link(pci);
> > +	rockchip_pcie_phy_deinit(rockchip);
> > +
> > +	ret = reset_control_assert(rockchip->rst);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = rockchip_pcie_phy_init(rockchip);
> > +	if (ret)
> > +		goto disable_regulator;
> > +
> > +	ret = reset_control_deassert(rockchip->rst);
> > +	if (ret)
> > +		goto deinit_phy;
> > +
> > +	ret = rockchip_pcie_clk_init(rockchip);
> > +	if (ret)
> > +		goto deinit_phy;
> 
> Here you call rockchip_pcie_clk_init(), but I don't see that we ever
> called
> clk_bulk_disable_unprepare() after stopping the link. I would have
> expected
> the clk framework to have given us a warning about enabling clocks
> that are
> already enabled.
> 
Ah that's a good point. I didn't notice any warnings whilst testing.
Perhaps worth looking into also. But I will add
`clk_bulk_disable_unprepare()` after stopping the link.
> 
> > +
> > +	ret = pp->ops->init(pp);
> > +	if (ret) {
> > +		dev_err(dev, "Host init failed: %d\n", ret);
> > +		goto deinit_clk;
> > +	}
> > +
> > +	/* LTSSM enable control mode */
> > +	val = HIWORD_UPDATE_BIT(PCIE_LTSSM_ENABLE_ENHANCE);
> > +	rockchip_pcie_writel_apb(rockchip, val,
> > PCIE_CLIENT_HOT_RESET_CTRL);
> > +
> > +	rockchip_pcie_writel_apb(rockchip, PCIE_CLIENT_RC_MODE,
> > PCIE_CLIENT_GENERAL_CON);
> > +
> > +	ret = dw_pcie_setup_rc(pp);
> > +	if (ret) {
> > +		dev_err(dev, "Failed to setup RC: %d\n", ret);
> > +		goto deinit_clk;
> > +	}
> > +
> > +	/* unmask DLL up/down indicator and hot reset/link-down
> > reset irq */
> > +	val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED |
> > PCIE_LINK_REQ_RST_NOT_INT, 0);
> > +	rockchip_pcie_writel_apb(rockchip, val,
> > PCIE_CLIENT_INTR_MASK_MISC);
> > +
> > +	ret = dw_pcie_start_link(pci);
> > +	if (ret)
> > +		return ret;
> 
> Here you are simply returning ret in case of error,
> should we perhaps goto error if this function fails?
Ah yeah! that does make more sense.
> 
> 
> > +
> > +	ret = dw_pcie_wait_for_link(pci);
> > +	if (ret)
> > +		return ret;
> 
> Here, I think that we should simply call dw_pcie_wait_for_link()
> without checking for error.
> 
> 1) That is what Mani does in the qcom patch that implements the
> reset_slot() callback.
> 2) That is what we do in:
> https://github.com/torvalds/linux/blob/master/drivers/pci/controller/dwc/pcie-designware-host.c#L558-L559
> 
> (Ignore errors, the link may come up later)
Okay will fixup! thanks for the feedback!

Wilfred
> 
> 
> Kind regards,
> Niklas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ