[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <011ac1d065e782258c4bf7a1b5309142b4501d0d.camel@wdc.com>
Date: Wed, 30 Apr 2025 10:14:51 +0000
From: Wilfred Mallawa <wilfred.mallawa@....com>
To: "cassel@...nel.org" <cassel@...nel.org>
CC: "dlemoal@...nel.org" <dlemoal@...nel.org>, "kw@...ux.com" <kw@...ux.com>,
"alistair@...stair23.me" <alistair@...stair23.me>, "robh@...nel.org"
<robh@...nel.org>, "manivannan.sadhasivam@...aro.org"
<manivannan.sadhasivam@...aro.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "lpieralisi@...nel.org"
<lpieralisi@...nel.org>, "p.zabel@...gutronix.de" <p.zabel@...gutronix.de>,
"heiko@...ech.de" <heiko@...ech.de>, "bhelgaas@...gle.com"
<bhelgaas@...gle.com>, "linux-pci@...r.kernel.org"
<linux-pci@...r.kernel.org>, "linux-rockchip@...ts.infradead.org"
<linux-rockchip@...ts.infradead.org>, "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH] PCI: dwc: Add support for slot reset on link down event
Hey Niklas,
On Wed, 2025-04-30 at 10:03 +0200, Niklas Cassel wrote:
> Hello Wilfred,
>
> Nice to see this feature :)
>
> On Wed, Apr 30, 2025 at 05:43:51PM +1000, Wilfred Mallawa wrote:
> > From: Wilfred Mallawa <wilfred.mallawa@....com>
> >
> > The PCIe link may go down in cases like firmware crashes or
> > unstable
> > connections. When this occurs, the PCIe slot must be reset to
> > restore
> > functionality. However, the current driver lacks link down
> > handling,
> > forcing users to reboot the system to recover.
> >
> > This patch implements the `reset_slot` callback for link down
> > handling
> > for DWC PCIe host controller. In which, the RC is reset,
> > reconfigured
> > and link training initiated to recover from the link down event.
> >
> > This patch by extension fixes issues with sysfs initiated bus
> > resets.
> > In that, currently, when a sysfs initiated bus reset is issued, the
> > endpoint device is non-functional after (may link up with
> > downgraded link
> > status). With this patch adding support for link down recovery, a
> > sysfs
> > initiated bus reset works as intended. Testing conducted on a
> > ROCK5B board
> > with an M.2 NVMe drive.
> >
> > Signed-off-by: Wilfred Mallawa <wilfred.mallawa@....com>
> > ---
> > Hey all,
> >
> > This patch builds ontop of [1] to extend the reset slot support for
> > the
> > DWC PCIe host controller. Which implements link down recovery
> > support
> > for the DesignWare PCIe host controller by adding a `reset_slot`
> > callback.
> > This allows the system to recover from PCIe link failures without
> > requiring a reboot.
> >
> > This patch by extension improves the behavior of sysfs-initiated
> > bus resets.
> > Previously, a `reset` issued via sysfs could leave the endpoint in
> > a
> > non-functional state or with downgraded link parameters. With the
> > added
> > link down recovery logic, sysfs resets now result in a properly
> > reinitialized
> > and fully functional endpoint device. This issue was discovered on
> > a
> > Rock5B board, and thus testing was also conducted on the same
> > platform
> > with a known good M.2 NVMe drive.
> >
> > Thanks!
> >
> > [1]
> > https://lore.kernel.org/all/20250417-pcie-reset-slot-v3-0-59a10811c962@linaro.org/
> > ---
> > drivers/pci/controller/dwc/Kconfig | 1 +
> > drivers/pci/controller/dwc/pcie-dw-rockchip.c | 89
> > ++++++++++++++++++++++++++-
> > 2 files changed, 88 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/Kconfig
> > b/drivers/pci/controller/dwc/Kconfig
> > index
> > d9f0386396edf66ad0e514a0f545ed24d89fcb6c..878c52de0842e32ca50dfcc4b
> > 66231a73ef436c4 100644
> > --- a/drivers/pci/controller/dwc/Kconfig
> > +++ b/drivers/pci/controller/dwc/Kconfig
> > @@ -347,6 +347,7 @@ config PCIE_ROCKCHIP_DW_HOST
> > depends on OF
> > select PCIE_DW_HOST
> > select PCIE_ROCKCHIP_DW
> > + select PCI_HOST_COMMON
> > help
> > Enables support for the DesignWare PCIe controller in
> > the
> > Rockchip SoC (except RK3399) to work in host mode.
> > diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > index
> > 3c6ab71c996ec1246954f52a9454c8ae67956a54..4c2c683d170f19266e1dfe0ef
> > de76d6feb23bf7a 100644
> > --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > @@ -23,6 +23,8 @@
> > #include <linux/reset.h>
> >
> > #include "pcie-designware.h"
> > +#include "../../pci.h"
> > +#include "../pci-host-common.h"
> >
> > /*
> > * The upper 16 bits of PCIE_CLIENT_CONFIG are a write
> > @@ -83,6 +85,9 @@ struct rockchip_pcie_of_data {
> > const struct pci_epc_features *epc_features;
> > };
> >
> > +static int rockchip_pcie_rc_reset_slot(struct pci_host_bridge
> > *bridge,
> > + struct pci_dev *pdev);
> > +
> > static int rockchip_pcie_readl_apb(struct rockchip_pcie *rockchip,
> > u32 reg)
> > {
> > return readl_relaxed(rockchip->apb_base + reg);
> > @@ -256,6 +261,7 @@ static int rockchip_pcie_host_init(struct
> > dw_pcie_rp *pp)
> > rockchip);
> >
> > rockchip_pcie_enable_l0s(pci);
> > + pp->bridge->reset_slot = rockchip_pcie_rc_reset_slot;
> >
> > return 0;
> > }
> > @@ -455,6 +461,11 @@ static irqreturn_t
> > rockchip_pcie_rc_sys_irq_thread(int irq, void *arg)
> > dev_dbg(dev, "PCIE_CLIENT_INTR_STATUS_MISC: %#x\n", reg);
> > dev_dbg(dev, "LTSSM_STATUS: %#x\n",
> > rockchip_pcie_get_ltssm(rockchip));
> >
> > + if (reg & PCIE_LINK_REQ_RST_NOT_INT) {
> > + dev_dbg(dev, "hot reset or link-down reset\n");
> > + pci_host_handle_link_down(pp->bridge);
> > + }
> > +
> > if (reg & PCIE_RDLH_LINK_UP_CHGED) {
> > if (rockchip_pcie_link_up(pci)) {
> > dev_dbg(dev, "Received Link up event.
> > Starting enumeration!\n");
> > @@ -536,8 +547,8 @@ static int rockchip_pcie_configure_rc(struct
> > platform_device *pdev,
> > return ret;
> > }
> >
> > - /* unmask DLL up/down indicator */
> > - val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED, 0);
> > + /* unmask DLL up/down indicator and hot reset/link-down
> > reset irq */
> > + val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED |
> > PCIE_LINK_REQ_RST_NOT_INT, 0);
> > rockchip_pcie_writel_apb(rockchip, val,
> > PCIE_CLIENT_INTR_MASK_MISC);
> >
> > return ret;
> > @@ -688,6 +699,80 @@ static int rockchip_pcie_probe(struct
> > platform_device *pdev)
> > return ret;
> > }
> >
> > +static int rockchip_pcie_rc_reset_slot(struct pci_host_bridge
> > *bridge,
> > + struct pci_dev *pdev)
> > +{
> > + struct pci_bus *bus = bridge->bus;
> > + struct dw_pcie_rp *pp = bus->sysdata;
> > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > + struct rockchip_pcie *rockchip = to_rockchip_pcie(pci);
> > + struct device *dev = rockchip->pci.dev;
> > + u32 val;
> > + int ret;
> > +
> > + dw_pcie_stop_link(pci);
> > + rockchip_pcie_phy_deinit(rockchip);
> > +
> > + ret = reset_control_assert(rockchip->rst);
> > + if (ret)
> > + return ret;
> > +
> > + ret = rockchip_pcie_phy_init(rockchip);
> > + if (ret)
> > + goto disable_regulator;
> > +
> > + ret = reset_control_deassert(rockchip->rst);
> > + if (ret)
> > + goto deinit_phy;
> > +
> > + ret = rockchip_pcie_clk_init(rockchip);
> > + if (ret)
> > + goto deinit_phy;
>
> Here you call rockchip_pcie_clk_init(), but I don't see that we ever
> called
> clk_bulk_disable_unprepare() after stopping the link. I would have
> expected
> the clk framework to have given us a warning about enabling clocks
> that are
> already enabled.
>
Ah that's a good point. I didn't notice any warnings whilst testing.
Perhaps worth looking into also. But I will add
`clk_bulk_disable_unprepare()` after stopping the link.
>
> > +
> > + ret = pp->ops->init(pp);
> > + if (ret) {
> > + dev_err(dev, "Host init failed: %d\n", ret);
> > + goto deinit_clk;
> > + }
> > +
> > + /* LTSSM enable control mode */
> > + val = HIWORD_UPDATE_BIT(PCIE_LTSSM_ENABLE_ENHANCE);
> > + rockchip_pcie_writel_apb(rockchip, val,
> > PCIE_CLIENT_HOT_RESET_CTRL);
> > +
> > + rockchip_pcie_writel_apb(rockchip, PCIE_CLIENT_RC_MODE,
> > PCIE_CLIENT_GENERAL_CON);
> > +
> > + ret = dw_pcie_setup_rc(pp);
> > + if (ret) {
> > + dev_err(dev, "Failed to setup RC: %d\n", ret);
> > + goto deinit_clk;
> > + }
> > +
> > + /* unmask DLL up/down indicator and hot reset/link-down
> > reset irq */
> > + val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED |
> > PCIE_LINK_REQ_RST_NOT_INT, 0);
> > + rockchip_pcie_writel_apb(rockchip, val,
> > PCIE_CLIENT_INTR_MASK_MISC);
> > +
> > + ret = dw_pcie_start_link(pci);
> > + if (ret)
> > + return ret;
>
> Here you are simply returning ret in case of error,
> should we perhaps goto error if this function fails?
Ah yeah! that does make more sense.
>
>
> > +
> > + ret = dw_pcie_wait_for_link(pci);
> > + if (ret)
> > + return ret;
>
> Here, I think that we should simply call dw_pcie_wait_for_link()
> without checking for error.
>
> 1) That is what Mani does in the qcom patch that implements the
> reset_slot() callback.
> 2) That is what we do in:
> https://github.com/torvalds/linux/blob/master/drivers/pci/controller/dwc/pcie-designware-host.c#L558-L559
>
> (Ignore errors, the link may come up later)
Okay will fixup! thanks for the feedback!
Wilfred
>
>
> Kind regards,
> Niklas
Powered by blists - more mailing lists