[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aBHZT_mOruwH7HxJ@ryzen>
Date: Wed, 30 Apr 2025 10:03:27 +0200
From: Niklas Cassel <cassel@...nel.org>
To: Wilfred Mallawa <wilfred.opensource@...il.com>
Cc: Lorenzo Pieralisi <lpieralisi@...nel.org>,
Krzysztof WilczyĆski <kw@...ux.com>,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
Rob Herring <robh@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Heiko Stuebner <heiko@...ech.de>,
Philipp Zabel <p.zabel@...gutronix.de>,
Damien Le Moal <dlemoal@...nel.org>,
Alistair Francis <alistair@...stair23.me>,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
linux-rockchip@...ts.infradead.org,
Wilfred Mallawa <wilfred.mallawa@....com>
Subject: Re: [PATCH] PCI: dwc: Add support for slot reset on link down event
Hello Wilfred,
Nice to see this feature :)
On Wed, Apr 30, 2025 at 05:43:51PM +1000, Wilfred Mallawa wrote:
> From: Wilfred Mallawa <wilfred.mallawa@....com>
>
> The PCIe link may go down in cases like firmware crashes or unstable
> connections. When this occurs, the PCIe slot must be reset to restore
> functionality. However, the current driver lacks link down handling,
> forcing users to reboot the system to recover.
>
> This patch implements the `reset_slot` callback for link down handling
> for DWC PCIe host controller. In which, the RC is reset, reconfigured
> and link training initiated to recover from the link down event.
>
> This patch by extension fixes issues with sysfs initiated bus resets.
> In that, currently, when a sysfs initiated bus reset is issued, the
> endpoint device is non-functional after (may link up with downgraded link
> status). With this patch adding support for link down recovery, a sysfs
> initiated bus reset works as intended. Testing conducted on a ROCK5B board
> with an M.2 NVMe drive.
>
> Signed-off-by: Wilfred Mallawa <wilfred.mallawa@....com>
> ---
> Hey all,
>
> This patch builds ontop of [1] to extend the reset slot support for the
> DWC PCIe host controller. Which implements link down recovery support
> for the DesignWare PCIe host controller by adding a `reset_slot` callback.
> This allows the system to recover from PCIe link failures without requiring a reboot.
>
> This patch by extension improves the behavior of sysfs-initiated bus resets.
> Previously, a `reset` issued via sysfs could leave the endpoint in a
> non-functional state or with downgraded link parameters. With the added
> link down recovery logic, sysfs resets now result in a properly reinitialized
> and fully functional endpoint device. This issue was discovered on a
> Rock5B board, and thus testing was also conducted on the same platform
> with a known good M.2 NVMe drive.
>
> Thanks!
>
> [1] https://lore.kernel.org/all/20250417-pcie-reset-slot-v3-0-59a10811c962@linaro.org/
> ---
> drivers/pci/controller/dwc/Kconfig | 1 +
> drivers/pci/controller/dwc/pcie-dw-rockchip.c | 89 ++++++++++++++++++++++++++-
> 2 files changed, 88 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
> index d9f0386396edf66ad0e514a0f545ed24d89fcb6c..878c52de0842e32ca50dfcc4b66231a73ef436c4 100644
> --- a/drivers/pci/controller/dwc/Kconfig
> +++ b/drivers/pci/controller/dwc/Kconfig
> @@ -347,6 +347,7 @@ config PCIE_ROCKCHIP_DW_HOST
> depends on OF
> select PCIE_DW_HOST
> select PCIE_ROCKCHIP_DW
> + select PCI_HOST_COMMON
> help
> Enables support for the DesignWare PCIe controller in the
> Rockchip SoC (except RK3399) to work in host mode.
> diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> index 3c6ab71c996ec1246954f52a9454c8ae67956a54..4c2c683d170f19266e1dfe0efde76d6feb23bf7a 100644
> --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> @@ -23,6 +23,8 @@
> #include <linux/reset.h>
>
> #include "pcie-designware.h"
> +#include "../../pci.h"
> +#include "../pci-host-common.h"
>
> /*
> * The upper 16 bits of PCIE_CLIENT_CONFIG are a write
> @@ -83,6 +85,9 @@ struct rockchip_pcie_of_data {
> const struct pci_epc_features *epc_features;
> };
>
> +static int rockchip_pcie_rc_reset_slot(struct pci_host_bridge *bridge,
> + struct pci_dev *pdev);
> +
> static int rockchip_pcie_readl_apb(struct rockchip_pcie *rockchip, u32 reg)
> {
> return readl_relaxed(rockchip->apb_base + reg);
> @@ -256,6 +261,7 @@ static int rockchip_pcie_host_init(struct dw_pcie_rp *pp)
> rockchip);
>
> rockchip_pcie_enable_l0s(pci);
> + pp->bridge->reset_slot = rockchip_pcie_rc_reset_slot;
>
> return 0;
> }
> @@ -455,6 +461,11 @@ static irqreturn_t rockchip_pcie_rc_sys_irq_thread(int irq, void *arg)
> dev_dbg(dev, "PCIE_CLIENT_INTR_STATUS_MISC: %#x\n", reg);
> dev_dbg(dev, "LTSSM_STATUS: %#x\n", rockchip_pcie_get_ltssm(rockchip));
>
> + if (reg & PCIE_LINK_REQ_RST_NOT_INT) {
> + dev_dbg(dev, "hot reset or link-down reset\n");
> + pci_host_handle_link_down(pp->bridge);
> + }
> +
> if (reg & PCIE_RDLH_LINK_UP_CHGED) {
> if (rockchip_pcie_link_up(pci)) {
> dev_dbg(dev, "Received Link up event. Starting enumeration!\n");
> @@ -536,8 +547,8 @@ static int rockchip_pcie_configure_rc(struct platform_device *pdev,
> return ret;
> }
>
> - /* unmask DLL up/down indicator */
> - val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED, 0);
> + /* unmask DLL up/down indicator and hot reset/link-down reset irq */
> + val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED | PCIE_LINK_REQ_RST_NOT_INT, 0);
> rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_INTR_MASK_MISC);
>
> return ret;
> @@ -688,6 +699,80 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
> return ret;
> }
>
> +static int rockchip_pcie_rc_reset_slot(struct pci_host_bridge *bridge,
> + struct pci_dev *pdev)
> +{
> + struct pci_bus *bus = bridge->bus;
> + struct dw_pcie_rp *pp = bus->sysdata;
> + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> + struct rockchip_pcie *rockchip = to_rockchip_pcie(pci);
> + struct device *dev = rockchip->pci.dev;
> + u32 val;
> + int ret;
> +
> + dw_pcie_stop_link(pci);
> + rockchip_pcie_phy_deinit(rockchip);
> +
> + ret = reset_control_assert(rockchip->rst);
> + if (ret)
> + return ret;
> +
> + ret = rockchip_pcie_phy_init(rockchip);
> + if (ret)
> + goto disable_regulator;
> +
> + ret = reset_control_deassert(rockchip->rst);
> + if (ret)
> + goto deinit_phy;
> +
> + ret = rockchip_pcie_clk_init(rockchip);
> + if (ret)
> + goto deinit_phy;
Here you call rockchip_pcie_clk_init(), but I don't see that we ever called
clk_bulk_disable_unprepare() after stopping the link. I would have expected
the clk framework to have given us a warning about enabling clocks that are
already enabled.
> +
> + ret = pp->ops->init(pp);
> + if (ret) {
> + dev_err(dev, "Host init failed: %d\n", ret);
> + goto deinit_clk;
> + }
> +
> + /* LTSSM enable control mode */
> + val = HIWORD_UPDATE_BIT(PCIE_LTSSM_ENABLE_ENHANCE);
> + rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_HOT_RESET_CTRL);
> +
> + rockchip_pcie_writel_apb(rockchip, PCIE_CLIENT_RC_MODE, PCIE_CLIENT_GENERAL_CON);
> +
> + ret = dw_pcie_setup_rc(pp);
> + if (ret) {
> + dev_err(dev, "Failed to setup RC: %d\n", ret);
> + goto deinit_clk;
> + }
> +
> + /* unmask DLL up/down indicator and hot reset/link-down reset irq */
> + val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED | PCIE_LINK_REQ_RST_NOT_INT, 0);
> + rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_INTR_MASK_MISC);
> +
> + ret = dw_pcie_start_link(pci);
> + if (ret)
> + return ret;
Here you are simply returning ret in case of error,
should we perhaps goto error if this function fails?
> +
> + ret = dw_pcie_wait_for_link(pci);
> + if (ret)
> + return ret;
Here, I think that we should simply call dw_pcie_wait_for_link()
without checking for error.
1) That is what Mani does in the qcom patch that implements the
reset_slot() callback.
2) That is what we do in:
https://github.com/torvalds/linux/blob/master/drivers/pci/controller/dwc/pcie-designware-host.c#L558-L559
(Ignore errors, the link may come up later)
Kind regards,
Niklas
Powered by blists - more mailing lists