[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <99d1cd4d-2526-4ce0-aabd-508fa03cb100@linux.intel.com>
Date: Tue, 18 Feb 2025 10:58:19 -0800
From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@...ux.intel.com>
To: Feng Tang <feng.tang@...ux.alibaba.com>,
Bjorn Helgaas <bhelgaas@...gle.com>, Lukas Wunner <lukas@...ner.de>,
Liguang Zhang <zhangliguang@...ux.alibaba.com>,
Guanghui Feng <guanghuifeng@...ux.alibaba.com>, rafael@...nel.org
Cc: Markus Elfring <Markus.Elfring@....de>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
ilpo.jarvinen@...ux.intel.com, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] PCI/portdrv: Add necessary wait for disabling
hotplug events
On 2/17/25 7:48 PM, Feng Tang wrote:
> There was problem reported by firmware developers that they received
> 2 pcie link control commands in very short intervals on an ARM server,
> which doesn't comply with pcie spec, and broke their state machine and
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> to wait at least 1 second for the command-complete event, before
> resending the cmd or sending a new cmd.
>
> And the first link control command firmware received is from
> get_port_device_capability(), which sends cmd to disable pcie hotplug
> interrupts without waiting for its completion.
Were you able to narrow down the source of the second command? The
reason I am asking is, the commit you are trying to fix seems to have
existed for 10+ years and no one had faced any issues with it. So
I am wondering whether this needs to fixed at this place or before
executing the second command.
>
> Fix it by adding the necessary wait to comply with PCIe spec, referring
> pcie_poll_cmd().
>
> Also make the interrupt disabling not dependent on whether pciehp
> service driver will be loaded as suggested by Lukas.
May be this needs a new patch?
>
> Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> Originally-by: Liguang Zhang <zhangliguang@...ux.alibaba.com>
> Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@...ux.intel.com>
> Signed-off-by: Feng Tang <feng.tang@...ux.alibaba.com>
> ---
Code wise it looks fine to me.
> Changlog:
>
> since v1:
> * Add the Originally-by for Liguang. The issue was found on a 5.10 kernel,
> then 6.6. I was initially given a 5.10 kernel tar bar without git info to
> debug the issue, and made the patch. Thanks to Guanghui who recently pointed
> me to tree https://gitee.com/anolis/cloud-kernel which show the wait logic
> in 5.10 was originally from Liguang, and never hit mainline.
> * Make the irq disabling not dependent on wthether pciehp service driver
> will be loaded (Lukas Wunner)
> * Use read_poll_timeout() API to simply the waiting logic (Sathyanarayanan
> Kuppuswamy)
> * Add logic to skip irq disabling if it is already disabled.
>
> drivers/pci/pci.h | 2 ++
> drivers/pci/pcie/portdrv.c | 44 +++++++++++++++++++++++++++++++++-----
> 2 files changed, 41 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 01e51db8d285..c1e234d1b81d 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -759,12 +759,14 @@ static inline void pcie_ecrc_get_policy(char *str) { }
> #ifdef CONFIG_PCIEPORTBUS
> void pcie_reset_lbms_count(struct pci_dev *port);
> int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
> +void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
> #else
> static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
> static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
> {
> return -EOPNOTSUPP;
> }
> +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
> #endif
>
> struct pci_dev_reset_methods {
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 02e73099bad0..2470333bba2f 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -18,6 +18,7 @@
> #include <linux/string.h>
> #include <linux/slab.h>
> #include <linux/aer.h>
> +#include <linux/iopoll.h>
>
> #include "../pci.h"
> #include "portdrv.h"
> @@ -205,6 +206,40 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
> return 0;
> }
>
> +static int pcie_wait_sltctl_cmd_raw(struct pci_dev *dev)
> +{
> + u16 slot_status = 0;
> + int ret, ret1, timeout_us;
> +
> + /* 1 second, according to PCIe spec 6.1, section 6.7.3.2 */
> + timeout_us = 1000000;
> + ret = read_poll_timeout(pcie_capability_read_word, ret1,
> + (slot_status & PCI_EXP_SLTSTA_CC), 10000,
> + timeout_us, true, dev, PCI_EXP_SLTSTA,
> + &slot_status);
> + if (!ret)
> + pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
> + PCI_EXP_SLTSTA_CC);
> +
> + return ret;
> +}
> +
> +void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
> +{
> + u16 slot_ctrl = 0;
> +
> + pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
> + /* Bail out early if it is already disabled */
> + if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
> + return;
> +
> + pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> + PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> +
> + if (pcie_wait_sltctl_cmd_raw(dev))
> + pci_info(dev, "Timeout on disabling PCIE hot-plug interrupt\n");
> +}
> +
> /**
> * get_port_device_capability - discover capabilities of a PCI Express port
> * @dev: PCI Express port to examine
> @@ -222,16 +257,15 @@ static int get_port_device_capability(struct pci_dev *dev)
>
> if (dev->is_hotplug_bridge &&
> (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> - pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) &&
> - (pcie_ports_native || host->native_pcie_hotplug)) {
> - services |= PCIE_PORT_SERVICE_HP;
> + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> + if (pcie_ports_native || host->native_pcie_hotplug)
> + services |= PCIE_PORT_SERVICE_HP;
>
> /*
> * Disable hot-plug interrupts in case they have been enabled
> * by the BIOS and the hot-plug service driver is not loaded.
> */
> - pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> - PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> + pcie_disable_hp_interrupts_early(dev);
> }
>
> #ifdef CONFIG_PCIEAER
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
Powered by blists - more mailing lists