Message-Id: <20250224034500.23024-1-feng.tang@linux.alibaba.com>
Date: Mon, 24 Feb 2025 11:44:56 +0800
From: Feng Tang <feng.tang@...ux.alibaba.com>
To: Bjorn Helgaas <bhelgaas@...gle.com>,
Lukas Wunner <lukas@...ner.de>,
Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@...ux.intel.com>,
Liguang Zhang <zhangliguang@...ux.alibaba.com>,
Guanghui Feng <guanghuifeng@...ux.alibaba.com>,
rafael@...nel.org
Cc: Markus Elfring <Markus.Elfring@....de>,
lkp@...el.com,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
ilpo.jarvinen@...ux.intel.com,
linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org,
Feng Tang <feng.tang@...ux.alibaba.com>
Subject: [PATCH v3 0/4] PCIe hotplug interrupt related fixes
Hi all,
This patchset tries to address two PCIe hotplug interrupt related problems
we hit recently:
1. Firmware developers reported that they received two PCIe hotplug commands
within a very short interval on an ARM server, which doesn't comply with the
PCIe spec and broke their state machine and workflow.
2. An IRQ storm was found when testing the "pci=nomsi" case. The root
cause is that 'nomsi' disables MSI and makes devices and root ports use
legacy INTx interrupts, which likely makes several devices/ports share one
interrupt. In the failure case, the BIOS doesn't disable the PCIe hotplug
interrupts, and actually asserts the command-complete interrupt.
More details can be found in the commit logs of patches 2/4 and 4/4. Basically:
Patch 0001 moves the PCIe hotplug command waiting function from the pciehp
driver to the PCIe port driver for code reuse.
Patch 0002 adds the necessary wait for PCIe hotplug commands.
Patch 0003 loosens the condition check for interrupt disabling.
Patch 0004 disables PCIe hotplug interrupts in the early boot phase for the
MSI-disabled case.
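The overall flow the four patches aim for can be sketched in userspace roughly
as below. This is an illustration under stated assumptions, not the code in
the patches: struct fake_slot and every function name are hypothetical, the
"controller" is simulated, and a poll count stands in for the time-based
timeout a real driver would use. Only the register bit values are taken from
include/uapi/linux/pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_SLTCAP_NCCS 0x00040000 /* No Command Completed Support */
#define PCI_EXP_SLTCTL_CCIE 0x0010     /* Command Completed Interrupt Enable */
#define PCI_EXP_SLTCTL_HPIE 0x0020     /* Hot-Plug Interrupt Enable */
#define PCI_EXP_SLTSTA_CC   0x0010     /* Command Completed */

/* Hypothetical simulated slot; not a kernel structure. */
struct fake_slot {
	uint32_t sltcap;
	uint16_t sltctl;
	uint16_t sltsta;
	int cmd_latency;  /* polls the simulated controller needs per command */
	bool cmd_pending; /* a command was written and has not completed yet */
	int polls_done;   /* status reads since the last command */
	int writes;       /* Slot Control writes issued so far */
};

/* Simulated Slot Control write: starts a new command. */
static void write_sltctl(struct fake_slot *s, uint16_t val)
{
	s->sltctl = val;
	s->writes++;
	s->polls_done = 0;
	/* Without command-completed support, CC is never reported. */
	s->cmd_pending = !(s->sltcap & PCI_EXP_SLTCAP_NCCS);
}

/* Simulated Slot Status read: CC shows up after cmd_latency polls. */
static uint16_t read_sltsta(struct fake_slot *s)
{
	s->polls_done++;
	if (s->cmd_pending && s->polls_done >= s->cmd_latency) {
		s->sltsta |= PCI_EXP_SLTSTA_CC;
		s->cmd_pending = false;
	}
	return s->sltsta;
}

/* Poll for Command Completed; true if seen within max_polls reads. */
static bool wait_for_cmd(struct fake_slot *s, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (read_sltsta(s) & PCI_EXP_SLTSTA_CC) {
			s->sltsta &= (uint16_t)~PCI_EXP_SLTSTA_CC; /* models write-1-to-clear */
			return true;
		}
	}
	return false;
}

/* Returns 0 on success, -1 if the command did not complete in time. */
static int disable_hotplug_irqs(struct fake_slot *s, int max_polls)
{
	uint16_t mask = PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;

	if (!(s->sltctl & mask))
		return 0; /* already disabled: issue no command at all */

	write_sltctl(s, s->sltctl & (uint16_t)~mask);

	if (s->sltcap & PCI_EXP_SLTCAP_NCCS)
		return 0; /* no command-complete event to wait for */

	/* Wait so that the next command is not sent too soon. */
	return wait_for_cmd(s, max_polls) ? 0 : -1;
}
```

With cmd_latency = 3 and max_polls = 10 the sketch clears both enable bits and
returns after three status reads. A slot with NCCS set returns right after the
write, and a slot whose interrupts are already disabled is not written at all.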
Please help to review, thanks!
- Feng
Changelog:
since v2:
* Add patch 0001, which moves the waiting logic of pcie_poll_cmd from the
pciehp driver to the PCIe port driver for code reuse (Bjorn Helgaas)
* Separate Lukas' suggestion out as patch 0003 (Bjorn and Sathyanarayanan)
* Avoid hotplug command waiting for HW without command-complete
event support (Bjorn Helgaas)
* Fix spelling issues in commit log (Bjorn and Markus)
* Add cover-letter for whole patchset (Markus Elfring)
* Handle a set-but-unused build warning (0Day lkp bot)
since v1:
* Add the Originally-by for Liguang for patch 0002. The issue was found on
a 5.10 kernel, then 6.6. I was initially given a 5.10 kernel tarball
without git info to debug the issue, and made the patch. Thanks to Guanghui,
who recently pointed me to the tree https://gitee.com/anolis/cloud-kernel,
which shows that the wait logic in 5.10 was originally from Liguang and
never hit mainline.
* Make the irq disabling not dependent on whether the pciehp service driver
will be loaded (Lukas Wunner)
* Use read_poll_timeout() API to simplify the waiting logic (Sathyanarayanan
Kuppuswamy)
* Fix wrong email address (Markus Elfring)
* Add logic to skip irq disabling if it is already disabled.
Feng Tang (4):
PCI: portdrv: pciehp: Move PCIe hotplug command waiting logic to port
driver
PCI/portdrv: Add necessary wait for disabling hotplug events
PCI/portdrv: Loose the condition check for disabling hotplug
interrupts
PCI: Disable PCIe hotplug interrupts early when msi is disabled
drivers/pci/hotplug/pciehp_hpc.c | 38 ++++++------------------
drivers/pci/pci.h | 7 +++++
drivers/pci/pcie/portdrv.c | 50 ++++++++++++++++++++++++++++----
drivers/pci/probe.c | 9 ++++++
4 files changed, 70 insertions(+), 34 deletions(-)
--
2.43.5