Message-ID: <20250611190056.355878-1-akshayaj.lkd@gmail.com>
Date: Thu, 12 Jun 2025 00:30:53 +0530
From: Akshay Jindal <akshayaj.lkd@...il.com>
To: bhelgaas@...gle.com,
ilpo.jarvinen@...ux.intel.com,
Jonathan.Cameron@...wei.com,
sathyanarayanan.kuppuswamy@...ux.intel.com,
kwilczynski@...nel.org,
mahesh@...ux.ibm.com,
oohall@...il.com,
karolina.stolarek@...cle.com,
lukas@...ner.de,
pandoh@...gle.com
Cc: Akshay Jindal <akshayaj.lkd@...il.com>,
linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [PATCH] PCI/AER: Log an error when the AER_MAX_MULTI_ERR_DEVICES limit is hit
When a PCIe device detects an error and the Root Port receives the
error message, the threaded IRQ handler aer_isr() walks the hierarchy
below the Root Port and adds each device that has recorded an error to
the e_info->dev[] array for error handling and recovery. The
e_info->dev[] array holds AER_MAX_MULTI_ERR_DEVICES entries, currently
defined as 5. Once the array is full, the walk stops without reporting
that further devices were skipped.

Log an error message when this limit is hit.
Signed-off-by: Akshay Jindal <akshayaj.lkd@...il.com>
---
Testing:
========
Verified the new log message in dmesg under QEMU.
1. The following command sets up the environment: a pcie-root-port and a
virtio-net-pci device on a Q35 machine model.
./qemu-system-x86_64 \
-M q35,accel=kvm \
-m 2G -cpu host -nographic \
-serial mon:stdio \
-kernel /home/akshayaj/pci/arch/x86/boot/bzImage \
-initrd /home/akshayaj/Embedded_System_Using_QEMU/rootfs/rootfs.cpio.gz \
-append "console=ttyS0 root=/ pci=pcie_scan_all" \
-device pcie-root-port,id=rp0,chassis=1,slot=1 \
-device virtio-net-pci,bus=rp0
~ # mylspci -t
-[0000:00]-+-00.0
           +-01.0
           +-02.0
           +-03.0-[01]----00.0
           +-1f.0
           +-1f.2
           \-1f.3
00:03.0 --> pcie-root-port
2. The kernel bzImage was built with the following changes:
2.1 CONFIG_PCIEAER=y in the config.
2.2 AER_MAX_MULTI_ERR_DEVICES set to 0.
Since there is no pcie-testdev in QEMU, it is impossible to create
a 5-level hierarchy of PCIe devices in QEMU, so the error scenario
is simulated by lowering the limit to 0.
2.3 The log message added at the required place in aer.c.
3. Both correctable and uncorrectable errors were injected on the
pcie-root-port via the HMP command pcie_aer_inject_error in QEMU.
The HMP commands used were:
3.1 pcie_aer_inject_error -c rp0 0x1
3.2 pcie_aer_inject_error -c rp0 0x40
3.3 pcie_aer_inject_error rp0 0x10
Resulting dmesg:
================
[ 0.380534] pcieport 0000:00:03.0: AER: enabled with IRQ 24
[ 55.729530] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
[ 225.484456] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
[ 356.976253] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
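
Illustration (not part of the patch): the bounded-array behaviour
described in the commit message, and the reason a limit of 0 triggers
the message on the first device, can be modelled in userspace roughly
as below. This is only a sketch under stated assumptions. DEV_CAPACITY,
struct err_info, its limit field and collect_devices() are hypothetical
stand-ins, device IDs are plain ints rather than struct pci_dev
pointers, and the limit is a runtime field so that the "set to 0" case
can be exercised without rebuilding.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for AER_MAX_MULTI_ERR_DEVICES (5 in the kernel). */
#define DEV_CAPACITY 5

/* Simplified model of the error-info record; not the kernel structure. */
struct err_info {
	int limit;              /* effective limit, 0..DEV_CAPACITY */
	int count;              /* number of devices recorded so far */
	int dev[DEV_CAPACITY];  /* device IDs (the kernel stores pci_dev pointers) */
};

/* Record one device. Returns 0 on success, -1 when the limit is reached. */
static int add_error_device(struct err_info *info, int dev_id)
{
	assert(info->limit >= 0 && info->limit <= DEV_CAPACITY);

	if (info->count >= info->limit)
		return -1;

	info->dev[info->count++] = dev_id;
	return 0;
}

/*
 * Walk a list of device IDs and record each one. Returns 1 if the walk
 * stopped early because the limit was hit (after logging), 0 otherwise.
 */
static int collect_devices(struct err_info *info, const int *ids, int n)
{
	for (int i = 0; i < n; i++) {
		if (add_error_device(info, ids[i])) {
			fprintf(stderr,
				"Exceeded max allowed (%d) addition of devices for error handling\n",
				info->limit);
			return 1;
		}
	}
	return 0;
}
```

With limit set to 0, the very first add_error_device() call fails, so
every injected error produces one log line, which matches the three
lines in the dmesg output above. With limit set to 5, the message only
appears when a sixth device is offered.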
drivers/pci/pcie/aer.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 70ac66188367..3995a1db5699 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1039,7 +1039,8 @@ static int find_device_iter(struct pci_dev *dev, void *data)
 	/* List this device */
 	if (add_error_device(e_info, dev)) {
 		/* We cannot handle more... Stop iteration */
-		/* TODO: Should print error message here? */
+		pci_err(dev, "Exceeded max allowed (%d) addition of PCIe "
+			"devices for AER handling\n", AER_MAX_MULTI_ERR_DEVICES);
 		return 1;
 	}
--
2.43.0