Message-ID: <20250619185041.73240-1-akshayaj.lkd@gmail.com>
Date: Fri, 20 Jun 2025 00:20:30 +0530
From: Akshay Jindal <akshayaj.lkd@...il.com>
To: bhelgaas@...gle.com,
	helgaas@...nel.org,
	mani@...nel.org,
	manivannan.sadhasivam@...aro.org,
	kwilczynski@...nel.org,
	mahesh@...ux.ibm.com,
	oohall@...il.com,
	ilpo.jarvinen@...ux.intel.com,
	Jonathan.Cameron@...wei.com,
	sathyanarayanan.kuppuswamy@...ux.intel.com,
	lukas@...ner.de
Cc: Akshay Jindal <akshayaj.lkd@...il.com>,
	shuah@...nel.org,
	linux-pci@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: [PATCH v2] PCI/AER: Add error message when AER_MAX_MULTI_ERR_DEVICES limit is hit during AER handling

When a PCIe error is detected, the root port receives the error message
and the threaded IRQ handler, aer_isr, traverses the hierarchy downward
from the root port. It populates the e_info->dev[] array with the PCIe
devices that have recorded error status, so that appropriate error
handling and recovery can be performed.

The e_info->dev[] array is limited in size by AER_MAX_MULTI_ERR_DEVICES,
which is currently defined as 5. If more than five devices report errors
in the same event, add_error_device() fails for the extra devices,
find_device_iter() stops the walk without logging anything, and those
devices are left out of the recovery flow.

Emit an error message when this limit is reached, resolving the following
TODO comment in drivers/pci/pcie/aer.c:

	/* TODO: Should print error message here? */

Signed-off-by: Akshay Jindal <akshayaj.lkd@...il.com>
---

Changes since v1:
- Reworded commit message in imperative mood (per Shuah’s feedback)
- Mentioned and quoted related TODO in the message
- Updated recipient list

Testing:
========
Verified the new log message in dmesg on QEMU.

1. The following command created the test environment: a pcie-root-port
and a virtio-net-pci device on a Q35 machine model.
./qemu-system-x86_64 \
	-M q35,accel=kvm \
	-m 2G -cpu host -nographic \
	-serial mon:stdio \
	-kernel /home/akshayaj/pci/arch/x86/boot/bzImage \
	-initrd /home/akshayaj/Embedded_System_Using_QEMU/rootfs/rootfs.cpio.gz \
	-append "console=ttyS0 root=/ pci=pcie_scan_all" \
	-device pcie-root-port,id=rp0,chassis=1,slot=1 \
	-device virtio-net-pci,bus=rp0

~ # mylspci -t
-[0000:00]-+-00.0
           +-01.0
           +-02.0
           +-03.0-[01]----00.0
           +-1f.0
           +-1f.2
           \-1f.3
00:03.0--> pcie-root-port

2. The kernel bzImage was compiled with the following changes:
	2.1 CONFIG_PCIEAER=y in the config.
	2.2 AER_MAX_MULTI_ERR_DEVICES set to 0.
	Since there is no pcie-testdev in QEMU, it is not possible to
	create a 5-level hierarchy of PCIe devices there, so the error
	scenario is simulated by lowering the limit to 0.
	2.3 Log message added at the required place in aer.c.
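
The effect of step 2.2 can be illustrated with a small userspace model.
This is only a sketch, not the kernel code: struct model_err_info,
model_add_error_device() and model_collect() are hypothetical names, and
the limit is a runtime field here rather than a compile-time constant. It
assumes, as the diff below suggests, that the walk stops at the first
device that does not fit.

```c
#include <stdio.h>

/* Hypothetical stand-in for AER_MAX_MULTI_ERR_DEVICES (currently 5). */
#define MODEL_MAX_DEVICES 5

/* Simplified model of the error info; not the kernel's struct aer_err_info. */
struct model_err_info {
	const char *dev[MODEL_MAX_DEVICES];
	int error_dev_num;
	int limit;	/* effective limit under test, 0..MODEL_MAX_DEVICES */
};

/* Returns 0 on success, -1 when the list is already full. */
static int model_add_error_device(struct model_err_info *info,
				  const char *name)
{
	if (info->error_dev_num < info->limit) {
		info->dev[info->error_dev_num++] = name;
		return 0;
	}
	return -1;
}

/*
 * Walk the candidate devices in order and stop at the first one that
 * does not fit, printing a single message for that device.
 * Returns the number of devices that were not recorded.
 */
static int model_collect(struct model_err_info *info,
			 const char *const *names, int count)
{
	int i;

	for (i = 0; i < count; i++) {
		if (model_add_error_device(info, names[i])) {
			fprintf(stderr,
				"%s: Exceeded max allowed (%d) addition of PCIe devices for AER handling\n",
				names[i], info->limit);
			return count - i;
		}
	}
	return 0;
}
```

With limit set to 0, the very first device passed to model_collect()
already fails to fit, which matches the dmesg output below where the
message is reported against the root port itself.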

3. Both correctable and uncorrectable errors were injected on the
pcie-root-port via the QEMU HMP command pcie_aer_inject_error.
The HMP commands used were:
	3.1 pcie_aer_inject_error -c rp0 0x1
	3.2 pcie_aer_inject_error -c rp0 0x40
	3.3 pcie_aer_inject_error rp0 0x10
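
For reference, the three status values above can be decoded as follows.
This sketch assumes the values are raw AER status bits as defined in
include/uapi/linux/pci_regs.h, with -c selecting the correctable status
register; model_aer_bit_name() is a hypothetical helper and the
descriptive strings are not the kernel's own.

```c
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_COR_RCVR	0x00000001	/* Receiver Error Status */
#define PCI_ERR_COR_BAD_TLP	0x00000040	/* Bad TLP Status */
#define PCI_ERR_UNC_DLP		0x00000010	/* Data Link Protocol */

/*
 * Hypothetical helper: map one injected status value to a description.
 * Only the three values used in this test are covered.
 */
static const char *model_aer_bit_name(int correctable, uint32_t status)
{
	if (correctable) {
		switch (status) {
		case PCI_ERR_COR_RCVR:
			return "Receiver Error";
		case PCI_ERR_COR_BAD_TLP:
			return "Bad TLP";
		}
	} else if (status == PCI_ERR_UNC_DLP) {
		return "Data Link Protocol Error";
	}
	return "unknown";
}
```

Under that assumption, 3.1 and 3.2 inject a correctable Receiver Error
and Bad TLP, and 3.3 injects an uncorrectable Data Link Protocol Error.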

Resulting dmesg:
================
[    0.380534] pcieport 0000:00:03.0: AER: enabled with IRQ 24
[   55.729530] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
[  225.484456] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
[  356.976253] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling

 drivers/pci/pcie/aer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 70ac66188367..3995a1db5699 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1039,7 +1039,8 @@ static int find_device_iter(struct pci_dev *dev, void *data)
 		/* List this device */
 		if (add_error_device(e_info, dev)) {
 			/* We cannot handle more... Stop iteration */
-			/* TODO: Should print error message here? */
+			pci_err(dev, "Exceeded max allowed (%d) addition of PCIe devices for AER handling\n",
+				AER_MAX_MULTI_ERR_DEVICES);
 			return 1;
 		}
 
-- 
2.43.0

