Message-ID: <CAE3SzaQS66-vbwKe18i=fMGa-B+JZ4P=za=bOTutoTxOMwANKA@mail.gmail.com>
Date: Wed, 18 Jun 2025 00:54:32 +0530
From: Akshay Jindal <akshayaj.lkd@...il.com>
To: bhelgaas@...gle.com, ilpo.jarvinen@...ux.intel.com,
Jonathan.Cameron@...wei.com, sathyanarayanan.kuppuswamy@...ux.intel.com,
kwilczynski@...nel.org, mahesh@...ux.ibm.com, oohall@...il.com,
karolina.stolarek@...cle.com, lukas@...ner.de, pandoh@...gle.com
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/AER: Add Error Log in case when AER_MAX_MULTI_ERR_DEVICES
limit hit during AER handling.
On Thu, Jun 12, 2025 at 12:31 AM Akshay Jindal <akshayaj.lkd@...il.com> wrote:
>
> When a PCIe device detects an error and the root port receives the
> error message, the threaded IRQ handler aer_isr walks down the
> hierarchy from the root port and adds each PCIe device on which an
> error has been recorded to the e_info->dev[] array for error handling
> and recovery. The e_info->dev[] array has size
> AER_MAX_MULTI_ERR_DEVICES, which is currently defined as 5.
> This change adds an error message for the case where this limit is hit.
>
> Signed-off-by: Akshay Jindal <akshayaj.lkd@...il.com>
> ---
>
> Testing:
> ========
> Verified log in dmesg on QEMU.
>
> 1. The following command created the required environment. As shown below, a
> pcie-root-port and a virtio-net-pci device are used on a Q35 machine model.
> ./qemu-system-x86_64 \
> -M q35,accel=kvm \
> -m 2G -cpu host -nographic \
> -serial mon:stdio \
> -kernel /home/akshayaj/pci/arch/x86/boot/bzImage \
> -initrd /home/akshayaj/Embedded_System_Using_QEMU/rootfs/rootfs.cpio.gz \
> -append "console=ttyS0 root=/ pci=pcie_scan_all" \
> -device pcie-root-port,id=rp0,chassis=1,slot=1 \
> -device virtio-net-pci,bus=rp0
>
> ~ # mylspci -t
> -[0000:00]-+-00.0
> +-01.0
> +-02.0
> +-03.0-[01]----00.0
> +-1f.0
> +-1f.2
> \-1f.3
> 00:03.0--> pcie-root-port
>
>
> 2. Kernel bzImage compiled with following changes:
> 2.1 CONFIG_PCIEAER=y in config
> 2.2 AER_MAX_MULTI_ERR_DEVICES set to 0
> Since there is no pcie-testdev in QEMU, it is impossible to create
> a 5-level hierarchy of PCIe devices in QEMU. So we simulate the
> error scenario by changing the limit to 0.
> 2.3 Log added at the required place in aer.c.
>
> 3. Both correctable and uncorrectable errors were injected on the
> pcie-root-port via the HMP command pcie_aer_inject_error in QEMU.
> The HMP commands used are as follows:
> 3.1 pcie_aer_inject_error -c rp0 0x1
> 3.2 pcie_aer_inject_error -c rp0 0x40
> 3.3 pcie_aer_inject_error rp0 0x10
>
> Resulting dmesg:
> ================
> [ 0.380534] pcieport 0000:00:03.0: AER: enabled with IRQ 24
> [ 55.729530] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [ 225.484456] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [ 356.976253] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
>
> drivers/pci/pcie/aer.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 70ac66188367..3995a1db5699 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1039,7 +1039,8 @@ static int find_device_iter(struct pci_dev *dev, void *data)
> /* List this device */
> if (add_error_device(e_info, dev)) {
> /* We cannot handle more... Stop iteration */
> - /* TODO: Should print error message here? */
> + pci_err(dev, "Exceeded max allowed (%d) addition of PCIe "
> + "devices for AER handling\n", AER_MAX_MULTI_ERR_DEVICES);
> return 1;
> }
>
> --
> 2.43.0
>
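For anyone skimming the thread, the bounded-array behaviour described in the
commit message can be modelled in a few lines of standalone userspace C. This
is only a simplified sketch, not the kernel code: struct fake_dev, struct
err_info and MAX_ERR_DEVICES are hypothetical stand-ins for struct pci_dev,
e_info and AER_MAX_MULTI_ERR_DEVICES, and fprintf() stands in for pci_err().

```c
#include <stdio.h>

/* Stand-in for AER_MAX_MULTI_ERR_DEVICES (5 per the commit message). */
#define MAX_ERR_DEVICES 5

/* Hypothetical stand-in for struct pci_dev. */
struct fake_dev {
	const char *name;
};

/* Hypothetical stand-in for the e_info structure. */
struct err_info {
	struct fake_dev *dev[MAX_ERR_DEVICES];
	int error_dev_num;
};

/* Record a device; returns 0 on success, -1 if the array is full. */
static int add_error_device(struct err_info *info, struct fake_dev *dev)
{
	if (info->error_dev_num >= MAX_ERR_DEVICES)
		return -1;

	info->dev[info->error_dev_num++] = dev;
	return 0;
}

/*
 * Per-device callback: returns 1 to stop the walk once the limit is
 * hit (logging a message, as the patch proposes), 0 to continue.
 */
static int find_device_iter(struct fake_dev *dev, struct err_info *info)
{
	if (add_error_device(info, dev)) {
		fprintf(stderr,
			"%s: Exceeded max allowed (%d) addition of PCIe devices for AER handling\n",
			dev->name, MAX_ERR_DEVICES);
		return 1;
	}
	return 0;
}
```

In this model the first MAX_ERR_DEVICES calls to find_device_iter() return 0
and the next one logs the message and returns 1, which is the case the patch
makes visible in dmesg.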
Gentle reminder.
Thanks,
Akshay.