Message-ID: <CAE3SzaQS66-vbwKe18i=fMGa-B+JZ4P=za=bOTutoTxOMwANKA@mail.gmail.com>
Date: Wed, 18 Jun 2025 00:54:32 +0530
From: Akshay Jindal <akshayaj.lkd@...il.com>
To: bhelgaas@...gle.com, ilpo.jarvinen@...ux.intel.com, 
	Jonathan.Cameron@...wei.com, sathyanarayanan.kuppuswamy@...ux.intel.com, 
	kwilczynski@...nel.org, mahesh@...ux.ibm.com, oohall@...il.com, 
	karolina.stolarek@...cle.com, lukas@...ner.de, pandoh@...gle.com
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/AER: Add Error Log in case when AER_MAX_MULTI_ERR_DEVICES
 limit hit during AER handling.

On Thu, Jun 12, 2025 at 12:31 AM Akshay Jindal <akshayaj.lkd@...il.com> wrote:
>
> When an error is detected at a PCIe device and the root port receives the
> error message, the threaded IRQ handler aer_isr() walks down the
> hierarchy from the root port and adds each PCIe device on which an
> error has been recorded to the e_info->dev[] array for error handling
> and recovery. The e_info->dev[] array holds AER_MAX_MULTI_ERR_DEVICES
> entries, currently defined as 5. This change logs an error message
> when that limit is hit.
>
> Signed-off-by: Akshay Jindal <akshayaj.lkd@...il.com>
> ---
>
> Testing:
> ========
> Verified the log message in dmesg on QEMU.
>
> 1. The following command created the test environment: a pcie-root-port
> and a virtio-net-pci device on a Q35 machine model.
> ./qemu-system-x86_64 \
>         -M q35,accel=kvm \
>         -m 2G -cpu host -nographic \
>         -serial mon:stdio \
>         -kernel /home/akshayaj/pci/arch/x86/boot/bzImage \
>         -initrd /home/akshayaj/Embedded_System_Using_QEMU/rootfs/rootfs.cpio.gz \
>         -append "console=ttyS0 root=/ pci=pcie_scan_all" \
>         -device pcie-root-port,id=rp0,chassis=1,slot=1 \
>         -device virtio-net-pci,bus=rp0
>
> ~ # mylspci -t
> -[0000:00]-+-00.0
>            +-01.0
>            +-02.0
>            +-03.0-[01]----00.0
>            +-1f.0
>            +-1f.2
>            \-1f.3
> 00:03.0--> pcie-root-port
>
>
> 2. Kernel bzImage compiled with the following changes:
>         2.1 CONFIG_PCIEAER=y in config
>         2.2 AER_MAX_MULTI_ERR_DEVICES set to 0
>         Since there is no pcie-testdev in QEMU, it is impossible to create
>         a 5-level hierarchy of PCIe devices in QEMU. So the error scenario
>         is simulated by changing the limit to 0.
>         2.3 Log added at the required place in aer.c.
>
> 3. Both correctable and uncorrectable errors were injected on the
> pcie-root-port via the HMP command pcie_aer_inject_error in QEMU.
> The HMP commands used were:
>         3.1 pcie_aer_inject_error -c rp0 0x1
>         3.2 pcie_aer_inject_error -c rp0 0x40
>         3.3 pcie_aer_inject_error rp0 0x10
>
> Resulting dmesg:
> ================
> [    0.380534] pcieport 0000:00:03.0: AER: enabled with IRQ 24
> [   55.729530] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [  225.484456] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [  356.976253] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
>
>  drivers/pci/pcie/aer.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 70ac66188367..3995a1db5699 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1039,7 +1039,8 @@ static int find_device_iter(struct pci_dev *dev, void *data)
>                 /* List this device */
>                 if (add_error_device(e_info, dev)) {
>                         /* We cannot handle more... Stop iteration */
> -                       /* TODO: Should print error message here? */
> +                       pci_err(dev, "Exceeded max allowed (%d) addition of PCIe "
> +                               "devices for AER handling\n", AER_MAX_MULTI_ERR_DEVICES);
>                         return 1;
>                 }
>
> --
> 2.43.0
>
Gentle reminder.

Thanks,
Akshay.
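
For anyone reading the thread without the source at hand, the behaviour the patch touches can be modelled in userspace roughly as below. This is only a sketch under stated assumptions, not the kernel code: MAX_ERR_DEVICES stands in for AER_MAX_MULTI_ERR_DEVICES, struct err_info and walk_devices() are hypothetical stand-ins for the AER error info and the bus walk, integer ids replace struct pci_dev pointers, and fprintf() replaces pci_err().

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for AER_MAX_MULTI_ERR_DEVICES (assumption: same value, 5). */
#define MAX_ERR_DEVICES 5

/* Hypothetical, simplified model of the per-interrupt error info. */
struct err_info {
	int dev[MAX_ERR_DEVICES];	/* device ids instead of struct pci_dev * */
	int count;			/* number of slots in use */
};

/* Record a device; return 0 on success, -1 when the array is full. */
static int add_error_device(struct err_info *info, int dev_id)
{
	assert(info != NULL);

	if (info->count < MAX_ERR_DEVICES) {
		info->dev[info->count++] = dev_id;
		return 0;
	}
	return -1;
}

/* Per-device callback: returns 1 to stop the walk once the limit is hit. */
static int find_device_iter(int dev_id, struct err_info *info)
{
	if (add_error_device(info, dev_id)) {
		/* This is the spot where the patch adds its log message. */
		fprintf(stderr,
			"dev %d: Exceeded max allowed (%d) addition of devices for error handling\n",
			dev_id, MAX_ERR_DEVICES);
		return 1;
	}
	return 0;
}

/* Hypothetical walk over ndevs erroring devices; returns how many were recorded. */
static int walk_devices(struct err_info *info, int ndevs)
{
	int i;

	for (i = 0; i < ndevs; i++)
		if (find_device_iter(i, info))
			break;
	return info->count;
}
```

Calling walk_devices() with more than MAX_ERR_DEVICES devices records the first five, prints the message once, and stops the walk, which is the case that was previously silent.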
