Message-ID: <CAE3SzaR=tXp4nKfVC7AGzh2U2fdWqVNeNeZJs-bDws+tej-f=Q@mail.gmail.com>
Date: Wed, 25 Jun 2025 15:59:14 +0530
From: Akshay Jindal <akshayaj.lkd@...il.com>
To: bhelgaas@...gle.com, helgaas@...nel.org, mani@...nel.org, 
	manivannan.sadhasivam@...aro.org, kwilczynski@...nel.org, 
	mahesh@...ux.ibm.com, oohall@...il.com, ilpo.jarvinen@...ux.intel.com, 
	Jonathan.Cameron@...wei.com, sathyanarayanan.kuppuswamy@...ux.intel.com, 
	lukas@...ner.de
Cc: shuah@...nel.org, linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] PCI/AER: Add error message when AER_MAX_MULTI_ERR_DEVICES
 limit is hit during AER handling

Hi,
Is there any feedback on the patch?

Thanks,
Akshay

On Fri, Jun 20, 2025 at 12:21 AM Akshay Jindal <akshayaj.lkd@...il.com> wrote:
>
> When a PCIe error is detected, the root port receives the error message
> and the threaded IRQ handler, aer_isr, traverses the hierarchy downward
> from the root port. It populates the e_info->dev[] array with the PCIe
> devices that have recorded error status, so that appropriate error
> handling and recovery can be performed.
>
> The e_info->dev[] array is limited in size by AER_MAX_MULTI_ERR_DEVICES,
> which is currently defined as 5. If more than five devices report errors
> in the same event, the array silently truncates the list, and those
> extra devices are not included in the recovery flow.
>
> Emit an error message when this limit is reached, fulfilling a TODO
> comment in drivers/pci/pcie/aer.c.
> /* TODO: Should print error message here? */
>
> Signed-off-by: Akshay Jindal <akshayaj.lkd@...il.com>
> ---
>
> Changes since v1:
> - Reworded commit message in imperative mood (per Shuah’s feedback)
> - Mentioned and quoted related TODO in the message
> - Updated recipient list
>
> Testing:
> ========
> Verified log in dmesg on QEMU.
>
> 1. The following command created the required environment. As shown below, a
> pcie-root-port and a virtio-net-pci device are used on a Q35 machine model.
> ./qemu-system-x86_64 \
>         -M q35,accel=kvm \
>         -m 2G -cpu host -nographic \
>         -serial mon:stdio \
>         -kernel /home/akshayaj/pci/arch/x86/boot/bzImage \
>         -initrd /home/akshayaj/Embedded_System_Using_QEMU/rootfs/rootfs.cpio.gz \
>         -append "console=ttyS0 root=/ pci=pcie_scan_all" \
>         -device pcie-root-port,id=rp0,chassis=1,slot=1 \
>         -device virtio-net-pci,bus=rp0
>
> ~ # mylspci -t
> -[0000:00]-+-00.0
>            +-01.0
>            +-02.0
>            +-03.0-[01]----00.0
>            +-1f.0
>            +-1f.2
>            \-1f.3
> 00:03.0--> pcie-root-port
>
> 2. Kernel bzImage compiled with the following changes:
>         2.1 CONFIG_PCIEAER=y in config
>         2.2 AER_MAX_MULTI_ERR_DEVICES set to 0
>         Since there is no pcie-testdev in QEMU, it is impossible to create
>         a 5-level hierarchy of PCIe devices in QEMU. So we simulate the
>         error scenario by changing the limit to 0.
>         2.3 Log added at the required place in aer.c.
>
> 3. Both correctable and uncorrectable errors were injected on the
> pcie-root-port via the HMP command pcie_aer_inject_error in QEMU.
> The HMP commands used are as follows:
>         3.1 pcie_aer_inject_error -c rp0 0x1
>         3.2 pcie_aer_inject_error -c rp0 0x40
>         3.3 pcie_aer_inject_error rp0 0x10
>
> Resulting dmesg:
> ================
> [    0.380534] pcieport 0000:00:03.0: AER: enabled with IRQ 24
> [   55.729530] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [  225.484456] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [  356.976253] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
>
>  drivers/pci/pcie/aer.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 70ac66188367..3995a1db5699 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1039,7 +1039,8 @@ static int find_device_iter(struct pci_dev *dev, void *data)
>                 /* List this device */
>                 if (add_error_device(e_info, dev)) {
>                         /* We cannot handle more... Stop iteration */
> -                       /* TODO: Should print error message here? */
> +                       pci_err(dev, "Exceeded max allowed (%d) addition of PCIe "
> +                               "devices for AER handling\n", AER_MAX_MULTI_ERR_DEVICES);
>                         return 1;
>                 }
>
> --
> 2.43.0
>
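
For readers following the thread, the truncation described in the commit
message can be modelled in userspace roughly as below. This is only a
sketch under stated assumptions, not the code in drivers/pci/pcie/aer.c:
struct err_info, MAX_MULTI_ERR_DEVICES and collect_error_devices() are
hypothetical stand-ins, devices are plain integer IDs rather than
struct pci_dev pointers, and fprintf() stands in for pci_err().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for AER_MAX_MULTI_ERR_DEVICES (5 per the quoted text). */
#define MAX_MULTI_ERR_DEVICES 5

/* Hypothetical, simplified stand-in for the kernel's error-info structure. */
struct err_info {
	int dev[MAX_MULTI_ERR_DEVICES];	/* device IDs instead of struct pci_dev * */
	size_t error_dev_num;		/* number of slots in dev[] already used */
};

/* Record one device; return 0 on success, -1 if the array is already full. */
static int add_error_device(struct err_info *info, int dev_id)
{
	assert(info->error_dev_num <= MAX_MULTI_ERR_DEVICES);

	if (info->error_dev_num >= MAX_MULTI_ERR_DEVICES)
		return -1;

	info->dev[info->error_dev_num++] = dev_id;
	return 0;
}

/*
 * Walk a list of candidate devices and stop at the first one that does not
 * fit, reporting the overflow instead of dropping it silently.
 * Returns the number of devices recorded so far.
 */
static size_t collect_error_devices(struct err_info *info,
				    const int *ids, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (add_error_device(info, ids[i])) {
			/* Stand-in for the pci_err() call added by the patch. */
			fprintf(stderr,
				"Exceeded max allowed (%d) addition of devices\n",
				MAX_MULTI_ERR_DEVICES);
			break;
		}
	}
	return info->error_dev_num;
}
```

With seven candidate IDs and a limit of 5, the sketch records the first
five and reports the overflow once, which is the behaviour the patch aims
to make visible in dmesg.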
