Message-ID: <20250829211801.GA1025641@bhelgaas>
Date: Fri, 29 Aug 2025 16:18:01 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Zhenzhong Duan <zhenzhong.duan@...el.com>
Cc: linux-pci@...r.kernel.org, bhelgaas@...gle.com, mahesh@...ux.ibm.com,
	oohall@...il.com, linuxppc-dev@...ts.ozlabs.org,
	linux-acpi@...r.kernel.org, rafael@...nel.org, lenb@...nel.org,
	james.morse@....com, tony.luck@...el.com, bp@...en8.de,
	dave@...olabs.net, jonathan.cameron@...wei.com,
	dave.jiang@...el.com, alison.schofield@...el.com,
	vishal.l.verma@...el.com, ira.weiny@...el.com, linmiaohe@...wei.com,
	shiju.jose@...wei.com, adam.c.preble@...el.com, lukas@...ner.de,
	Smita.KoralahalliChannabasappa@....com, rrichter@....com,
	linux-cxl@...r.kernel.org, linux-edac@...r.kernel.org,
	linux-kernel@...r.kernel.org, erwin.tsaur@...el.com,
	sathyanarayanan.kuppuswamy@...el.com, dan.j.williams@...el.com,
	feiting.wanyan@...el.com, yudong.wang@...el.com,
	chao.p.peng@...el.com, qingshun.wang@...ux.intel.com,
	Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>,
	Matthew W Carlis <mattc@...estorage.com>
Subject: Re: [PATCH v5 2/2] PCI/AER: Print UNCOR_STATUS bits that might be
 ANFE

[+cc Matt]

On Thu, Jun 20, 2024 at 10:58:57AM +0800, Zhenzhong Duan wrote:
> When an Advisory Non-Fatal error(ANFE) triggers, both correctable error(CE)
> status and ANFE related uncorrectable error(UE) status will be printed:
> 
>   AER: Correctable error message received from 0000:b7:02.0
>   PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
>     device [8086:0db0] error status/mask=00002000/00000000
>      [13] NonFatalErr
>     Uncorrectable errors that may cause Advisory Non-Fatal:
>      [12] TLP
> 
> Tested-by: Yudong Wang <yudong.wang@...el.com>
> Co-developed-by: "Wang, Qingshun" <qingshun.wang@...ux.intel.com>
> Signed-off-by: "Wang, Qingshun" <qingshun.wang@...ux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@...el.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
> ---
>  drivers/pci/pcie/aer.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 3dcfa0191169..ba3a54092f2c 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -681,6 +681,7 @@ static void __aer_print_error(struct pci_dev *dev,
>  {
>  	const char **strings;
>  	unsigned long status = info->status & ~info->mask;
> +	unsigned long anfe_status = info->anfe_status;
>  	const char *level, *errmsg;
>  	int i;
>  
> @@ -701,6 +702,20 @@ static void __aer_print_error(struct pci_dev *dev,
>  				info->first_error == i ? " (First)" : "");
>  	}
>  	pci_dev_aer_stats_incr(dev, info);
> +
> +	if (!anfe_status)
> +		return;

__aer_print_error() is used by both native AER handling, where Linux
fields the AER interrupt and reads the AER status registers directly,
and APEI GHES firmware-first error handling, where platform firmware
fields the AER interrupt, reads the AER status registers, and packages
them up to hand off to Linux via aer_recover_queue().

But the previous patch (1/2) only sets info->anfe_status in the native
path, so the APEI GHES path doesn't get the benefit of this change.

I think both paths should log the same ANFE information.
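One way to get identical output is to derive the ANFE candidate bits in a single helper that takes raw register values. The native path reads those registers itself, and the GHES path receives a firmware snapshot (struct aer_capability_regs, which carries the correctable and uncorrectable status, mask and severity fields). Both could then fill info->anfe_status the same way. The sketch below is a user-space illustration under stated assumptions, not the code from this series. `anfe_candidates()` and `ANFE_UNC_MASK` are hypothetical names. The bit selection assumes the PCIe spec's Advisory Non-Fatal cases: Poisoned TLP, Completion Timeout, Completer Abort, Unexpected Completion and Unsupported Request, each only while its severity is Non-Fatal. The bit values mirror the PCI_ERR_* definitions in pci_regs.h.

```c
#include <stdint.h>

/* Bit values mirroring the PCI_ERR_* definitions in pci_regs.h */
#define PCI_ERR_COR_ADV_NFAT    0x00002000u /* CE bit 13: Advisory Non-Fatal */
#define PCI_ERR_UNC_POISON_TLP  0x00001000u /* UE bit 12: Poisoned TLP */
#define PCI_ERR_UNC_COMP_TIME   0x00004000u /* UE bit 14: Completion Timeout */
#define PCI_ERR_UNC_COMP_ABORT  0x00008000u /* UE bit 15: Completer Abort */
#define PCI_ERR_UNC_UNX_COMP    0x00010000u /* UE bit 16: Unexpected Completion */
#define PCI_ERR_UNC_UNSUP       0x00100000u /* UE bit 20: Unsupported Request */

/* Hypothetical: UE bits assumed to be reportable as Advisory Non-Fatal */
#define ANFE_UNC_MASK (PCI_ERR_UNC_POISON_TLP | PCI_ERR_UNC_COMP_TIME | \
                       PCI_ERR_UNC_COMP_ABORT | PCI_ERR_UNC_UNX_COMP | \
                       PCI_ERR_UNC_UNSUP)

/*
 * Hypothetical helper shared by the native and GHES paths: given raw
 * AER register values, return the unmasked, non-fatal UE status bits
 * that may explain an Advisory Non-Fatal correctable error.
 */
static uint32_t anfe_candidates(uint32_t cor_status, uint32_t cor_mask,
                                uint32_t uncor_status, uint32_t uncor_mask,
                                uint32_t uncor_severity)
{
    /* No unmasked ANFE in the CE status: nothing to report. */
    if (!(cor_status & ~cor_mask & PCI_ERR_COR_ADV_NFAT))
        return 0;

    /* Keep unmasked UE bits whose severity is Non-Fatal (severity bit clear). */
    return uncor_status & ~uncor_mask & ~uncor_severity & ANFE_UNC_MASK;
}
```

With the Poisoned TLP example from the commit message (CE status 0x00002000, UE status 0x00001000, and a severity value such as 0x00462030 in which bit 12 is clear), the helper returns 0x00001000. If Poisoned TLP is marked fatal in the severity register, it returns 0, because that error would have been signaled as a fatal uncorrectable error rather than as an advisory one.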

> +
> +	strings = aer_uncorrectable_error_string;
> +	pci_printk(level, dev, "Uncorrectable errors that may cause Advisory Non-Fatal:\n");
> +
> +	for_each_set_bit(i, &anfe_status, 32) {
> +		errmsg = strings[i];
> +		if (!errmsg)
> +			errmsg = "Unknown Error Bit";
> +
> +		pci_printk(level, dev, "   [%2d] %s\n", i, errmsg);
> +	}
>  }
>  
>  void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
> -- 
> 2.34.1
> 
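For reference, the printing loop added by this patch can be exercised outside the kernel. The sketch below is a user-space approximation with several assumptions. `format_anfe()` is a hypothetical name. A plain ascending bit scan stands in for for_each_set_bit(). `unc_names[]` is a partial table whose entries are modeled on aer_uncorrectable_error_string and should be checked against aer.c. It emits the same `"   [%2d] %s"` lines, falling back to "Unknown Error Bit" for unnamed bits, so anfe_status = 0x00001000 yields a "[12] TLP" line like the one in the commit message.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Partial, illustrative name table indexed by UE status bit number. */
static const char *const unc_names[32] = {
    [12] = "TLP",
    [14] = "CmpltTO",
    [15] = "CmpltAbrt",
    [16] = "UnxCmplt",
    [20] = "UnsupReq",
};

/*
 * Hypothetical stand-in for the loop in __aer_print_error(): visit the
 * set bits of anfe_status in ascending order and append one line per
 * bit to buf (len must be > 0). Returns the number of lines written.
 */
static int format_anfe(uint32_t anfe_status, char *buf, size_t len)
{
    size_t used = 0;
    int lines = 0;

    buf[0] = '\0';
    for (int i = 0; i < 32; i++) {
        if (!(anfe_status & (UINT32_C(1) << i)))
            continue;

        const char *errmsg = unc_names[i] ? unc_names[i] : "Unknown Error Bit";
        char line[64];
        int n = snprintf(line, sizeof(line), "   [%2d] %s\n", i, errmsg);

        /* Not enough room left: stop without writing a partial line. */
        if (n < 0 || (size_t)n >= len - used)
            break;
        memcpy(buf + used, line, (size_t)n + 1); /* copies the NUL too */
        used += (size_t)n;
        lines++;
    }
    return lines;
}
```

The fallback string matters because the mask only guarantees a name for bits the table covers. Any other bit still gets a numbered line instead of a NULL dereference.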
