Message-ID: <20251104180859.00001e6d@huawei.com>
Date: Tue, 4 Nov 2025 18:08:59 +0000
From: Jonathan Cameron <jonathan.cameron@...wei.com>
To: Terry Bowman <terry.bowman@....com>
CC: <dave@...olabs.net>, <dave.jiang@...el.com>, <alison.schofield@...el.com>,
<dan.j.williams@...el.com>, <bhelgaas@...gle.com>, <shiju.jose@...wei.com>,
<ming.li@...omail.com>, <Smita.KoralahalliChannabasappa@....com>,
<rrichter@....com>, <dan.carpenter@...aro.org>,
<PradeepVineshReddy.Kodamati@....com>, <lukas@...ner.de>,
<Benjamin.Cheatham@....com>, <sathyanarayanan.kuppuswamy@...ux.intel.com>,
<linux-cxl@...r.kernel.org>, <alucerop@....com>, <ira.weiny@...el.com>,
<linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>
Subject: Re: [RESEND v13 09/25] PCI/AER: Report CXL or PCIe bus error type in trace logging
On Tue, 4 Nov 2025 11:02:49 -0600
Terry Bowman <terry.bowman@....com> wrote:
> The AER service driver and aer_event tracing currently log 'PCIe Bus Type'
> for all errors. Update the driver and aer_event tracing to log 'CXL Bus
> Type' for CXL device errors.
>
> This requires the AER can identify and distinguish between PCIe errors and
> CXL errors.
>
> Introduce boolean 'is_cxl' to 'struct aer_err_info'. Add assignment in
> aer_get_device_error_info() and pci_print_aer().
>
> Update the aer_event trace routine to accept a bus type string parameter.
>
> Signed-off-by: Terry Bowman <terry.bowman@....com>
> Reviewed-by: Ira Weiny <ira.weiny@...el.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@...wei.com>
> Reviewed-by: Dan Williams <dan.j.williams@...el.com>
> Reviewed-by: Dave Jiang <dave.jiang@...el.com>
>
Hi Terry,
A couple of things from a fresh look inline.
> ---
>
> Changes in v12->v13:
> - Remove duplicated aer_err_info inline comments. Is already in the
> kernel-doc header (Ben)
>
> Changes in v11->v12:
> - Change aer_err_info::is_cxl to be bool a bitfield. Update structure
> padding. (Lukas)
> - Add kernel-doc for 'struct aer_err_info' (Lukas)
>
> Changes in v10->v11:
> - Remove duplicate call to trace_aer_event() (Shiju)
> - Added Dan William's and Dave Jiang's reviewed-by
> ---
> drivers/pci/pci.h | 37 ++++++++++++++++++++++++++++++-------
> drivers/pci/pcie/aer.c | 18 ++++++++++++------
> include/ras/ras_event.h | 9 ++++++---
> 3 files changed, 48 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index d23430e3eea0..446251892bb7 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -701,31 +701,54 @@ static inline bool pci_dev_binding_disallowed(struct pci_dev *dev)
>
> #define AER_MAX_MULTI_ERR_DEVICES 5 /* Not likely to have more */
>
> +/**
> + * struct aer_err_info - AER Error Information
> + * @dev: Devices reporting error
> + * @ratelimit_print: Flag to log or not log the devices' error. 0=NotLog/1=Log
> + * @error_devnum: Number of devices reporting an error
Typo: the member is error_dev_num, so this should be @error_dev_num.
Run the kernel-doc script over this header to catch mismatches like this.
> + * @level: printk level to use in logging
> + * @id: Value from register PCI_ERR_ROOT_ERR_SRC
> + * @severity: AER severity, 0-UNCOR Non-fatal, 1-UNCOR fatal, 2-COR
> + * @root_ratelimit_print: Flag to log or not log the root's error. 0=NotLog/1=Log
> + * @multi_error_valid: If multiple errors are reported
> + * @first_error: First reported error
> + * @is_cxl: Bus type error: 0-PCI Bus error, 1-CXL Bus error
> + * @tlp_header_valid: Indicates if TLP field contains error information
> + * @status: COR/UNCOR error status
> + * @mask: COR/UNCOR mask
> + * @tlp: Transaction packet information
> + */
> struct aer_err_info {
> struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
> int ratelimit_print[AER_MAX_MULTI_ERR_DEVICES];
> int error_dev_num;
> - const char *level; /* printk level */
> + const char *level;
>
> unsigned int id:16;
>
> - unsigned int severity:2; /* 0:NONFATAL | 1:FATAL | 2:COR */
> - unsigned int root_ratelimit_print:1; /* 0=skip, 1=print */
> + unsigned int severity:2;
> + unsigned int root_ratelimit_print:1;
> unsigned int __pad1:4;
> unsigned int multi_error_valid:1;
>
> unsigned int first_error:5;
> - unsigned int __pad2:2;
> + unsigned int __pad2:1;
> + bool is_cxl:1;
Stick to unsigned int for this bitfield, for consistency with the other
bitfield members in the structure.
> unsigned int tlp_header_valid:1;
>
> - unsigned int status; /* COR/UNCOR Error Status */
> - unsigned int mask; /* COR/UNCOR Error Mask */
> - struct pcie_tlp_log tlp; /* TLP Header */
> + unsigned int status;
> + unsigned int mask;
> + struct pcie_tlp_log tlp;
> };
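To illustrate the bitfield comment, here is a rough userspace sketch, not
kernel code. struct aer_bits_sketch, set_is_cxl() and bus_type_sketch() are
made-up names, and only the bitfield members are copied from the patch. It
assumes a common ABI where adjacent unsigned int bitfields share one storage
unit.

```c
#include <stdbool.h>

/*
 * Hypothetical cut-down stand-in for the bitfield part of
 * struct aer_err_info. Member names and widths follow the patch;
 * the pad members are renamed to avoid reserved identifiers.
 */
struct aer_bits_sketch {
	unsigned int id:16;

	unsigned int severity:2;
	unsigned int root_ratelimit_print:1;
	unsigned int pad1:4;
	unsigned int multi_error_valid:1;

	unsigned int first_error:5;
	unsigned int pad2:1;
	unsigned int is_cxl:1;		/* unsigned int rather than bool */
	unsigned int tlp_header_valid:1;
};

/*
 * 16 + (2+1+4+1) + (5+1+1+1) = 32 bits. Bitfield layout is
 * implementation-defined, so this check is an assumption about the ABI.
 */
_Static_assert(sizeof(struct aer_bits_sketch) == sizeof(unsigned int),
	       "bitfields expected to pack into one unsigned int");

/* Hypothetical setter: taking a bool keeps the stored value to 0 or 1. */
static void set_is_cxl(struct aer_bits_sketch *info, bool is_cxl)
{
	info->is_cxl = is_cxl;
}

/* Hypothetical helper choosing the bus type string used in the log. */
static const char *bus_type_sketch(const struct aer_bits_sketch *info)
{
	return info->is_cxl ? "CXL Bus Type" : "PCIe Bus Type";
}
```

One difference to keep in mind: a 1-bit unsigned int field truncates an
int (2 is stored as 0), whereas a bool bitfield would store 1, so callers
should assign a boolean expression.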