lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9035be0b-7102-4abc-a21a-61648211be55@amd.com>
Date: Wed, 12 Feb 2025 13:15:01 -0600
From: "Bowman, Terry" <terry.bowman@....com>
To: Dan Williams <dan.j.williams@...el.com>, linux-cxl@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
 nifan.cxl@...il.com, dave@...olabs.net, jonathan.cameron@...wei.com,
 dave.jiang@...el.com, alison.schofield@...el.com, vishal.l.verma@...el.com,
 bhelgaas@...gle.com, mahesh@...ux.ibm.com, ira.weiny@...el.com,
 oohall@...il.com, Benjamin.Cheatham@....com, rrichter@....com,
 nathan.fontenot@....com, Smita.KoralahalliChannabasappa@....com,
 lukas@...ner.de, ming.li@...omail.com, PradeepVineshReddy.Kodamati@....com
Subject: Re: [PATCH v7 04/17] PCI/AER: Modify AER driver logging to report CXL
 or PCIe bus error type



On 2/11/2025 5:47 PM, Dan Williams wrote:
> Terry Bowman wrote:
>> The AER driver and aer_event tracing currently log 'PCIe Bus Type'
>> for all errors.
>>
>> Update the driver and aer_event tracing to log 'CXL Bus Type' for CXL
>> device errors.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@....com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@...el.com>
>> Reviewed-by: Fan Ni <fan.ni@...sung.com>
>> Reviewed-by: Ira Weiny <ira.weiny@...el.com>
>> Reviewed-by: Gregory Price <gourry@...rry.net>
>> ---
>>  drivers/pci/pcie/aer.c  | 14 ++++++++------
>>  include/ras/ras_event.h |  9 ++++++---
>>  2 files changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index 6e8de77d0fc4..f99a1c6fb274 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -694,13 +694,14 @@ static void __aer_print_error(struct pci_dev *dev,
>>  
>>  void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>>  {
>> +	const char *bus_type = pcie_is_cxl(dev) ? "CXL"  : "PCIe";
> I was expecting that this would be more than just a CXL link check
> because CXL AER events are only a subset of the events that can trigger
> an AER interrupt on a CXL device.
>
> Also, with CXL protocol errors the TLP log is sourced from CXL RAS
> registers and is distinct from the PCIe ->tlp in 'struct aer_err_info'.
>
> All that to say that I think this patch probably wants to decorate the
> bus type in 'struct aer_err_info' and then use that rather than just the ->is_cxl
> flag of the device.
>
> In the interest of moving the patch set along perhaps just do something
> like this for now and circle back to make it more sophisticated later:
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 01e51db8d285..eed098c134a6 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -533,6 +533,7 @@ static inline bool pci_dev_test_and_set_removed(struct pci_dev *dev)
>  struct aer_err_info {
>         struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
>         int error_dev_num;
> +       bool is_cxl;
>  
>         unsigned int id:16;
>  
> @@ -549,6 +550,11 @@ struct aer_err_info {
>         struct pcie_tlp_log tlp;        /* TLP Header */
>  };
>  
> +static inline const char *aer_err_bus(struct aer_err_info *info)
> +{
> +       return info->is_cxl ? "CXL" : "PCIe";
> +}
> +
>  int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info);
>  void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
>  
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 508474e17183..405f15c878ff 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1211,6 +1211,7 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
>         /* Must reset in this function */
>         info->status = 0;
>         info->tlp_header_valid = 0;
> +       info->is_cxl = dev->is_cxl;
>  
>         /* The device might not support AER */
>         if (!aer)
>
> Other than that, this patch looks good to me:
>
> Reviewed-by: Dan Williams <dan.j.williams@...el.com>
Ok. I can add is_cxl to 'struct aer_err_info'. Shall I set it by reading the
alternate protocol link state?

Terry



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ