Message-ID: <4a69aa48-bf42-2815-863c-0caefdb23c68@amd.com>
Date: Wed, 3 Jan 2024 13:13:53 -0800
From: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
To: Ira Weiny <ira.weiny@...el.com>, linux-efi@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-cxl@...r.kernel.org
Cc: Ard Biesheuvel <ardb@...nel.org>,
Alison Schofield <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Yazen Ghannam <yazen.ghannam@....com>
Subject: Re: [PATCH 4/4] acpi/ghes, cxl/pci: Trace FW-First CXL Protocol Errors

On 1/2/2024 12:27 PM, Ira Weiny wrote:
> Smita Koralahalli wrote:
>> When PCIe AER is in FW-First mode, the OS should process CXL Protocol
>> errors from CPER records. The GHES module obtains these CPER records and
>> relies on a registered callback to notify the CXL subsystem so that they
>> can be processed.
>>
>> Call the existing cxl_cper_callback to notify the CXL subsystem on a
>> Protocol error.
>>
>> The defined trace events cxl_aer_uncorrectable_error and
>> cxl_aer_correctable_error currently trace native CXL AER errors. Reuse
>> them to trace FW-First Protocol Errors.
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
>
> [snip]
>
>> int cxl_cper_register_callback(cxl_cper_callback callback)
>> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
>> index 37e1652afbc7..da516982a625 100644
>> --- a/drivers/cxl/core/pci.c
>> +++ b/drivers/cxl/core/pci.c
>> @@ -6,6 +6,7 @@
>> #include <linux/pci.h>
>> #include <linux/pci-doe.h>
>> #include <linux/aer.h>
>> +#include <linux/cper.h>
>> #include <cxlpci.h>
>> #include <cxlmem.h>
>> #include <cxl.h>
>> @@ -836,6 +837,51 @@ void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport)
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_setup_parent_dport, CXL);
>>
>> +#define CXL_AER_UNCORRECTABLE 0
>> +#define CXL_AER_CORRECTABLE 1
>
> Better defined as an enum?
Will change.
>
>> +
>> +int cper_severity_cxl_aer(int cper_severity)
>
> My gut says that it would be better to hide this conversion in the
> GHES/CPER code and send a more generic defined CXL_AER_* severity through.
OK, will change.
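
For illustration, the two changes agreed above (an enum instead of the
#defines, and doing the CPER-to-AER mapping on the GHES/CPER side) might look
roughly like the sketch below. It is standalone C so it builds outside the
kernel: the CPER_SEV_* values are local stand-ins for the ones in
<linux/cper.h>, and the names cxl_aer_err_type and cper_severity_to_cxl_aer()
are placeholders, not necessarily what a later revision would use.

```c
/* Local stand-ins for the CPER severities from <linux/cper.h>. */
enum cper_sev_standin {
	CPER_SEV_RECOVERABLE,
	CPER_SEV_FATAL,
	CPER_SEV_CORRECTED,
	CPER_SEV_INFORMATIONAL,
};

/* Hypothetical enum replacing the CXL_AER_* #defines from the patch. */
enum cxl_aer_err_type {
	CXL_AER_UNCORRECTABLE,
	CXL_AER_CORRECTABLE,
};

/*
 * Hypothetical helper that would live in the GHES/CPER code, so that the
 * CXL side only ever sees a CXL_AER_* value. The mapping is the same as
 * in the quoted patch: recoverable and fatal are uncorrectable, anything
 * else is treated as correctable.
 */
static enum cxl_aer_err_type cper_severity_to_cxl_aer(int cper_severity)
{
	switch (cper_severity) {
	case CPER_SEV_RECOVERABLE:
	case CPER_SEV_FATAL:
		return CXL_AER_UNCORRECTABLE;
	default:
		return CXL_AER_CORRECTABLE;
	}
}
```

With this split, the callback data could carry the already-converted value and
cxl_prot_err_trace_record() would not need to know about CPER severities.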
>
>> +{
>> +	switch (cper_severity) {
>> +	case CPER_SEV_RECOVERABLE:
>> +	case CPER_SEV_FATAL:
>> +		return CXL_AER_UNCORRECTABLE;
>> +	default:
>> +		return CXL_AER_CORRECTABLE;
>> +	}
>> +}
>> +
>> +void cxl_prot_err_trace_record(struct cxl_dev_state *cxlds,
>> +			       struct cxl_cper_rec_data *data)
>> +{
>> +	struct cper_cxl_event_sn *dev_serial_num = &data->rec.hdr.dev_serial_num;
>> +	u32 status, fe;
>> +	int severity;
>> +
>> +	severity = cper_severity_cxl_aer(data->severity);
>> +
>> +	cxlds->serial = (((u64)dev_serial_num->upper_dw << 32) |
>> +			 dev_serial_num->lower_dw);
>
> This permanently overwrites the serial number read from PCI...
>
> If the serial number does not match up or was not valid (per the check in
> the previous patch) let's add a warning.
Sure, will add.
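
A minimal sketch of the idea, again as standalone C rather than kernel code:
build the 64-bit serial from the two dwords in the CPER record, compare it
with the serial already read from PCI, and warn on a mismatch without
overwriting the stored value. struct sn_halves is a stand-in that only mirrors
the lower_dw/upper_dw fields used in the patch, the function names are
placeholders, and fprintf() stands in for what would presumably be a
dev_warn() in the driver.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the serial number header in the CPER record. */
struct sn_halves {
	uint32_t lower_dw;
	uint32_t upper_dw;
};

/* Combine the two dwords the same way the quoted patch does. */
static uint64_t sn_to_u64(const struct sn_halves *sn)
{
	return ((uint64_t)sn->upper_dw << 32) | sn->lower_dw;
}

/*
 * Compare the CPER serial against the one read from PCI. The PCI value is
 * passed by value, so it is never modified here; a mismatch only warns.
 */
static bool cper_serial_matches(uint64_t pci_serial, const struct sn_halves *sn)
{
	uint64_t cper_serial = sn_to_u64(sn);

	if (cper_serial == pci_serial)
		return true;

	fprintf(stderr, "CPER serial %#llx does not match PCI serial %#llx\n",
		(unsigned long long)cper_serial,
		(unsigned long long)pci_serial);
	return false;
}
```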
Thanks,
Smita
>
> AFAICT they should match.
>
> Ira
>
> [snip]
>