Message-ID: <910f6bda-4f18-47a9-9150-8489685c857d@amd.com>
Date: Wed, 10 Sep 2025 10:26:19 -0500
From: "Bowman, Terry" <terry.bowman@....com>
To: Lukas Wunner <lukas@...ner.de>
Cc: dave@...olabs.net, jonathan.cameron@...wei.com, dave.jiang@...el.com,
alison.schofield@...el.com, dan.j.williams@...el.com, bhelgaas@...gle.com,
shiju.jose@...wei.com, ming.li@...omail.com,
Smita.KoralahalliChannabasappa@....com, rrichter@....com,
dan.carpenter@...aro.org, PradeepVineshReddy.Kodamati@....com,
Benjamin.Cheatham@....com, sathyanarayanan.kuppuswamy@...ux.intel.com,
linux-cxl@...r.kernel.org, alucerop@....com, ira.weiny@...el.com,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org
Subject: Re: [PATCH v11 09/23] PCI/AER: Report CXL or PCIe bus error type in
trace logging
On 8/27/2025 2:37 AM, Lukas Wunner wrote:
> On Tue, Aug 26, 2025 at 08:35:24PM -0500, Terry Bowman wrote:
>> The AER service driver and aer_event tracing currently log 'PCIe Bus Type'
>> for all errors. Update the driver and aer_event tracing to log 'CXL Bus
>> Type' for CXL device errors.
>>
>> This requires the AER can identify and distinguish between PCIe errors and
>> CXL errors.
>>
>> Introduce boolean 'is_cxl' to 'struct aer_err_info'. Add assignment in
>> aer_get_device_error_info() and pci_print_aer().
>>
>> Update the aer_event trace routine to accept a bus type string parameter.
> aer_print_error() has a pointer to the struct pci_dev and you've added
> an is_cxl bit to that struct in the preceding patch.
>
> Is there a reason why you can't just use that dev->is_cxl bit, in lieu of
> adding another is_cxl bit to struct aer_err_info?
>
> If so, please document it in a code comment or at least in the commit
> message. If there isn't, please use dev->is_cxl.
>
> Thanks,
>
> Lukas
Hi Lukas,
The 'is_cxl' member was added to 'struct aer_err_info' at Dan Williams'
request during the v7 review:
https://lore.kernel.org/linux-cxl/67abe1903a8ed_2d1e2942f@dwillia2-xfh.jf.intel.com.notmuch/

My understanding is that the change was requested to encapsulate the bus
error type together with the AER status it describes. This matters because
the device's actual bus state can change between capturing the AER status
and handling/logging it. One example is a training HW error. Caching
'is_cxl' in 'struct aer_err_info' lets the drivers correctly identify the
bus type of the error during later logging and handling.
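
To illustrate the reasoning, here is a minimal userspace sketch. It assumes
simplified stand-in structs and helpers (every name ending in '_sketch', and
all three functions, are hypothetical and are not the kernel's definitions or
the code in this series). It only shows why a bus type snapshotted alongside
the AER status can differ from the live device state read later:

```c
#include <stdbool.h>

/* Simplified stand-ins for illustration only; not the kernel's structs. */
struct pci_dev_sketch {
	bool is_cxl;		/* live bus state; may change after the error */
};

struct aer_err_info_sketch {
	unsigned int status;	/* AER status captured at error time */
	bool is_cxl;		/* bus type captured together with status */
};

/* Capture the AER status and the bus type in the same step. */
static void capture_error_sketch(const struct pci_dev_sketch *dev,
				 unsigned int status,
				 struct aer_err_info_sketch *info)
{
	info->status = status;
	info->is_cxl = dev->is_cxl;
}

/* Logging based on the snapshot taken when the error was captured. */
static const char *bus_type_from_info(const struct aer_err_info_sketch *info)
{
	return info->is_cxl ? "CXL Bus Type" : "PCIe Bus Type";
}

/* Logging based on the live device state at the time of the call. */
static const char *bus_type_from_dev(const struct pci_dev_sketch *dev)
{
	return dev->is_cxl ? "CXL Bus Type" : "PCIe Bus Type";
}
```

If the device state flips after capture_error_sketch() runs, only
bus_type_from_info() still reports the type that was in effect when the
status was recorded.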
Hopefully Dan will add his thoughts here.
Regards,
Terry