[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b3fea77e-1103-40ab-b7f2-adc545273be6@amd.com>
Date: Tue, 22 Oct 2024 13:40:20 -0500
From: Terry Bowman <Terry.Bowman@....com>
To: Dan Williams <dan.j.williams@...el.com>, ming4.li@...el.com,
linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, dave@...olabs.net, jonathan.cameron@...wei.com,
dave.jiang@...el.com, alison.schofield@...el.com, vishal.l.verma@...el.com,
bhelgaas@...gle.com, mahesh@...ux.ibm.com, oohall@...il.com,
Benjamin.Cheatham@....com, rrichter@....com, nathan.fontenot@....com,
smita.koralahallichannabasappa@....com
Subject: Re: [PATCH 01/15] cxl/aer/pci: Add CXL PCIe port error handler
callbacks in AER service driver
Hi Dan,
On 10/22/24 12:09, Dan Williams wrote:
> Terry Bowman wrote:
> [..]
>>> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
>>> index 6af5e0425872..42db26195bda 100644
>>> --- a/drivers/pci/pcie/portdrv.c
>>> +++ b/drivers/pci/pcie/portdrv.c
>>> @@ -793,6 +793,7 @@ static struct pci_driver pcie_portdriver = {
>>> .shutdown = pcie_portdrv_shutdown,
>>>
>>> .err_handler = &pcie_portdrv_err_handler,
>>> + .cxl_err_handler = &cxl_portdrv_err_handler,
>>>
>>> .driver_managed_dma = true,
>>
>> Ok. I'm thinking to add a definition for 'pci_dev::cxl_err_handler' of type
>> 'struct pci_error_handler'.
>>
>> 'struct pci_error_handler' contains a slot reset(), resume(), and mmio_enabled() fn
>> pointers that are used in PCIe recovery if available. The plan is for CXL devices to
>> call panic for UCE fatal and non-fatal but it might be good to use the
>> 'struct pci_error_handler' type in case there are needs for the other handlers in
>> the future. It also makes the logic to access and use the error handlers common,
>> requiring less code.
>
> Can you give an example where CXL can reuse 'struct pci_error_handlers`
> infrastructure? The PCI error handlers are built around the idea that
> operations can be paused and recovered, CXL operations assume near
> constant device participation in CPU cache and memory coherency
> protocol.
>
> About the only reuse I can think of is cases where a CXL error could be
> sent down the PCI error handler path, i.e. ones that would send a
> 'pci_channel_io_normal' notice to ->error_detected(). Otherwise,
> pci_channel_state_t and pci_ers_result_t seem to be a poor fit for CXL
> error handling.
I was referring to reusing separate instance of 'struct pci_error_handlers' for CXL
UCE-CE errors.
One example where it can be reused in infrastructure is in err.c's
report_error_detected(). If both PCIe and CXL errors use 'struct pci_error_handlers'
then the updated report_error_detected() becomes a bit simpler with less helper
function logic. But, it's not a reason by itself to choose to reuse 'struct
pci_error_handlers' for CXL errors.
Looking closer at aer,c shows there is no advantage in this file for using 'struct
pci_error_handlers' for CXL errors.
If I understand correctly you want a new type introduced, 'struct cxl_error_handlers'.
And will contain 2 function pointers for CE and UCE handling.
Regards,
Terry
Powered by blists - more mailing lists