[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6717dc2ec6c90_231229487@dwillia2-xfh.jf.intel.com.notmuch>
Date: Tue, 22 Oct 2024 10:09:02 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Terry Bowman <Terry.Bowman@....com>, Dan Williams
<dan.j.williams@...el.com>, <ming4.li@...el.com>,
<linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-pci@...r.kernel.org>, <dave@...olabs.net>,
<jonathan.cameron@...wei.com>, <dave.jiang@...el.com>,
<alison.schofield@...el.com>, <vishal.l.verma@...el.com>,
<bhelgaas@...gle.com>, <mahesh@...ux.ibm.com>, <oohall@...il.com>,
<Benjamin.Cheatham@....com>, <rrichter@....com>, <nathan.fontenot@....com>,
<smita.koralahallichannabasappa@....com>
Subject: Re: [PATCH 01/15] cxl/aer/pci: Add CXL PCIe port error handler
callbacks in AER service driver
Terry Bowman wrote:
[..]
> > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > index 6af5e0425872..42db26195bda 100644
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -793,6 +793,7 @@ static struct pci_driver pcie_portdriver = {
> > .shutdown = pcie_portdrv_shutdown,
> >
> > .err_handler = &pcie_portdrv_err_handler,
> > + .cxl_err_handler = &cxl_portdrv_err_handler,
> >
> > .driver_managed_dma = true,
>
> Ok. I'm thinking to add a definition for 'pci_dev::cxl_err_handler' of type
> 'struct pci_error_handler'.
>
> 'struct pci_error_handler' contains a slot reset(), resume(), and mmio_enabled() fn
> pointers that are used in PCIe recovery if available. The plan is for CXL devices to
> call panic for UCE fatal and non-fatal but it might be good to use the
> 'struct pci_error_handler' type in case there are needs for the other handlers in
> the future. It also makes the logic to access and use the error handlers common,
> requiring less code.
Can you give an example where CXL can reuse 'struct pci_error_handlers`
infrastructure? The PCI error handlers are built around the idea that
operations can be paused and recovered, CXL operations assume near
constant device participation in CPU cache and memory coherency
protocol.
About the only reuse I can think of is cases where a CXL error could be
sent down the PCI error handler path, i.e. ones that would send a
'pci_channel_io_normal' notice to ->error_detected(). Otherwise,
pci_channel_state_t and pci_ers_result_t seem to be a poor fit for CXL
error handling.
Powered by blists - more mailing lists