Message-ID: <20250107113240.00003eda@huawei.com>
Date: Tue, 7 Jan 2025 11:32:40 +0000
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: "Bowman, Terry" <terry.bowman@....com>
CC: <linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-pci@...r.kernel.org>, <nifan.cxl@...il.com>, <dave@...olabs.net>,
<dave.jiang@...el.com>, <alison.schofield@...el.com>,
<vishal.l.verma@...el.com>, <dan.j.williams@...el.com>,
<bhelgaas@...gle.com>, <mahesh@...ux.ibm.com>, <ira.weiny@...el.com>,
<oohall@...il.com>, <Benjamin.Cheatham@....com>, <rrichter@....com>,
<nathan.fontenot@....com>, <Smita.KoralahalliChannabasappa@....com>,
<lukas@...ner.de>, <PradeepVineshReddy.Kodamati@....com>, Li Ming
<ming.li@...omail.com>
Subject: Re: [PATCH v4 14/15] cxl/pci: Add support to assign and clear
pci_driver::cxl_err_handlers
On Thu, 26 Dec 2024 11:07:13 -0600
"Bowman, Terry" <terry.bowman@....com> wrote:
> On 12/24/2024 12:50 PM, Jonathan Cameron wrote:
> > On Wed, 11 Dec 2024 17:40:01 -0600
> > Terry Bowman <terry.bowman@....com> wrote:
> >
> >> pci_driver::cxl_err_handlers are not currently assigned handler callbacks.
> >> The handlers can't be set in the pci_driver static definition because the
> >> CXL PCIe Port devices are bound to the portdrv driver which is not CXL
> >> driver aware.
> >>
> >> Add cxl_assign_port_error_handlers() in the cxl_core module. This
> >> function will assign the default handlers for a CXL PCIe Port device.
> >>
> >> When the CXL Port (cxl_port or cxl_dport) is destroyed the device's
> >> pci_driver::cxl_err_handlers must be set to NULL indicating they should no
> >> longer be used.
> >>
> >> Create cxl_clear_port_error_handlers() and register it to be called
> >> when the CXL Port device (cxl_port or cxl_dport) is destroyed.
> >>
> >> Signed-off-by: Terry Bowman <terry.bowman@....com>
> >> ---
> >> drivers/cxl/core/pci.c | 40 ++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 40 insertions(+)
> >>
> >> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> >> index 3294ad5ff28f..9734a4c55b29 100644
> >> --- a/drivers/cxl/core/pci.c
> >> +++ b/drivers/cxl/core/pci.c
> >> @@ -841,8 +841,38 @@ static bool cxl_port_error_detected(struct pci_dev *pdev)
> >>  	return __cxl_handle_ras(&pdev->dev, ras_base);
> >>  }
> >>
> >> +static const struct cxl_error_handlers cxl_port_error_handlers = {
> >> +	.error_detected = cxl_port_error_detected,
> >> +	.cor_error_detected = cxl_port_cor_error_detected,
> >> +};
> >> +
> >> +static void cxl_assign_port_error_handlers(struct pci_dev *pdev)
> >> +{
> >> +	struct pci_driver *pdrv;
> >> +
> >> +	if (!pdev || !pdev->driver)
> >> +		return;
> >> +
> >> +	pdrv = pdev->driver;
> > What stops a race here? It's fiddly to remove that driver but
> > it can be done. At least I think we are messing with portdrv
> > but this is such a fiddly stack I'm not 100% sure.
> >
> >> +	pdrv->cxl_err_handler = &cxl_port_error_handlers;
> >> +}
> >> +
> >> +static void cxl_clear_port_error_handlers(void *data)
> >> +{
> >> +	struct pci_dev *pdev = data;
> >> +	struct pci_driver *pdrv;
> >> +
> >> +	if (!pdev || !pdev->driver)
> >> +		return;
> >> +
> >> +	pdrv = pdev->driver;
> > Likewise. Smells like a possible race.
> >
> >> +	pdrv->cxl_err_handler = NULL;
> >> +}
> >> +
>
> I can add a get_device()/put_device() for both
> cxl_clear_port_error_handlers() and cxl_assign_port_error_handlers() to
> prevent operating on a recently destroyed pci_dev. Is that sufficient?
>
> Regards,
> Terry
Probably (by which I mean I think it is, but haven't checked in detail)
Jonathan
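
For readers following the thread, here is a minimal userspace sketch of the
get_device()/put_device() idea proposed above. It is not kernel code: every
mock_* name is hypothetical, and the integer counter only stands in for the
struct device reference count. It shows the shape of "take a reference, touch
the driver pointer, drop the reference". It does not model a concurrent driver
unbind, and the thread leaves open whether holding a reference alone is
enough to close the race.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; all names are hypothetical. */
struct mock_handlers { int id; };

struct mock_driver {
	const struct mock_handlers *cxl_err_handler;
};

struct mock_dev {
	int refcount;               /* models the device reference count */
	struct mock_driver *driver; /* NULL when no driver is bound */
};

static const struct mock_handlers mock_port_handlers = { .id = 1 };

/* Take a reference; returns dev, or NULL if dev is NULL. */
static struct mock_dev *mock_get_device(struct mock_dev *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

/* Drop a reference taken with mock_get_device(). */
static void mock_put_device(struct mock_dev *dev)
{
	if (!dev)
		return;
	assert(dev->refcount > 0);
	dev->refcount--;
}

/* Assign the handlers while holding a reference on the device. */
static void mock_assign_handlers(struct mock_dev *dev)
{
	struct mock_driver *drv;

	if (!mock_get_device(dev))
		return;

	drv = dev->driver;
	if (drv)
		drv->cxl_err_handler = &mock_port_handlers;

	mock_put_device(dev);
}

/* Clear the handlers while holding a reference on the device. */
static void mock_clear_handlers(struct mock_dev *dev)
{
	struct mock_driver *drv;

	if (!mock_get_device(dev))
		return;

	drv = dev->driver;
	if (drv)
		drv->cxl_err_handler = NULL;

	mock_put_device(dev);
}
```

Both helpers leave the reference count where they found it, so a caller that
already holds a reference still owns exactly one afterwards.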
> >>  void cxl_uport_init_ras_reporting(struct cxl_port *port)
> >>  {
> >> +	struct pci_dev *pdev = to_pci_dev(port->uport_dev);
> >> +
> >>  	/* uport may have more than 1 downstream EP. Check if already mapped. */
> >>  	if (port->uport_regs.ras)
> >>  		return;
> >> @@ -853,6 +883,9 @@ void cxl_uport_init_ras_reporting(struct cxl_port *port)
> >>  		dev_err(&port->dev, "Failed to map RAS capability.\n");
> >>  		return;
> >>  	}
> >> +
> >> +	cxl_assign_port_error_handlers(pdev);
> >> +	devm_add_action_or_reset(port->uport_dev, cxl_clear_port_error_handlers, pdev);
> >>  }
> >> EXPORT_SYMBOL_NS_GPL(cxl_uport_init_ras_reporting, CXL);
> >>
> >> @@ -864,6 +897,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport)
> >>  {
> >>  	struct device *dport_dev = dport->dport_dev;
> >>  	struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport_dev);
> >> +	struct pci_dev *pdev = to_pci_dev(dport_dev);
> >>
> >>  	dport->reg_map.host = dport_dev;
> >>  	if (dport->rch && host_bridge->native_aer) {
> >> @@ -880,6 +914,12 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport)
> >>  		dev_err(dport_dev, "Failed to map RAS capability.\n");
> >>  		return;
> >>  	}
> >> +
> >> +	if (dport->rch)
> >> +		return;
> >> +
> >> +	cxl_assign_port_error_handlers(pdev);
> >> +	devm_add_action_or_reset(dport_dev, cxl_clear_port_error_handlers, pdev);
> >>  }
> >> EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, CXL);
> >>
>