[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250521193412.000067b3@huawei.com>
Date: Wed, 21 May 2025 19:34:12 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: "Bowman, Terry" <terry.bowman@....com>
CC: <linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-pci@...r.kernel.org>, <nifan.cxl@...il.com>, <dave@...olabs.net>,
<dave.jiang@...el.com>, <alison.schofield@...el.com>,
<vishal.l.verma@...el.com>, <dan.j.williams@...el.com>,
<bhelgaas@...gle.com>, <mahesh@...ux.ibm.com>, <ira.weiny@...el.com>,
<oohall@...il.com>, <Benjamin.Cheatham@....com>, <rrichter@....com>,
<nathan.fontenot@....com>, <Smita.KoralahalliChannabasappa@....com>,
<lukas@...ner.de>, <ming.li@...omail.com>,
<PradeepVineshReddy.Kodamati@....com>
Subject: Re: [PATCH v8 04/16] cxl/aer: AER service driver forwards CXL error
to CXL driver
On Tue, 20 May 2025 08:21:18 -0500
"Bowman, Terry" <terry.bowman@....com> wrote:
> On 5/20/2025 6:04 AM, Jonathan Cameron wrote:
> > On Thu, 15 May 2025 16:52:15 -0500
> > "Bowman, Terry" <terry.bowman@....com> wrote:
> >
> >> On 4/25/2025 8:18 AM, Jonathan Cameron wrote:
> >>> On Thu, 24 Apr 2025 09:17:45 -0500
> >>> "Bowman, Terry" <terry.bowman@....com> wrote:
> >>>
> >>>> On 4/23/2025 10:04 AM, Jonathan Cameron wrote:
> >>>>> On Wed, 26 Mar 2025 20:47:05 -0500
> >>>>> Terry Bowman <terry.bowman@....com> wrote:
> >>>>>
> >>>>>> The AER service driver includes a CXL-specific kfifo, intended to forward
> >>>>>> CXL errors to the CXL driver. However, the forwarding functionality is
> >>>>>> currently unimplemented. Update the AER driver to enable error forwarding
> >>>>>> to the CXL driver.
> >>>>>>
> >>>>>> Modify the AER service driver's handle_error_source(), which is called from
> >>>>>> process_aer_err_devices(), to distinguish between PCIe and CXL errors.
> >>>>>>
> >>>>>> Rename and update is_internal_error() to is_cxl_error(). Ensuring it
> >>>>>> checks both the 'struct aer_info::is_cxl' flag and the AER internal error
> >>>>>> masks.
> >>>>>>
> >>>>>> If the error is a standard PCIe error then continue calling pcie_aer_handle_error()
> >>>>>> as done in the current AER driver.
> >>>>>>
> >>>>>> If the error is a CXL-related error then forward it to the CXL driver for
> >>>>>> handling using the kfifo mechanism.
> >>>>>>
> >>>>>> Introduce a new function forward_cxl_error(), which constructs a CXL
> >>>>>> protocol error context using cxl_create_prot_err_info(). This context is
> >>>>>> then passed to the CXL driver via kfifo using a 'struct work_struct'.
> >>>>>>
> >>>>>> Signed-off-by: Terry Bowman <terry.bowman@....com>
> >>>>> Hi Terry,
> >>>>>
> >>>>> Finally got back to this. I'm not following how some of the reference
> >>>>> counting in here is working. It might be fine but there is a lot
> >>>>> taking then dropping device references - some of which are taken again later.
> >>>>>
> >>>>>> @@ -1082,10 +1094,44 @@ static void cxl_rch_enable_rcec(struct pci_dev *rcec)
> >>>>>> pci_info(rcec, "CXL: Internal errors unmasked");
> >>>>>> }
> >>>>>>
> >>>>>> +static void forward_cxl_error(struct pci_dev *_pdev, struct aer_err_info *info)
> >>>>>> +{
> >>>>>> + int severity = info->severity;
> >>>>> So far this variable isn't really justified. Maybe it makes sense later in the
> >>>>> series?
> >>>> This is used below in call to cxl_create_prot_err_info().
> >>> Sure, but why not just do
> >>>
> >>> if (cxl_create_prot_error_info(pdev, info->severity, &wd.err_info)) {
> >>>
> >>> There isn't anything modifying info->severity in between so that local
> >>> variable is just padding out the code to no real benefit.
> >>>
> >>>
> >>>>>> + pci_err(pdev, "Failed to create CXL protocol error information");
> >>>>>> + return;
> >>>>>> + }
> >>>>>> +
> >>>>>> + struct device *cxl_dev __free(put_device) = get_device(err_info->dev);
> >>>>> Also this one. A reference was acquired and dropped in cxl_create_prot_err_info()
> >>>>> followed by retaking it here. How do we know it is still about by this call
> >>>>> and once we pull it off the kfifo later?
> >>>> Yes, this is a problem I realized after sending the series.
> >>>>
> >>>> The device reference incr could be changed for all the devices to the non-cleanup
> >>>> variety. Then would add the reference incr in the caller after calling cxl_create_prot_err_info().
> >>>> I need to look at the other calls to to cxl_create_prot_err_info() as well.
> >>>>
> >>>> In addition, I think we should consider adding the CXL RAS status into the struct cxl_prot_err_info.
> >>>> This would eliminate the need for further accesses to the CXL device after being dequeued from the
> >>>> fifo. Thoughts?
> >>> That sounds like a reasonable solution to me.
> >>>
> >>> Jonathan
> >> Hi Jonathan,
> > Hi Terry,
> >
> > Sorry for delay - travel etc...
> >
> >> Is it sufficient to rely on correctly implemented reference counting implementation instead
> >> of caching the RAS status I mentioned earlier?
> >>
> >> I have the next revision coded to 'get' the CXL erring device's reference count in the AER
> >> driver before enqueuing in the kfifo and then added a reference count 'put' in the CXL driver
> >> after dequeuing and handling/logging. This is an alternative to what I mentioned earlier reading
> >> the RAS status and caching it. One more question: is it OK to implement the get and put (of
> >> the same object) in different drivers?
> > It's definitely unusual. If there is anything similar to point at I'd be happier than
> > this 'innovation' showing up here first.
> >
> >> If we need to read and cache the RAS status before the kfifo enqueue there will be some other
> >> details to work through.
> > This still smells like the cleaner solution to me, but depends on those details..
> >
> > Jonathan
>
> In this case I believe we will need to move the CE handling (RAS status reading and clearing) before
> the kfifo enqueue. I think this is necessary because CXL errors may continue to be received and we
> don't want their status's combined when reading or clearing. I can refactor cxl_handle_ras()/
> cxl_handle_cor_ras() to return the RAS status value and remove the trace logging (to instead be
> called after kfifo dequeue).
>
> This leaves the UCE case. It's worth mentioning the UCE flow is different than the the CE case
> because it uses the top-bottom traversal starting at the erring device. Correct me if I'm wrong
> this would be handled before the kfifo as well. The handling and logging in the UCE case are
> baked together. The UCE flow would therefore need to include the trace logging during handling.
>
> Another flow is the PCI EP errors. The PCIe EP CE and UCE handlers remain and can call the
> the refactored cxl_handle_ras()/cxl_handle_cor_ras() and then trace log afterwards. This is no
> issue.
>
> This leaves only CE trace logging to be called after the kfifo dequeue. This is what doesn't
> feel right and wanted to draw attention to.
>
> All this to say: very little work will be done after the kfifo dequeue. Most of the work in
> the kfifo implementation would be before the kfifo enqueuing in the CXL create_prot_error_info()
> callback. I am concerned the balance of work done before and after the kfifo enqueue/dequeue
> will be very asymmetric with little value provided from the kfifo.
>
As per the discord chat - if you look up the device again from BDF or similar and get this
info once you have right locks post kfifo all should be fine as any race will be easy
to resolve by doing nothing if the driver has gone away.
Jonathan
> -Terry
>
Powered by blists - more mailing lists