Message-ID: <cb23df9f-d7a0-4cc1-93e2-acd9b7845b43@amd.com>
Date: Tue, 16 Sep 2025 10:18:11 -0500
From: "Bowman, Terry" <terry.bowman@....com>
To: Lukas Wunner <lukas@...ner.de>, Dave Jiang <dave.jiang@...el.com>
Cc: dave@...olabs.net, jonathan.cameron@...wei.com,
alison.schofield@...el.com, dan.j.williams@...el.com, bhelgaas@...gle.com,
shiju.jose@...wei.com, ming.li@...omail.com,
Smita.KoralahalliChannabasappa@....com, rrichter@....com,
dan.carpenter@...aro.org, PradeepVineshReddy.Kodamati@....com,
Benjamin.Cheatham@....com, sathyanarayanan.kuppuswamy@...ux.intel.com,
linux-cxl@...r.kernel.org, alucerop@....com, ira.weiny@...el.com,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org
Subject: Re: [PATCH v11 18/23] PCI/AER: Dequeue forwarded CXL error
On 8/29/2025 2:10 AM, Lukas Wunner wrote:
> On Thu, Aug 28, 2025 at 05:43:31PM -0700, Dave Jiang wrote:
>> On 8/26/25 6:35 PM, Terry Bowman wrote:
>>> +static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
>>> +{
>>> + struct pci_dev *pdev = err_info->pdev;
>>> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>> So this function is called from the workqueue thread to consume data
>> from the kfifo right? Do we need to take the device lock of the pdev
>> to ensure that a driver is bound to the device before we attempt to
>> retrieve the data? And do we also need to verify that the driver bound
>> is the cxl_pci driver (and not something like vfio_pci)? Otherwise I
>> think assuming the drv data is cxl_dev_state may cause crash.
> In v10 of this series, there used to be a cxl_pci_drv_bound() function
> to verify that the cxl_pci_driver is bound and not some other driver.
> That function was called from cxl_rch_handle_error_iter().
>
> It seems this is gone in v11?
>
> Thanks,
>
> Lukas
Hi Lukas,
Yes, this was removed due to a build issue. I am adding it back with the fix.
You mentioned cxl_rch_handle_error_iter() above, and I want to clarify my
understanding that the check is not needed in the RCH case. RCH handling
traverses to find the first downstream PCI EP and calls that EP's
pci_driver::err_handler callbacks. These are part of pci_driver and therefore
don't need the driver check, as they will work for any bound pci_driver.
The check is needed for VH EP CE and non-fatal UCE because they are handled
by CXL error callbacks defined in the cxl_core/cxl_pci modules and not in
pci_driver::err_handler.
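To illustrate the distinction, here is a rough userspace sketch, not the
actual patch code. The structs are simplified stand-ins for the kernel
types, and other_driver, rch_handle_ce() and vh_handle_ce() are hypothetical
names used only for this example. The body of cxl_pci_drv_bound() is my
assumption of what the check looks like, and device locking is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct pci_dev;

struct pci_error_handlers {
	void (*cor_error_detected)(struct pci_dev *pdev);
};

struct pci_driver {
	const char *name;
	const struct pci_error_handlers *err_handler;
};

struct pci_dev {
	struct pci_driver *driver;	/* NULL when no driver is bound */
	void *driver_data;		/* layout is private to the bound driver */
};

struct cxl_dev_state {
	int ce_count;
};

/* cxl_pci's callback: its drvdata really is a cxl_dev_state. */
static void cxl_cor_error_detected(struct pci_dev *pdev)
{
	struct cxl_dev_state *cxlds = pdev->driver_data;

	cxlds->ce_count++;
}

/* Hypothetical other driver: its drvdata is a plain int counter. */
static void other_cor_error_detected(struct pci_dev *pdev)
{
	int *count = pdev->driver_data;

	(*count)++;
}

static const struct pci_error_handlers cxl_err_handlers = {
	.cor_error_detected = cxl_cor_error_detected,
};

static const struct pci_error_handlers other_err_handlers = {
	.cor_error_detected = other_cor_error_detected,
};

static struct pci_driver cxl_pci_driver = {
	.name = "cxl_pci",
	.err_handler = &cxl_err_handlers,
};

static struct pci_driver other_driver = {
	.name = "other",
	.err_handler = &other_err_handlers,
};

/* Assumed shape of the check: is cxl_pci the driver bound to this device? */
static bool cxl_pci_drv_bound(const struct pci_dev *pdev)
{
	return pdev->driver == &cxl_pci_driver;
}

/*
 * RCH-style path: dispatch through the bound driver's own callbacks.
 * Each driver interprets its own drvdata, so no cxl_pci check is needed.
 */
static bool rch_handle_ce(struct pci_dev *pdev)
{
	if (!pdev->driver || !pdev->driver->err_handler ||
	    !pdev->driver->err_handler->cor_error_detected)
		return false;

	pdev->driver->err_handler->cor_error_detected(pdev);
	return true;
}

/*
 * VH-style path: the CXL code itself treats drvdata as a cxl_dev_state,
 * so it must first confirm that cxl_pci is the bound driver.
 */
static bool vh_handle_ce(struct pci_dev *pdev)
{
	struct cxl_dev_state *cxlds;

	if (!cxl_pci_drv_bound(pdev))
		return false;

	cxlds = pdev->driver_data;
	cxlds->ce_count++;
	return true;
}
```

In this sketch rch_handle_ce() works for a device bound to either driver,
while vh_handle_ce() refuses anything not bound to cxl_pci_driver rather
than misinterpreting another driver's drvdata.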
Terry