Message-ID: <212e35e0-3c71-443b-9f4a-8720ef3d0ba0@amd.com>
Date: Wed, 10 Dec 2025 15:57:57 -0600
From: "Bowman, Terry" <terry.bowman@....com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: dave@...olabs.net, jonathan.cameron@...wei.com, dave.jiang@...el.com,
alison.schofield@...el.com, dan.j.williams@...el.com, bhelgaas@...gle.com,
shiju.jose@...wei.com, ming.li@...omail.com,
Smita.KoralahalliChannabasappa@....com, rrichter@....com,
dan.carpenter@...aro.org, PradeepVineshReddy.Kodamati@....com,
lukas@...ner.de, Benjamin.Cheatham@....com,
sathyanarayanan.kuppuswamy@...ux.intel.com, linux-cxl@...r.kernel.org,
alucerop@....com, ira.weiny@...el.com, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org
Subject: Re: [PATCH v13 08/25] CXL/AER: Move AER drivers RCH error handling
into pcie/aer_cxl_rch.c
On 12/8/2025 3:28 PM, Bowman, Terry wrote:
> On 12/8/2025 12:06 PM, Bjorn Helgaas wrote:
>> I vote for a subject like:
>>
>> PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c
>>
>> I think stuff in drivers/pci should have a PCI/... prefix. "CXL" is
>> really its own major subsystem, not a feature of PCI.
>>
>> On Mon, Nov 03, 2025 at 06:09:44PM -0600, Terry Bowman wrote:
>>> The restricted CXL Host (RCH) AER error handling logic currently resides
>>> in the AER driver file, drivers/pci/pcie/aer.c. CXL specific changes are
>>> conditionally compiled using #ifdefs.
>>
>> s|the AER driver file, drivers/pci/pcie/aer.c|aer.c|
>>
>>> Improve the AER driver maintainability by separating the RCH specific logic
>>> from the AER driver's core functionality and removing the ifdefs. Introduce
>>> drivers/pci/pcie/aer_cxl_rch.c for moving the RCH AER logic into.
>>> Conditionally compile the file using the CONFIG_CXL_RCH_RAS Kconfig.
>>>
>>> Move the CXL logic into the new file but leave helper functions in aer.c
>>> for now as they will be moved in future patch for CXL virtual hierarchy
>>> handling. Export the handler functions as needed. Export
>>> pci_aer_unmask_internal_errors() allowing for all subsystems to use.
>>> Avoid multiple declaration moves and export cxl_error_is_native() now to
>>> allow access from cxl_core.
>>>
>>> Inorder to maintain compilation after the move other changes are required.
>>> Change cxl_rch_handle_error() & cxl_rch_enable_rcec() to be non-static
>>> inorder for accessing from the AER driver in aer.c.
>>
>> s/Inorder to/In order to/ (or just "To maintain ...")
>> /inorder for accessing from the AER driver in/so they can be used by/
>>
>>> Update the new file with the SPDX and 2023 AMD copyright notations because
>>> the RCH bits were initally contributed in 2023 by AMD.
>>
>> Maybe cite the commit that did this so it's easy to check.
>>
>
> Ok
>
>>> +++ b/drivers/pci/pci.h
>>
>>> +#ifdef CONFIG_CXL_RAS
>>> +bool is_internal_error(struct aer_err_info *info);
>>> +#else
>>> +static inline bool is_internal_error(struct aer_err_info *info) { return false; }
>>
>> This used to be static and internal. "is_internal_error()" seems a
>> little too generic now that it's not static; probably should include
>> "aer". Maybe rename it in a preliminary patch so the move is more of
>> a pure move.
>>
>>> +++ b/drivers/pci/pcie/aer.c
>>> @@ -1130,7 +1130,7 @@ static bool find_source_device(struct pci_dev *parent,
>>> * Note: AER must be enabled and supported by the device which must be
>>> * checked in advance, e.g. with pcie_aer_is_native().
>>> */
>>> -static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
>>> +void pci_aer_unmask_internal_errors(struct pci_dev *dev)
>>> {
>>> int aer = dev->aer_cap;
>>> u32 mask;
>>> @@ -1143,116 +1143,25 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
>>> mask &= ~PCI_ERR_COR_INTERNAL;
>>> pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
>>> }
>>> +EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
>>
>> Not sure why these EXPORTs are needed. Is there a caller that can be
>> a module? The callers I see look like they would be builtin. If you
>> add callers later that need this, the export can be done then.
>>
>
> pci_aer_unmask_internal_errors() is called by the cxl_core module later in
> the 2nd to-last patch. I'll move the export change to the later patch. At
> one point I was trying to avoid changes to same definitions multiple times.
>
>>> +++ b/include/linux/aer.h
>>> @@ -56,12 +56,20 @@ struct aer_capability_regs {
>>> #if defined(CONFIG_PCIEAER)
>>> int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>>> int pcie_aer_is_native(struct pci_dev *dev);
>>> +void pci_aer_unmask_internal_errors(struct pci_dev *dev);
>>> #else
>>> static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>> {
>>> return -EINVAL;
>>> }
>>> static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>>> +static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
>>> +#endif
>>> +
>>> +#ifdef CONFIG_CXL_RAS
>>> +bool cxl_error_is_native(struct pci_dev *dev);
>>> +#else
>>> +static inline bool cxl_error_is_native(struct pci_dev *dev) { return false; }
>>
>> These include/linux/aer.h changes look like a separate patch. Moving
>> code from aer.c to aer_cxl_rch.c doesn't add any callers outside
>> drivers/pci, so these shouldn't need to be in include/linux/.
>
> I'll remove these from here.
>
> - Terry
Hi Bjorn,
I reviewed this more closely and recalled the reasoning behind the change.
Lukas requested that pci_aer_unmask_internal_errors() be made available
across the entire kernel. I already noted this in the commit message, but
I can also include a link to Lukas’s request. Alternatively, I could split
this into a separate patch with a Suggested-by tag, leave it as is, or
make another adjustment. Additionally, I’ll update cxl_error_is_native()
so it’s only included when necessary.
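
For reference, here is a minimal user-space sketch of the declaration-plus-stub header pattern being discussed: a real function when the feature is configured in, and an empty static inline otherwise, so callers elsewhere still compile. Everything prefixed with demo_/DEMO_ is a hypothetical stand-in (for CONFIG_PCIEAER, struct pci_dev, and the AER correctable mask register), and the body mirrors only the correctable-mask step visible in the quoted hunk. It is an illustration, not the kernel code.

```c
/*
 * User-space sketch only; not kernel code.
 * DEMO_CONFIG_PCIEAER and struct demo_pci_dev are hypothetical stand-ins
 * for CONFIG_PCIEAER and struct pci_dev.
 */
#include <stdint.h>

/* Set to 0 to build the stub variant instead. */
#define DEMO_CONFIG_PCIEAER 1

/* Same bit value as PCI_ERR_COR_INTERNAL in the kernel's pci_regs.h. */
#define DEMO_ERR_COR_INTERNAL 0x00004000u

struct demo_pci_dev {
	uint32_t cor_mask;	/* stand-in for the AER correctable mask register */
};

#if DEMO_CONFIG_PCIEAER
/* Feature enabled: non-static, so any includer of the header can call it. */
void demo_aer_unmask_internal_errors(struct demo_pci_dev *dev)
{
	/* Clearing the mask bit lets internal errors be reported. */
	dev->cor_mask &= ~DEMO_ERR_COR_INTERNAL;
}
#else
/* Feature disabled: callers still compile and the call is a no-op. */
static inline void demo_aer_unmask_internal_errors(struct demo_pci_dev *dev)
{
	(void)dev;
}
#endif
```

With this shape, a caller needs no #ifdef of its own; whether the symbol also needs an export depends only on whether a modular caller exists.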
Terry