Message-ID: <697254e4efe61_3095100a3@dwillia2-mobl4.notmuch>
Date: Thu, 22 Jan 2026 08:48:36 -0800
From: <dan.j.williams@...el.com>
To: Lukas Wunner <lukas@...ner.de>, <dan.j.williams@...el.com>
CC: Terry Bowman <terry.bowman@....com>, <dave@...olabs.net>,
<jonathan.cameron@...wei.com>, <dave.jiang@...el.com>,
<alison.schofield@...el.com>, <bhelgaas@...gle.com>, <shiju.jose@...wei.com>,
<ming.li@...omail.com>, <Smita.KoralahalliChannabasappa@....com>,
<rrichter@....com>, <dan.carpenter@...aro.org>,
<PradeepVineshReddy.Kodamati@....com>, <Benjamin.Cheatham@....com>,
<sathyanarayanan.kuppuswamy@...ux.intel.com>, <linux-cxl@...r.kernel.org>,
<vishal.l.verma@...el.com>, <alucerop@....com>, <ira.weiny@...el.com>,
<linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>
Subject: Re: [PATCH v14 09/34] PCI/AER: Export pci_aer_unmask_internal_errors()

Lukas Wunner wrote:
> On Mon, Jan 19, 2026 at 06:09:39PM -0800, dan.j.williams@...el.com wrote:
> > Terry Bowman wrote:
> > > Internal PCIe errors are not enabled by default during initialization. This
> > > creates a problem for CXL drivers, which rely on PCIe Correctable and
> > > Uncorrectable Internal Errors to receive CXL protocol error notifications.
> > >
> > > Export pci_aer_unmask_internal_errors() so CXL and other drivers can
> > > enable internal PCIe errors.
> >
> > I folded in the following to this patch because opening up internal
> > errors for PCIe drivers in general is not a goal.
>
> As said, the "xe" driver needs to unmask Internal Errors and could
> take advantage of this helper, so I'd call opening this up for PCI
> drivers if not a goal then at least a "desirable side effect". ;)
>
> https://lore.kernel.org/all/aR1_M_i3yIygd8v-@wunner.de/

I missed that earlier. How did xe manage to be the only device in the history
of Linux that needs internal errors unmasked?

What happens if Linux says, "No, that error model has never been supported, and
it creates an ongoing mental / maintenance load of 'internal errors do not
matter for PCIe, only CXL (except xe)'"?

> > + Internal PCIe errors are not enabled by default during initialization
> > + because their behavior is too device-specific and there is no standard way
> > + to reason about them.
>
> Well, they're not enabled by default because per the spec they're
> masked in the Uncorrectable Error Mask and Correctable Error Mask
> Registers. It's up to drivers to unmask them if they know the
> hardware signals them. CXL just happens to be one of those drivers.
>
> > @@ drivers/pci/pcie/aer.c: static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> > - mask &= ~PCI_ERR_COR_INTERNAL;
> > pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
> > }
>
> Unexplained change vis-à-vis Terry's submission. It seems you're
> reading the Correctable Error Mask Register and writing the same
> value back. That doesn't seem to make sense.

No, sorry, this is an interdiff, so that line only changed as hunk context;
the patch itself still clears PCI_ERR_COR_INTERNAL. It made me do a
double-take too, until I realized it was purely a context change.

>
> > -+EXPORT_SYMBOL_GPL(pci_aer_unmask_internal_errors);
> >
> > ++/*
> > ++ * Internal errors are too device-specific to enable generally, however for CXL
> > ++ * their behavior is standardized for conveying CXL protocol errors.
> > ++ */
> > ++EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core");
> > ++
>
> This change will require touching aer.c every time a driver
> (such as xe) has the need to unmask Internal Errors.
> Not sure if that's such a good idea...

The xe driver can always come back and change this to plain EXPORT_SYMBOL_GPL()
once they clear the hurdle above: "please reconsider your error model so that
it does not require this never-before-needed feature of AER".
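
For readers following along, a minimal userspace sketch of the
read-modify-write that the helper under discussion performs on the two AER
mask registers. The register offsets and bit values are the ones from
include/uapi/linux/pci_regs.h. Everything else is an assumption made for
illustration: `struct fake_dev`, `cfg_read32()`, `cfg_write32()` and
`unmask_internal_errors()` are hypothetical stand-ins for `struct pci_dev`,
`pci_read_config_dword()`, `pci_write_config_dword()` and the real helper,
and this is not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets and bits as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNCOR_MASK   0x08        /* Uncorrectable Error Mask */
#define PCI_ERR_UNC_INTN     0x00400000  /* Uncorrectable Internal Error */
#define PCI_ERR_COR_MASK     0x14        /* Correctable Error Mask */
#define PCI_ERR_COR_INTERNAL 0x00004000  /* Corrected Internal Error */

/* Hypothetical stand-in for struct pci_dev: a flat 4 KiB config space. */
struct fake_dev {
	uint8_t cfg[4096];
	int aer_cap;	/* offset of the AER extended capability */
};

/* Hypothetical replacement for pci_read_config_dword(). */
uint32_t cfg_read32(const struct fake_dev *dev, int where)
{
	uint32_t val;

	assert(where >= 0 && where + 4 <= (int)sizeof(dev->cfg));
	memcpy(&val, &dev->cfg[where], sizeof(val));
	return val;
}

/* Hypothetical replacement for pci_write_config_dword(). */
void cfg_write32(struct fake_dev *dev, int where, uint32_t val)
{
	assert(where >= 0 && where + 4 <= (int)sizeof(dev->cfg));
	memcpy(&dev->cfg[where], &val, sizeof(val));
}

/*
 * Clear the Internal Error bit in both mask registers and leave every
 * other mask bit untouched. A set mask bit suppresses reporting, so
 * clearing it is what "unmasking" means here.
 */
void unmask_internal_errors(struct fake_dev *dev)
{
	int aer = dev->aer_cap;
	uint32_t mask;

	mask = cfg_read32(dev, aer + PCI_ERR_UNCOR_MASK);
	mask &= ~(uint32_t)PCI_ERR_UNC_INTN;
	cfg_write32(dev, aer + PCI_ERR_UNCOR_MASK, mask);

	mask = cfg_read32(dev, aer + PCI_ERR_COR_MASK);
	mask &= ~(uint32_t)PCI_ERR_COR_INTERNAL;
	cfg_write32(dev, aer + PCI_ERR_COR_MASK, mask);
}
```

The sketch shows why the interdiff hunk quoted above looked odd in isolation:
without the `mask &= ~PCI_ERR_COR_INTERNAL` line, the second read/write pair
would write back the value it had just read.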