Message-ID: <001c01dc9740$c7722540$56566fc0$@trustnetic.com>
Date: Fri, 6 Feb 2026 16:15:39 +0800
From: Jiawen Wu <jiawenwu@...stnetic.com>
To: "'Bjorn Helgaas'" <helgaas@...nel.org>
Cc: "'Rafael J. Wysocki'" <rafael@...nel.org>,
"'Tony Luck'" <tony.luck@...el.com>,
"'Borislav Petkov'" <bp@...en8.de>,
"'Hanjun Guo'" <guohanjun@...wei.com>,
"'Mauro Carvalho Chehab'" <mchehab@...nel.org>,
"'Shuai Xue'" <xueshuai@...ux.alibaba.com>,
"'Len Brown'" <lenb@...nel.org>,
"'Shiju Jose'" <shiju.jose@...wei.com>,
"'Bjorn Helgaas'" <bhelgaas@...gle.com>,
<linux-acpi@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] ACPI: APEI: Avoid NULL pointer dereference in ghes_estatus_pool_region_free
On Thu, Feb 5, 2026 11:39 PM, Bjorn Helgaas wrote:
> On Thu, Feb 05, 2026 at 11:11:02AM +0800, Jiawen Wu wrote:
> > On Thu, Feb 5, 2026 5:46 AM, Bjorn Helgaas wrote:
> > > On Wed, Feb 04, 2026 at 10:03:34AM +0800, Jiawen Wu wrote:
> > > > On Wed, Feb 4, 2026 6:55 AM, Bjorn Helgaas wrote:
> > > > > On Tue, Feb 03, 2026 at 10:12:32AM +0800, Jiawen Wu wrote:
> > > > > > The function ghes_estatus_pool_region_free() is exported
> > > > > > and called by the PCIe AER recovery path, which
> > > > > > unconditionally invokes it to free aer_capability_regs
> > > > > > memory.
> > > > > >
> > > > > > Although current AER usage assumes memory comes from the
> > > > > > GHES pool, robustness requires guarding against pool
> > > > > > unavailability. Add a NULL check before calling
> > > > > > gen_pool_free() to prevent crashes when the pool is not
> > > > > > initialized. This also makes the API safer for potential
> > > > > > future use by non-GHES callers.
> > > > >
> > > > > I'm not sure what you mean by "pool unavailability." I think
> > > > > getting here with ghes_estatus_pool==NULL means we have a
> > > > > logic error somewhere, and I don't think we should silently
> > > > > hide that error.
> > > > >
> > > > > I'm generally in favor of *not* checking so we find out if the
> > > > > caller forgot to keep track of the pointer correctly.
> > > >
> > > > "pool unavailability" means that when I attempt to call
> > > > aer_recover_queue() in an ethernet driver, on a system that does
> > > > not create ghes_estatus_pool, it leads to a NULL pointer
> > > > dereference.
> > >
> > > I guess that means you contemplate having an ethernet driver
> > > allocate and manage its own struct aer_capability_regs to pass to
> > > aer_recover_queue(). But I don't understand why such a driver
> > > would be involved in this part of the AER processing.
> > >
> > > Normally a device like a NIC that detects an error logs something
> > > in its local AER Capability, then sends an ERR_* message upstream.
> > > The Root Port that receives that ERR_* message generates an
> > > interrupt. In the native AER case, the Linux AER driver handles
> > > that interrupt, reads the error logs from the AER Capability of
> > > the device that sent the ERR_* message, and logs it. In the
> > > firmware-first case used by GHES, platform firmware handles the
> > > interrupt, reads the error logs, packages them up, and sends them
> > > to the Linux AER driver via GHES and aer_recover_queue().
> > >
> > > What's the PCIe hardware flow that would lead to an ethernet
> > > driver calling aer_recover_queue()? An Endpoint driver wouldn't
> > > receive the AER interrupt generated by the Root Port.
> > >
> > > I suppose a NIC could generate its own device-specific interrupt
> > > when it logs an error in its local AER Capability, but if it
> > > conforms to the PCIe spec, it should also send an ERR_* message,
> > > which would feed into the existing AER path. I don't think we'd
> > > want the existing AER path racing with a parallel AER path in the
> > > Endpoint driver.
> >
> > Thank you for your detailed explanation.
> >
> > I fully agree that aer_recover_queue() is intended for
> > firmware-first error reporting via GHES, and an endpoint driver
> > should not normally invoke it directly.
> >
> > However, in practice, we've encountered platforms where AER
> > interrupts are not delivered reliably. For example, due to BIOS
> > misconfiguration, disabled AER in firmware, or hardware that fails
> > to generate ERR_* messages correctly. On such systems, when a PCIe
> > error occurs, the standard AER path is never triggered, and the
> > device remains in a stuck state.
> >
> > To verify this, I simulated a PCIe error by injecting it into the
> > NIC registers, but on many platforms the Linux AER driver did not
> > respond at all.
> >
> > As a device driver, we'd like to ensure best-effort recovery
> > regardless of platform AER support. Since pcie_do_recovery()
> > encapsulates the complete and correct recovery sequence, it's
> > exactly what we need, but it's not exported.
> >
> > Given this, could you advise on the proper way for an endpoint
> > driver to initiate full PCIe error recovery when AER is unavailable?
> > Is there a recommended pattern that safely achieves the same effect
> > as pcie_do_recovery() without duplicating its logic?
>
> It makes sense to try to work around broken hardware, and I think we
> should try to identify exactly what is broken and address it directly.
>
> If the NIC itself is broken, the problem should happen on every
> platform, and a quirk or the driver might be the best place to deal
> with it.
>
> If the platform is broken, we should see problems with many devices,
> and it would be better to deal with it more centrally instead of a
> single endpoint driver.
Thank you for the thoughtful response.
We are the NIC vendor, and our hardware (like many high-speed PCIe devices)
can occasionally encounter PCIe errors due to real-world factors such as
signal integrity issues or marginal link training. These are not design
flaws in the NIC itself, but transient conditions that can occur in field
deployments.
While we agree that platforms should properly deliver AER interrupts, in
practice we see many customer environments (especially in embedded or custom
server platforms) where:
* AER is disabled in BIOS
* The root port does not generate the architected interrupt
* Firmware simply fails to report the error via GHES
As a driver vendor, we have no ability to fix or even influence these
platform-level issues. Yet from the user's perspective, the result is the same:
the NIC becomes unusable (config space reads return 0xFFFFFFFF), and the
network interface hangs indefinitely.
Our goal is not to bypass the AER architecture, but to provide a last-resort
recovery mechanism when the standard path is broken through no fault of our
own. Since pcie_do_recovery() already implements the correct sequence, it
would be ideal if endpoint drivers could safely invoke a similar flow when
they detect a local failure (e.g., via an MMIO timeout or a Tx stall). I
understand the concern about layering, but without any way to trigger
recovery, the device remains dead. As it stands, all the driver can do is
copy the code of pcie_do_recovery() to restore the device. Would it be
reasonable to consider exporting a recovery helper for use by endpoint
drivers?
> I know about several platforms that don't support the architected AER
> interrupt, e.g.,
> https://lore.kernel.org/all/20250702223841.GA1905230@bhelgaas/t/#u
> There is some work in progress to address this particular problem.
>
> Do you have any specifics about the devices and platforms where you're
> seeing issues?
The test platform I'm currently using:
* CPU: AMD Ryzen 9 7950X 16-Core Processor
* BIOS version: E7E16AMS.190
* OS: Ubuntu 25.04
* Kernel: Linux 6.19.0-rc7+
The device is our NIC; the driver is in the directory:
drivers/net/ethernet/wangxun/
If you need more detailed information, please let me know.
Thanks again for your time and support.