linux-kernel - Re: [PATCH] ACPI: APEI: Avoid NULL pointer dereference in ghes_estatus_pool_region

Message-ID: <20260204214604.GA17868@bhelgaas>
Date: Wed, 4 Feb 2026 15:46:04 -0600
From: Bjorn Helgaas <helgaas@...nel.org>
To: Jiawen Wu <jiawenwu@...stnetic.com>
Cc: "'Rafael J. Wysocki'" <rafael@...nel.org>,
	'Tony Luck' <tony.luck@...el.com>, 'Borislav Petkov' <bp@...en8.de>,
	'Hanjun Guo' <guohanjun@...wei.com>,
	'Mauro Carvalho Chehab' <mchehab@...nel.org>,
	'Shuai Xue' <xueshuai@...ux.alibaba.com>,
	'Len Brown' <lenb@...nel.org>, 'Shiju Jose' <shiju.jose@...wei.com>,
	'Bjorn Helgaas' <bhelgaas@...gle.com>, linux-acpi@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ACPI: APEI: Avoid NULL pointer dereference in
 ghes_estatus_pool_region_free

On Wed, Feb 04, 2026 at 10:03:34AM +0800, Jiawen Wu wrote:
> On Wed, Feb 4, 2026 6:55 AM, Bjorn Helgaas wrote:
> > On Tue, Feb 03, 2026 at 10:12:32AM +0800, Jiawen Wu wrote:
> > > The function ghes_estatus_pool_region_free() is exported and is called
> > > by the PCIe AER recovery path, which unconditionally invokes it to free
> > > aer_capability_regs memory.
> > >
> > > Although current AER usage assumes memory comes from the GHES pool,
> > > robustness requires guarding against pool unavailability. Add a NULL check
> > > before calling gen_pool_free() to prevent crashes when the pool is not
> > > initialized. This also makes the API safer for potential future use by
> > > non-GHES callers.
> > 
> > I'm not sure what you mean by "pool unavailability."  I think getting
> > here with ghes_estatus_pool==NULL means we have a logic error
> > somewhere, and I don't think we should silently hide that error.
> > 
> > I'm generally in favor of *not* checking so we find out if the caller
> > forgot to keep track of the pointer correctly.
> 
> "pool unavailability" means that when I attempt to call
> aer_recover_queue() in an ethernet driver, which does not create
> ghes_estatus_pool, it leads to a NULL pointer dereference.
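To make the two positions in the exchange above concrete, here is a userspace sketch, not kernel code. `struct model_pool`, `free_guarded()` and `free_unchecked()` are hypothetical stand-ins for ghes_estatus_pool and ghes_estatus_pool_region_free(), and the byte counter replaces gen_pool_free(). The first variant is the NULL check the patch proposes. The second does no defensive check, and its assert() models the oops a NULL dereference would produce, so a caller that lost track of the pool is caught at its first bad call instead of being silently ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the GHES pool; not a kernel API. */
struct model_pool {
    size_t in_use; /* bytes currently handed out */
};

/* Patch-style variant: a missing pool is silently ignored. */
static void free_guarded(struct model_pool *pool, size_t size)
{
    if (!pool)
        return; /* the caller's bookkeeping error goes unnoticed */
    pool->in_use -= size;
}

/* Review-style variant: no defensive check.  The assert stands in for
 * the oops a NULL dereference would cause, so the logic error surfaces
 * at the first bad call instead of being hidden. */
static void free_unchecked(struct model_pool *pool, size_t size)
{
    assert(pool != NULL);
    pool->in_use -= size;
}
```

Both variants behave identically for a valid pool; they differ only in whether a caller's lost pointer is reported or swallowed.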

I guess that means you contemplate having an ethernet driver allocate
and manage its own struct aer_capability_regs to pass to
aer_recover_queue().  But I don't understand why such a driver would
be involved in this part of the AER processing.
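Per the patch description, the AER recovery path unconditionally frees the aer_capability_regs memory with ghes_estatus_pool_region_free(). The sketch below models that ownership contract in userspace under stated assumptions: `struct model_arena` is a hypothetical bump allocator standing in for ghes_estatus_pool, and `model_recover_free()` is a hypothetical consumer that accepts only buffers the arena owns. A buffer a driver allocated on its own, or any buffer when no pool exists, fails the check, which is the situation an Endpoint driver calling aer_recover_queue() would create.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump-allocated arena standing in for ghes_estatus_pool. */
struct model_arena {
    unsigned char mem[256];
    size_t next; /* offset of the next free byte */
};

static void *model_arena_alloc(struct model_arena *p, size_t size)
{
    if (size > sizeof p->mem - p->next)
        return NULL;
    void *ret = p->mem + p->next;
    p->next += size;
    return ret;
}

/* True if [ptr, ptr + size) lies entirely inside the arena. */
static bool model_arena_owns(const struct model_arena *p, const void *ptr,
                             size_t size)
{
    uintptr_t base = (uintptr_t)p->mem;
    uintptr_t addr = (uintptr_t)ptr;

    return addr >= base && addr - base <= sizeof p->mem &&
           size <= sizeof p->mem - (addr - base);
}

/* Models the consumer side: the recovery path always hands the regs
 * buffer back to the pool, so anything the pool does not own (or a
 * missing pool) marks a broken ownership contract. */
static bool model_recover_free(const struct model_arena *p, const void *regs,
                               size_t size)
{
    return p != NULL && model_arena_owns(p, regs, size);
}
```

The real free path would not return false; the model only marks where the contract between producer and consumer breaks.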

Normally a device like a NIC that detects an error logs something in
its local AER Capability, then sends an ERR_* message upstream.  The
Root Port that receives that ERR_* message generates an interrupt.  In
the native AER case, the Linux AER driver handles that interrupt,
reads the error logs from the AER Capability of the device that sent
the ERR_* message, and logs it.  In the firmware-first case used by
GHES, platform firmware handles the interrupt, reads the error logs,
packages them up, and sends them to the Linux AER driver via GHES and
aer_recover_queue().
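The two reporting paths just described can be written down as data. This is a hypothetical model (the enum, step strings and helper names are illustrative, not kernel identifiers) that encodes only the steps named in the paragraph above, then checks which actors take part. In neither path does an Endpoint driver appear.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum aer_mode { AER_NATIVE, AER_FIRMWARE_FIRST };

/* Each step is "Actor: action"; only steps named in the text are listed. */
static const char *const native_steps[] = {
    "Endpoint: log error in its AER Capability",
    "Endpoint: send ERR_* message upstream",
    "Root Port: generate interrupt",
    "Linux AER driver: read the Endpoint's AER Capability and log it",
};

static const char *const firmware_first_steps[] = {
    "Endpoint: log error in its AER Capability",
    "Endpoint: send ERR_* message upstream",
    "Root Port: generate interrupt",
    "Platform firmware: handle interrupt, read and package error logs",
    "GHES: pass the record to aer_recover_queue()",
    "Linux AER driver: receive the record",
};

/* Returns the step list for mode and stores its length in *n. */
static const char *const *aer_flow(enum aer_mode mode, size_t *n)
{
    if (mode == AER_NATIVE) {
        *n = sizeof native_steps / sizeof native_steps[0];
        return native_steps;
    }
    *n = sizeof firmware_first_steps / sizeof firmware_first_steps[0];
    return firmware_first_steps;
}

/* True if any step in the flow starts with the given actor name. */
static bool aer_flow_involves(enum aer_mode mode, const char *actor)
{
    size_t n;
    const char *const *steps = aer_flow(mode, &n);
    size_t len = strlen(actor);

    for (size_t i = 0; i < n; i++)
        if (strncmp(steps[i], actor, len) == 0)
            return true;
    return false;
}
```

For example, `aer_flow_involves(AER_FIRMWARE_FIRST, "Endpoint driver")` is false, as it is for the native path.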

What's the PCIe hardware flow that would lead to an ethernet driver
calling aer_recover_queue()?  An Endpoint driver wouldn't receive the
AER interrupt generated by the Root Port.

I suppose a NIC could generate its own device-specific interrupt when
it logs an error in its local AER Capability, but if it conforms to
the PCIe spec, it should also send an ERR_* message, which would feed
into the existing AER path.  I don't think we'd want the existing AER
path racing with a parallel AER path in the Endpoint driver.

Bjorn