Message-ID: <9cd9f4cf-72ab-40f1-9ead-3e6807b4d474@linux.intel.com>
Date: Mon, 4 Aug 2025 06:50:30 -0700
From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@...ux.intel.com>
To: Breno Leitao <leitao@...ian.org>,
Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
Oliver O'Halloran <oohall@...il.com>, Bjorn Helgaas <bhelgaas@...gle.com>,
Jon Pan-Doh <pandoh@...gle.com>
Cc: linuxppc-dev@...ts.ozlabs.org, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH] PCI/AER: Check for NULL aer_info before ratelimiting in
pci_print_aer()
On 8/4/25 2:17 AM, Breno Leitao wrote:
> Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
> when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
> calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
> does not rate limit, given this is fatal.
Why not add the check to pci_print_aer() itself?
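
For illustration only, a caller-side check could look roughly like the
userspace sketch below. This is not kernel code: pci_dev_model,
aer_info_model, ratelimit_model() and print_aer_model() are hypothetical
stand-ins, and the budget counter only approximates what __ratelimit() does.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures (assumption). */
struct aer_info_model {
	int nonfatal_budget;	/* messages still allowed */
};

struct pci_dev_model {
	struct aer_info_model *aer_info;	/* may be NULL */
};

/* Stand-in for aer_ratelimit(): returns 1 if the message may be printed. */
static int ratelimit_model(struct pci_dev_model *dev)
{
	if (dev->aer_info->nonfatal_budget <= 0)
		return 0;
	dev->aer_info->nonfatal_budget--;
	return 1;
}

/*
 * Stand-in for pci_print_aer() with the NULL check in the caller.
 * Returns 1 if the error was printed, 0 if it was rate limited.
 */
static int print_aer_model(struct pci_dev_model *dev)
{
	/* Without aer_info there is nothing to rate limit against: print. */
	if (dev->aer_info && !ratelimit_model(dev))
		return 0;

	printf("AER: error reported\n");
	return 1;
}
```

With the check in the caller, the rate-limit helper can keep assuming a
valid aer_info, and the no-aer_info case is never rate limited, which is
the same behaviour the patch gets by returning 1.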
>
> This prevents a kernel crash triggered by dereferencing a NULL pointer
> in aer_ratelimit(), ensuring safer handling of PCI devices that lack
> AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
> which already performs this NULL check.
Is this happening during kernel boot? How often does it occur, and what are
the steps to reproduce? I am curious why pci_print_aer() is called for a PCI
device without aer_info. A missing aer_info means that the device has already
been released or is in the process of being released (pci_release_dev()). Is
this triggered by using a stale pci_dev pointer?
>
> Signed-off-by: Breno Leitao <leitao@...ian.org>
> Fixes: a57f2bfb4a5863 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
> ---
> drivers/pci/pcie/aer.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 70ac661883672..b5f96fde4dcda 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -786,6 +786,9 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
>
> static int aer_ratelimit(struct pci_dev *dev, unsigned int severity)
> {
> + if (!dev->aer_info)
> + return 1;
> +
> switch (severity) {
> case AER_NONFATAL:
> return __ratelimit(&dev->aer_info->nonfatal_ratelimit);
>
> ---
> base-commit: 89748acdf226fd1a8775ff6fa2703f8412b286c8
> change-id: 20250801-aer_crash_2-b21cc2ef0d00
>
> Best regards,
> --
> Breno Leitao <leitao@...ian.org>
>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer