lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <48e24c23-67d4-4d09-a5f5-2a458a47e2e2@linux.intel.com>
Date: Mon, 4 Aug 2025 09:11:27 -0700
From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@...ux.intel.com>
To: Breno Leitao <leitao@...ian.org>
Cc: Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
 Oliver O'Halloran <oohall@...il.com>, Bjorn Helgaas <bhelgaas@...gle.com>,
 Jon Pan-Doh <pandoh@...gle.com>, linuxppc-dev@...ts.ozlabs.org,
 linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH] PCI/AER: Check for NULL aer_info before ratelimiting in
 pci_print_aer()


On 8/4/25 8:35 AM, Breno Leitao wrote:
> Hello Sathyanarayanan,
>
> On Mon, Aug 04, 2025 at 06:50:30AM -0700, Sathyanarayanan Kuppuswamy wrote:
>> On 8/4/25 2:17 AM, Breno Leitao wrote:
>>> Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
>>> when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
>>> calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
>>> does not rate limit, given this is fatal.
>> Why not add it to pci_print_aer() ?
>>
>>> This prevents a kernel crash triggered by dereferencing a NULL pointer
>>> in aer_ratelimit(), ensuring safer handling of PCI devices that lack
>>> AER info. This change aligns pci_print_aer() with pci_dev_aer_stats_incr()
>>> which already performs this NULL check.
>> Is this happening during the kernel boot ? What is the frequency and steps
>> to reproduce? I am curious about why pci_print_aer() is called for a PCI device
>> without aer_info. Not aer_info means, that particular device is already released
>> or in the process of release (pci_release_dev()). Is this triggered by using a stale
>> pci_dev pointer?
> I've reported some of these investigations in here:
>
> https://lore.kernel.org/all/buduna6darbvwfg3aogl5kimyxkggu3n4romnmq6sozut6axeu@clnx7sfsy457/

It has some details. But you did not mention details like your environment, steps to
reproduce and how often you see it. I just want to understand in what scenario
pci_print_aer() is triggered, when releasing the device. I am wondering whether we
are missing proper locking some where.


-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ