Message-ID: <5cfffd00-b62e-4004-a5a6-58134a9a1c80@linux.ibm.com>
Date: Wed, 7 Jan 2026 11:52:59 +0530
From: Narayana Murty N <nnmlinux@...ux.ibm.com>
To: Timothy Pearson <tpearson@...torengineering.com>
Cc: mahesh <mahesh@...ux.ibm.com>, Oliver <oohall@...il.com>,
        Madhavan Srinivasan <maddy@...ux.ibm.com>,
        Michael Ellerman <mpe@...erman.id.au>, npiggin <npiggin@...il.com>,
        christophe leroy <christophe.leroy@...roup.eu>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        vaibhav
 <vaibhav@...ux.ibm.com>,
        Shivaprasad G Bhat <sbhat@...ux.ibm.com>, ganeshgr@...ux.ibm.com
Subject: Re: [PATCH v2 1/1] powerpc/eeh: fix recursive pci_lock_rescan_remove
 locking in EEH event handling


On 11/12/25 9:15 PM, Timothy Pearson wrote:
>
> ----- Original Message -----
>> From: "Narayana Murty N" <nnmlinux@...ux.ibm.com>
>> To: "mahesh" <mahesh@...ux.ibm.com>, "Oliver" <oohall@...il.com>, "Madhavan Srinivasan" <maddy@...ux.ibm.com>, "Michael
>> Ellerman" <mpe@...erman.id.au>, "npiggin" <npiggin@...il.com>, "christophe leroy" <christophe.leroy@...roup.eu>
>> Cc: "Bjorn Helgaas" <bhelgaas@...gle.com>, "Timothy Pearson" <tpearson@...torengineering.com>, "linuxppc-dev"
>> <linuxppc-dev@...ts.ozlabs.org>, "linux-kernel" <linux-kernel@...r.kernel.org>, "vaibhav" <vaibhav@...ux.ibm.com>,
>> "Shivaprasad G Bhat" <sbhat@...ux.ibm.com>, ganeshgr@...ux.ibm.com
>> Sent: Wednesday, December 10, 2025 8:25:59 AM
>> Subject: [PATCH v2 1/1] powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling
>> The recent commit 1010b4c012b0 ("powerpc/eeh: Make EEH driver device
>> hotplug safe") restructured the EEH driver to improve synchronization
>> with the PCI hotplug layer.
>>
>> However, it inadvertently moved pci_lock_rescan_remove() outside its
>> intended scope in eeh_handle_normal_event(), leading to broken PCI
>> error reporting and improper EEH event triggering. Specifically,
>> eeh_handle_normal_event() acquired pci_lock_rescan_remove() before
>> calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to
>> acquire the same lock internally, causing nested locking and disrupting
>> normal EEH event handling paths.
>>
>> This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(),
>> with two public wrappers:
>>     eeh_pe_bus_get() with locking enabled.
>>     eeh_pe_bus_get_nolock() that skips locking.
>>
>> Callers that already hold pci_lock_rescan_remove() now use
>> eeh_pe_bus_get_nolock() to avoid recursive lock acquisition.
>>
>> Additionally, pci_lock_rescan_remove() calls are restored to the correct
>> position—after eeh_pe_bus_get() and immediately before iterating affected
>> PEs and devices. This ensures EEH-triggered PCI removes occur under proper
>> bus rescan locking without recursive lock contention.
>>
>> The eeh_pe_loc_get() function has been split into two functions:
>>     eeh_pe_loc_get(struct eeh_pe *pe) which retrieves the loc for given PE.
>>     eeh_pe_loc_get_bus(struct pci_bus *bus) which retrieves the location
>>     code for given bus.
> Conceptually the patch sounds OK, but given the complexity of these subsystems it's difficult to foresee all interactions.  Was the patch verified not to break NVMe hotplug on PowerNV systems using actual hardware?  If not, I will need to do so before sending an ack.  Thanks!
Hi Timothy,

Thanks for the suggestion. I have now verified the change on a PowerNV
system with NVMe hotplug.

Test setup:
Platform: PowerNV (“Hardware name: 9105-22A POWER10 (raw) 0x800200 
opal:v7.1-126-g9f16f2d9e PowerNV”).
Kernel: mainline (git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git),
with this patch applied on top of commit d358e5254674.
Devices: two PCIe NVMe drives in hotpluggable slots.

Tests performed:
Basic hotplug:
Repeated NVMe add/remove cycles using the platform’s hotplug controls 
(slot power off/on or PCIe attention button, as applicable).
Confirmed that each add/remove cycle correctly created and removed
/dev/nvme* nodes, and that nvme list and I/O (e.g. fio or dd) worked
before removal and failed cleanly after removal.
Confirmed there were no lockdep splats, warnings, or stack traces 
related to pci_lock_rescan_remove() or EEH during these tests.

Regression checks:
With these tests, NVMe hotplug and EEH behaviour on PowerNV appear
unchanged except for the intended fix (no recursive
pci_lock_rescan_remove() acquisition, and normal EEH event handling).
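
For anyone following the thread without the diff at hand, the wrapper
pattern from the commit message (one internal helper taking do_lock, plus
locked and _nolock public wrappers) can be sketched roughly as below. This
is a user-space illustration only, not the patch itself: the sketch_* names,
the pthread mutex standing in for pci_lock_rescan_remove(), and the
lock_depth counter are all assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PCI rescan/remove lock (not the kernel mutex). */
static pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_depth;	/* illustration only: counts current acquisitions */

struct sketch_bus { int number; };
struct sketch_pe { struct sketch_bus *bus; };

static void sketch_lock(void)
{
	pthread_mutex_lock(&rescan_remove_lock);
	lock_depth++;
}

static void sketch_unlock(void)
{
	lock_depth--;
	pthread_mutex_unlock(&rescan_remove_lock);
}

/* Plays the role of _eeh_pe_bus_get(pe, do_lock): locks only when asked to. */
static struct sketch_bus *sketch_pe_bus_get_impl(struct sketch_pe *pe, bool do_lock)
{
	struct sketch_bus *bus;

	if (do_lock)
		sketch_lock();
	bus = pe ? pe->bus : NULL;
	if (do_lock)
		sketch_unlock();
	return bus;
}

/* For callers that do not hold the lock. */
struct sketch_bus *sketch_pe_bus_get(struct sketch_pe *pe)
{
	return sketch_pe_bus_get_impl(pe, true);
}

/* For callers that already hold the lock. */
struct sketch_bus *sketch_pe_bus_get_nolock(struct sketch_pe *pe)
{
	return sketch_pe_bus_get_impl(pe, false);
}

/*
 * A caller that holds the lock across the lookup. Using the locked variant
 * here would try to take a non-recursive mutex twice; the _nolock variant
 * keeps the acquisition count at exactly one.
 */
int sketch_handle_event(struct sketch_pe *pe)
{
	struct sketch_bus *bus;
	int number;

	sketch_lock();
	bus = sketch_pe_bus_get_nolock(pe);
	assert(lock_depth == 1);	/* held once, no nested acquisition */
	number = bus ? bus->number : -1;
	sketch_unlock();
	return number;
}
```

Calling sketch_handle_event() on a PE whose bus number is 7 returns 7 with
the lock taken only once; swapping in sketch_pe_bus_get() at that point
would attempt to re-acquire the same mutex, which is the nesting the patch
removes.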

Thanks,

Narayana

