lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b3a65424-e5f6-4407-8b11-cf56047ab071@linux.ibm.com>
Date: Thu, 20 Nov 2025 15:18:49 +0530
From: Narayana Murty N <nnmlinux@...ux.ibm.com>
To: mahesh@...ux.ibm.com
Cc: oohall@...il.com, maddy@...ux.ibm.com, mpe@...erman.id.au,
        npiggin@...il.com, christophe.leroy@...roup.eu, bhelgaas@...gle.com,
        tpearson@...torengineering.com, linuxppc-dev@...ts.ozlabs.org,
        linux-kernel@...r.kernel.org, vaibhav@...ux.ibm.com,
        sbhat@...ux.ibm.com, ganeshgr@...ux.ibm.com
Subject: Re: [PATCH 1/1] powerpc/eeh: fix recursive pci_lock_rescan_remove
 locking in EEH event handling


On 18/11/25 2:26 PM, Mahesh J Salgaonkar wrote:
> On 2025-11-05 00:40:52 Wed, Narayana Murty N wrote:
>> The recent commit 1010b4c012b0 ("powerpc/eeh: Make EEH driver device
>> hotplug safe") restructured the EEH driver to improve synchronization
>> with the PCI hotplug layer.
>>
>> However, it inadvertently moved pci_lock_rescan_remove() outside its
>> intended scope in eeh_handle_normal_event(), leading to broken PCI
>> error reporting and improper EEH event triggering. Specifically,
>> eeh_handle_normal_event() acquired pci_lock_rescan_remove() before
>> calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to
>> acquire the same lock internally, causing nested locking and disrupting
>> normal EEH event handling paths.
>>
>> This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(),
>> with two public wrappers:
>>      eeh_pe_bus_get() with locking enabled.
>>      eeh_pe_bus_get_nolock() that skips locking.
>>
>> Callers that already hold pci_lock_rescan_remove() now use
>> eeh_pe_bus_get_nolock() to avoid recursive lock acquisition.
>>
>> Additionally, pci_lock_rescan_remove() calls are restored to the correct
>> position—after eeh_pe_bus_get() and immediately before iterating affected
>> PEs and devices. This ensures EEH-triggered PCI removes occur under proper
>> bus rescan locking without recursive lock contention.
>>
> [...]
>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>> index ef78ff77cf8f..492fae5e3823 100644
>> --- a/arch/powerpc/kernel/eeh_driver.c
>> +++ b/arch/powerpc/kernel/eeh_driver.c
>> @@ -812,6 +812,35 @@ static void eeh_clear_slot_attention(struct pci_dev *pdev)
>>   	ops->set_attention_status(slot->hotplug, 0);
>>   }
>>   
>> +static const char *eeh_pe_get_loc(struct eeh_pe *pe)
>> +{
> So it is duplicate of eeh_pe_loc_get() with nolock variant. Instead, can
> we split original function eeh_pe_loc_get() ? Move the while (bus) loop
> into another function with name eeh_bus_loc_get(bus) which will be
> nolock variant and use that here ?

Thanks Mahesh, your suggestion will be taken care in the next version of 
patch.
https://lore.kernel.org/all/20251120054418.3363-1-nnmlinux@linux.ibm.com/

>> +	struct pci_bus *bus = eeh_pe_bus_get_nolock(pe);
>> +	struct device_node *dn;
>> +	const char *location = NULL;
>> +
>> +	while (bus) {
>> +		dn = pci_bus_to_OF_node(bus);
>> +		if (!dn) {
>> +			bus = bus->parent;
>> +			continue;
>> +		}
>> +
>> +		if (pci_is_root_bus(bus))
>> +			location = of_get_property(dn, "ibm,io-base-loc-code",
>> +						   NULL);
>> +		else
>> +			location = of_get_property(dn, "ibm,slot-location-code",
>> +						   NULL);
>> +
>> +		if (location)
>> +			return location;
>> +
>> +		bus = bus->parent;
>> +	}
>> +
>> +	return "N/A";
>> +}
>> +
>>   /**
>>    * eeh_handle_normal_event - Handle EEH events on a specific PE
>>    * @pe: EEH PE - which should not be used after we return, as it may
>> @@ -846,7 +875,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
>>   
>>   	pci_lock_rescan_remove();
>>   
>> -	bus = eeh_pe_bus_get(pe);
>> +	bus = eeh_pe_bus_get_nolock(pe);
>>   	if (!bus) {
>>   		pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n",
>>   			__func__, pe->phb->global_number, pe->addr);
>> @@ -886,14 +915,14 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
>>   	/* Log the event */
>>   	if (pe->type & EEH_PE_PHB) {
>>   		pr_err("EEH: Recovering PHB#%x, location: %s\n",
>> -			pe->phb->global_number, eeh_pe_loc_get(pe));
>> +			pe->phb->global_number, eeh_pe_get_loc(pe));
>>   	} else {
>>   		struct eeh_pe *phb_pe = eeh_phb_pe_get(pe->phb);
>>   
>>   		pr_err("EEH: Recovering PHB#%x-PE#%x\n",
>>   		       pe->phb->global_number, pe->addr);
>>   		pr_err("EEH: PE location: %s, PHB location: %s\n",
>> -		       eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe));
>> +		       eeh_pe_get_loc(pe), eeh_pe_get_loc(phb_pe));
>>   	}
>>   
> Thanks,
> -Mahesh.

Regards,

-Narayana Murty.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ