lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <dqoufbaj47rd2cn5x6fnwho4hwyq557g2pzak35pf5ipi77lfz@fh2xbozyr3rn>
Date: Tue, 25 Nov 2025 10:16:01 +0530
From: Mahesh J Salgaonkar <mahesh@...ux.ibm.com>
To: Narayana Murty N <nnmlinux@...ux.ibm.com>
Cc: oohall@...il.com, maddy@...ux.ibm.com, mpe@...erman.id.au,
        npiggin@...il.com, christophe.leroy@...roup.eu, bhelgaas@...gle.com,
        tpearson@...torengineering.com, linuxppc-dev@...ts.ozlabs.org,
        linux-kernel@...r.kernel.org, vaibhav@...ux.ibm.com,
        sbhat@...ux.ibm.com, ganeshgr@...ux.ibm.com
Subject: Re: [PATCH v1 1/1] powerpc/eeh: fix recursive pci_lock_rescan_remove
 locking in EEH event handling

On 2025-11-19 23:44:18 Wed, Narayana Murty N wrote:
> The recent commit 1010b4c012b0 ("powerpc/eeh: Make EEH driver device
> hotplug safe") restructured the EEH driver to improve synchronization
> with the PCI hotplug layer.
> 
> However, it inadvertently moved pci_lock_rescan_remove() outside its
> intended scope in eeh_handle_normal_event(), leading to broken PCI
> error reporting and improper EEH event triggering. Specifically,
> eeh_handle_normal_event() acquired pci_lock_rescan_remove() before
> calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to
> acquire the same lock internally, causing nested locking and disrupting
> normal EEH event handling paths.
> 
> This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(),
> with two public wrappers:
>     eeh_pe_bus_get() with locking enabled.
>     eeh_pe_bus_get_nolock() that skips locking.
> 
> Callers that already hold pci_lock_rescan_remove() now use
> eeh_pe_bus_get_nolock() to avoid recursive lock acquisition.
> 
> Additionally, pci_lock_rescan_remove() calls are restored to the correct
> position—after eeh_pe_bus_get() and immediately before iterating affected
> PEs and devices. This ensures EEH-triggered PCI removes occur under proper
> bus rescan locking without recursive lock contention.
> 
> The eeh_pe_loc_get() function has been split into two functions:
>     eeh_pe_loc_get(struct eeh_pe *pe) which retrieves the loc for given PE.
>     eeh_pe_loc_get_bus(struct pci_bus *bus) which retrieves the location
>     code for given bus.
> 
> 
[...]
> Changelog:
> V1:
>   * Split eeh_pe_loc_get() into two functions — eeh_pe_loc_get() and
>   * eeh_pe_loc_get_bus() to separate PE-to-bus lookup from PCI bus location
>   * code retrieval, per code review suggestions.
> ---
>  arch/powerpc/include/asm/eeh.h   |  2 +
>  arch/powerpc/kernel/eeh_driver.c | 11 ++---
>  arch/powerpc/kernel/eeh_pe.c     | 74 ++++++++++++++++++++++++++++++--
>  3 files changed, 78 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 5e34611de9ef..b7ebb4ac2c71 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -289,6 +289,8 @@ void eeh_pe_dev_traverse(struct eeh_pe *root,
>  void eeh_pe_restore_bars(struct eeh_pe *pe);
>  const char *eeh_pe_loc_get(struct eeh_pe *pe);
>  struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
> +const char *eeh_pe_loc_get_bus(struct pci_bus *bus);
> +struct pci_bus *eeh_pe_bus_get_nolock(struct eeh_pe *pe);
>  
>  void eeh_show_enabled(void);
>  int __init eeh_init(struct eeh_ops *ops);
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index ef78ff77cf8f..028f69158532 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -846,7 +846,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
>  
>  	pci_lock_rescan_remove();
>  
> -	bus = eeh_pe_bus_get(pe);
> +	bus = eeh_pe_bus_get_nolock(pe);
>  	if (!bus) {
>  		pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n",
>  			__func__, pe->phb->global_number, pe->addr);
> @@ -886,14 +886,15 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
>  	/* Log the event */
>  	if (pe->type & EEH_PE_PHB) {
>  		pr_err("EEH: Recovering PHB#%x, location: %s\n",
> -			pe->phb->global_number, eeh_pe_loc_get(pe));
> +			pe->phb->global_number, eeh_pe_loc_get_bus(bus));
>  	} else {
>  		struct eeh_pe *phb_pe = eeh_phb_pe_get(pe->phb);
>  
>  		pr_err("EEH: Recovering PHB#%x-PE#%x\n",
>  		       pe->phb->global_number, pe->addr);
>  		pr_err("EEH: PE location: %s, PHB location: %s\n",
> -		       eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe));
> +		       eeh_pe_loc_get_bus(bus),
> +		       eeh_pe_loc_get_bus(eeh_pe_bus_get_nolock(phb_pe)));
>  	}
>  
>  #ifdef CONFIG_STACKTRACE
> @@ -1098,7 +1099,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
>  		eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
>  		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>  
> -		bus = eeh_pe_bus_get(pe);
> +		bus = eeh_pe_bus_get_nolock(pe);
>  		if (bus)
>  			pci_hp_remove_devices(bus);
>  		else
> @@ -1222,7 +1223,7 @@ void eeh_handle_special_event(void)
>  				    (phb_pe->state & EEH_PE_RECOVERING))
>  					continue;
>  
> -				bus = eeh_pe_bus_get(phb_pe);
> +				bus = eeh_pe_bus_get_nolock(phb_pe);
>  				if (!bus) {
>  					pr_err("%s: Cannot find PCI bus for "
>  					       "PHB#%x-PE#%x\n",
> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> index e740101fadf3..040e8f69a4aa 100644
> --- a/arch/powerpc/kernel/eeh_pe.c
> +++ b/arch/powerpc/kernel/eeh_pe.c
> @@ -812,6 +812,24 @@ void eeh_pe_restore_bars(struct eeh_pe *pe)
>  const char *eeh_pe_loc_get(struct eeh_pe *pe)
>  {
>  	struct pci_bus *bus = eeh_pe_bus_get(pe);
> +	return eeh_pe_loc_get_bus(bus);
> +}
> +
> +/**
> + * eeh_pe_loc_get_bus - Retrieve location code binding to the given PCI bus
> + * @bus: PCI bus
> + *
> + * Retrieve the location code associated with the given PCI bus. If the bus
> + * is a root bus, the location code is fetched from the PHB device tree node
> + * or root port. Otherwise, the location code is obtained from the device
> + * tree node of the upstream bridge of the bus. The function walks up the
> + * bus hierarchy if necessary, checking each node for the appropriate
> + * location code property ("ibm,io-base-loc-code" for root buses,
> + * "ibm,slot-location-code" for others). If no location code is found,
> + * returns "N/A".
> + */
> +const char *eeh_pe_loc_get_bus(struct pci_bus *bus)
> +{
>  	struct device_node *dn;
>  	const char *loc = NULL;
>  

This looks more cleaner.

Reviewed-by: Mahesh Salgaonkar <mahesh@...ux.ibm.com>

Thanks,
-Mahesh.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ