Date: Mon, 22 Jan 2024 11:53:35 +0100
From: Johan Hovold <johan@...nel.org>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Michael Schaller <michael@...aller.de>,
	Kai-Heng Feng <kai.heng.feng@...onical.com>,
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
	regressions@...ts.linux.dev,
	"Maciej W . Rozycki" <macro@...am.me.uk>,
	Ajay Agarwal <ajayagarwal@...gle.com>,
	Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Heiner Kallweit <hkallweit1@...il.com>,
	Johan Hovold <johan+linaro@...nel.org>,
	Bjorn Helgaas <bhelgaas@...gle.com>, stable@...r.kernel.org,
	regressions@...mhuis.info
Subject: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert
 "PCI/ASPM: Remove pcie_aspm_pm_state_change()")

Hi Bjorn,

I never got a reply to this one so resending with updated Subject in
case it got buried in your inbox.

On Mon, Jan 08, 2024 at 09:39:07AM +0100, Johan Hovold wrote:
 
> On Tue, Jan 02, 2024 at 05:25:50PM -0600, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas <bhelgaas@...gle.com>
> > 
> > This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
> > 
> > Michael reported that when attempting to resume from suspend to RAM on ASUS
> > mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
> > ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
> > with no output, followed by a reboot.
> > 
> > Workarounds include:
> > 
> >   - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> >   - Booting with "pcie_aspm=off"
> >   - Booting with "pcie_aspm.policy=performance"
> >   - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
> >     before suspending
> >   - Connecting a USB flash drive
> > 
> > Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> > Reported-by: Michael Schaller <michael@...aller.de>
> > Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de
> > Signed-off-by: Bjorn Helgaas <bhelgaas@...gle.com>
> > Cc: <stable@...r.kernel.org>
> > ---
>  
> > +/* @pdev: the root port or switch downstream port */
> > +void pcie_aspm_pm_state_change(struct pci_dev *pdev)
> > +{
> > +	struct pcie_link_state *link = pdev->link_state;
> > +
> > +	if (aspm_disabled || !link)
> > +		return;
> > +	/*
> > +	 * Devices changed PM state, we should recheck if latency
> > +	 * meets all functions' requirement
> > +	 */
> > +	down_read(&pci_bus_sem);
> > +	mutex_lock(&aspm_lock);
> > +	pcie_update_aspm_capable(link->root);
> > +	pcie_config_aspm_path(link);
> > +	mutex_unlock(&aspm_lock);
> > +	up_read(&pci_bus_sem);
> > +}
> 
> This function is now restored in 6.7 final and is called in paths which
> already hold the pci_bus_sem as reported by lockdep (see splat below).
> 
> This can potentially lead to a deadlock and specifically prevents using
> lockdep on Qualcomm platforms.
> 
> Not sure if you want to propagate whether the bus semaphore is held to
> pcie_aspm_pm_state_change() or if there was some alternative to
> restoring this function which should be explored instead.

So to summarise, this patch, which is now commit

	f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")

introduced a regression in 6.7-final for Qualcomm platforms (and some
Intel platforms) similar to the one recently fixed by commit

	f352ce999260 ("PCI: qcom: Fix potential deadlock when enabling ASPM").

Johan


#regzbot introduced: f93e71aea6c6

>    ============================================
>    WARNING: possible recursive locking detected
>    6.7.0 #40 Not tainted
>    --------------------------------------------
>    kworker/u16:5/90 is trying to acquire lock:
>    ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
>    pcieport 0002:00:00.0: PME: Signaling with IRQ 197
>    
>                but task is already holding lock:
>    ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
>    
>                other info that might help us debug this:
>     Possible unsafe locking scenario:
> 
>           CPU0
>           ----
>      lock(pci_bus_sem);
>      lock(pci_bus_sem);
>    
>                 *** DEADLOCK ***
> 
>     May be due to missing lock nesting notation
> 
>    4 locks held by kworker/u16:5/90:
>     #0: ffff06c5c0008d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
>     #1: ffff800081c0bdd0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
>     #2: ffff06c5c0b7d0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach_async_helper+0x3c/0xf4
>     #3: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
>    
>                stack backtrace:
>    CPU: 1 PID: 90 Comm: kworker/u16:5 Not tainted 6.7.0 #40
>    Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
>    Workqueue: events_unbound async_run_entry_fn
>    Call trace:
>     dump_backtrace+0x9c/0x11c
>     show_stack+0x18/0x24
>     dump_stack_lvl+0x60/0xac
>     dump_stack+0x18/0x24
>     print_deadlock_bug+0x25c/0x348
>     __lock_acquire+0x10a4/0x2064
>     lock_acquire+0x1e8/0x318
>     down_read+0x60/0x184
>     pcie_aspm_pm_state_change+0x58/0xdc
>     pci_set_full_power_state+0xa8/0x114
>     pci_set_power_state+0xc4/0x120
>     qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
>     pci_walk_bus+0x64/0xbc
>     qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
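
For readers following along, the nesting reported in the splat above can be
modelled in user space. This is only a sketch under stated assumptions: every
model_* name is hypothetical and none of them exists in the kernel. The toy
"semaphore" just counts read-side holds on a single thread and flags the
pattern lockdep warns about. model_state_change_locked() illustrates one of
the options raised in the thread (letting the callee know the bus semaphore is
already held); it is not the fix that was actually applied.

```c
/* User-space model only; all model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for pci_bus_sem: tracks read-side nesting on one thread. */
struct model_sem {
	int depth;           /* read-side holds by this thread */
	bool recursion_seen; /* set when taken while already held */
};

static void model_down_read(struct model_sem *sem)
{
	if (sem->depth > 0)
		sem->recursion_seen = true; /* the case lockdep reports */
	sem->depth++;
}

static void model_up_read(struct model_sem *sem)
{
	assert(sem->depth > 0);
	sem->depth--;
}

/* Hypothetical variant: the caller must already hold the semaphore. */
static void model_state_change_locked(struct model_sem *sem, int *updates)
{
	assert(sem->depth > 0);
	(*updates)++; /* stands in for the ASPM re-evaluation */
}

/* Models the restored function: it takes the semaphore itself. */
static void model_state_change(struct model_sem *sem, int *updates)
{
	model_down_read(sem);
	model_state_change_locked(sem, updates);
	model_up_read(sem);
}

typedef void (*model_cb)(struct model_sem *sem, int *updates);

/* Models pci_walk_bus(): holds the semaphore while running the callback. */
static void model_walk_bus(struct model_sem *sem, model_cb cb, int *updates)
{
	model_down_read(sem);
	cb(sem, updates);
	model_up_read(sem);
}
```

Passing model_state_change to model_walk_bus() reproduces the nested
acquisition, while passing model_state_change_locked does not. On a real
reader/writer semaphore the nested read acquisition can deadlock if a writer
queues up between the two down_read() calls, which is why the pattern is
reported even though both acquisitions are read-side.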
