linux-kernel - Re: diagnosing resume failures after disconnected USB4 drives (Was: Re: PCI/ASPM: Fix L1SS saving (linus/master commit 7507eb3e7bfac))

Open Source and information security mailing list archives



Message-ID: <20250307103456.GX3713119@black.fi.intel.com>
Date: Fri, 7 Mar 2025 12:34:56 +0200
From: Mika Westerberg <mika.westerberg@...ux.intel.com>
To: Lukas Wunner <lukas@...ner.de>
Cc: Kenneth Crudup <kenny@...ix.com>, Bjorn Helgaas <helgaas@...nel.org>,
	ilpo.jarvinen@...ux.intel.com, Bjorn Helgaas <bhelgaas@...gle.com>,
	Jian-Hong Pan <jhp@...lessos.org>, linux-pci@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Niklāvs Koļesņikovs <pinkflames.linux@...il.com>,
	Andreas Noever <andreas.noever@...il.com>,
	Michael Jamet <michael.jamet@...el.com>,
	Yehezkel Bernat <YehezkelShB@...il.com>, linux-usb@...r.kernel.org
Subject: Re: diagnosing resume failures after disconnected USB4 drives (Was:
 Re: PCI/ASPM: Fix L1SS saving (linus/master commit 7507eb3e7bfac))

On Thu, Mar 06, 2025 at 05:45:23PM +0100, Lukas Wunner wrote:
> On Tue, Mar 04, 2025 at 10:23:14AM +0200, Mika Westerberg wrote:
> > Unfortunately I still see the same hang. I double-checked: with the revert
> > the problem goes away, and with this patch I still see it.
> > 
> > Steps:
> > 
> > 1. Boot the system, nothing connected.
> > 2. Connect TBT 4 dock to the host.
> > 3. Connect TBT 3 NVMe to the TBT 4 dock.
> > 4. Authorize both PCIe tunnels, verify devices are there.
> > 5. Enter s2idle.
> > 6. Unplug the TBT 4 dock from the host.
> > 7. Exit s2idle.
> 
> Thanks for testing.  Would you mind giving the below a spin?

Sure.

> I've realized this can likely be solved in a much easier way:
> 
> The ->resume_noirq callback is invoked while traversing down
> the hierarchy and the topmost slot which detects device replacement
> already marks everything below as disconnected.  Hence any nested
> hotplug ports can just skip the replacement check because they're
> disconnected as well.

Makes sense.

I tried the patch and it solves the issue. Thanks!

Tested-by: Mika Westerberg <mika.westerberg@...ux.intel.com>

> 
> -- >8 --
> 
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index ff458e6..997841c 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -286,9 +286,12 @@ static int pciehp_suspend(struct pcie_device *dev)
>  
>  static bool pciehp_device_replaced(struct controller *ctrl)
>  {
> -	struct pci_dev *pdev __free(pci_dev_put);
> +	struct pci_dev *pdev __free(pci_dev_put) = NULL;
>  	u32 reg;
>  
> +	if (pci_dev_is_disconnected(ctrl->pcie->port))
> +		return false;
> +
>  	pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0));
>  	if (!pdev)
>  		return true;