Message-ID: <ZvvW1ua2UjwHIOEN@wunner.de>
Date: Tue, 1 Oct 2024 13:02:46 +0200
From: Lukas Wunner <lukas@...ner.de>
To: AceLan Kao <acelan.kao@...onical.com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
	Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI: pciehp: Fix system hang on resume after hot-unplug
 during suspend

On Mon, Sep 30, 2024 at 09:31:53AM +0800, AceLan Kao wrote:
> On 2024-09-28 at 8:51, Lukas Wunner <lukas@...ner.de> wrote:
> > -       if (pci_get_dsn(pdev) != ctrl->dsn)
> > +       dsn = pci_get_dsn(pdev);
> > +       if (!PCI_POSSIBLE_ERROR(dsn) &&
> > +           dsn != ctrl->dsn)
> >                 return true;
> 
> In my case, pciehp_device_replaced() returns true from this final check.
> These are the values I got:
> dsn = 0x00000000, ctrl->dsn = 0x7800AA00
> dsn = 0x00000000, ctrl->dsn = 0x21B7D000

Ah, because pci_get_dsn() returns 0 if the device is gone.
Below is a modified patch which returns false in that case.

I've only changed:
-	dsn = pci_get_dsn(pdev);
-	if (!PCI_POSSIBLE_ERROR(dsn) &&
+	if ((dsn = pci_get_dsn(pdev)) &&
+	    !PCI_POSSIBLE_ERROR(dsn) &&


> I ran another test:
> TBT HDD -> TBT dock -> laptop
>    suspend
> TBT HDD -> laptop (TBT dock replaced with the TBT HDD)
>    resume
> I got the same result as above; it looks like it didn't detect that the
> TBT dock had been replaced by the TBT HDD.
> 
> In the original call trace, when the TBT dock is unplugged or replaced with
> the TBT HDD, it returns true from the check below:
>         if (pci_read_config_dword(pdev, PCI_VENDOR_ID, &reg) ||
>            reg != (pdev->vendor | (pdev->device << 16)) ||
>            pci_read_config_dword(pdev, PCI_CLASS_REVISION, &reg) ||
>            reg != (pdev->revision | (pdev->class << 8)))
>                return true;

Hm, that's odd.  Why is that?  Is reg == 0xffffffff in one of those cases?

I guess that could happen if the Thunderbolt tunnels are not yet
established at that point (i.e. in the ->resume_noirq phase),
but normally they should be.  Does this system use ICM-controlled
tunnel management or kernel-native (software-controlled) tunnel
management?
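
To illustrate why an all-ones read would trip the quoted check, here is a
small user-space sketch of the Vendor ID / Device ID comparison only.
ids_look_replaced() is a hypothetical name, and the sketch assumes the
config read itself reports success but hands back 0xffffffff, which is my
guess above and not something confirmed on this system.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compare a raw dword read from the Vendor ID offset
 * against the IDs cached before suspend (pdev->vendor and pdev->device in
 * the quoted check).
 */
static bool ids_look_replaced(uint32_t reg, uint16_t vendor, uint16_t device)
{
	/* Vendor ID sits in the low 16 bits, Device ID in the high 16 bits. */
	uint32_t expected = (uint32_t)vendor | ((uint32_t)device << 16);

	return reg != expected;
}
```

A read of 0xffffffff can only equal the expected dword if both cached IDs
are 0xffff, which no real device uses, so an unreachable device would make
this comparison report a replacement before the DSN is ever looked at.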

Thanks,

Lukas
