linux-kernel - Re: [PATCH] PCI: pciehp: Fix system hang on resume after hot-unplug during suspend

Message-ID: <ZvZ61srt3QAca2AI@wunner.de>
Date: Fri, 27 Sep 2024 11:28:54 +0200
From: Lukas Wunner <lukas@...ner.de>
To: AceLan Kao <acelan.kao@...onical.com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
	Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI: pciehp: Fix system hang on resume after hot-unplug
 during suspend

On Fri, Sep 27, 2024 at 03:33:50PM +0800, AceLan Kao wrote:
> Lukas Wunner <lukas@...ner.de> 2024-9-26 9:23
> > On Thu, Sep 26, 2024 at 08:59:09PM +0800, Chia-Lin Kao (AceLan) wrote:
> > > Remove unnecessary pci_walk_bus() call in pciehp_resume_noirq(). This
> > > fixes a system hang that occurs when resuming after a Thunderbolt dock
> > > with attached thunderbolt storage is unplugged during system suspend.
> > >
> > > The PCI core already handles setting the disconnected state for devices
> > > under a port during suspend/resume.
> > > 
> > > The redundant bus walk was
> > > interfering with proper hardware state detection during resume, causing
> > > a system hang when hot-unplugging daisy-chained Thunderbolt devices.
> 
> I have no good answer for you now.
> After enabling some debugging options and debugging lock options, I
> still didn't get any message.

Have you tried "no_console_suspend" on the kernel command line?

> ubuntu@...alhost:~$ lspci -tv
> -[0000:00]-+-00.0  Intel Corporation Device 6400
>           +-02.0  Intel Corporation Lunar Lake [Intel Graphics]
>           +-04.0  Intel Corporation Device 641d
>           +-05.0  Intel Corporation Device 645d
>           +-07.0-[01-38]--
>           +-07.2-[39-70]----00.0-[3a-70]--+-00.0-[3b]--
>           |                               +-01.0-[3c-4d]--
>           |                               +-02.0-[4e-5f]----00.0-[4f-50]----01.0-[50]----00.0  Phison Electronics Corporation E12 NVMe Controller
>           |                               +-03.0-[60-6f]--
>           |                               \-04.0-[70]--
> 
> This is the Dell WD22TB dock
> 39:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0b26] (rev 03)
>        Subsystem: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0000]
> 
> This is the TBT storage connected to the dock
> 50:00.0 Non-Volatile memory controller [0108]: Phison Electronics
> Corporation E12 NVMe Controller [1987:5012] (rev 01)
>        Subsystem: Phison Electronics Corporation E12 NVMe Controller [1987:5012]
>        Kernel driver in use: nvme
>        Kernel modules: nvme

The lspci output shows another PCIe switch in between the WD22TB dock and
the NVMe drive (buses 4e and 4f).  Is that another Thunderbolt device?
Or is the NVMe drive built into the WD22TB dock and the switch at buses
4e and 4f is a non-Thunderbolt PCIe switch in the dock?

I realize now that commit 9d573d19547b ("PCI: pciehp: Detect device
replacement during system sleep") is a little overzealous because it
reacts not only to *replaced* devices but also to *unplugged* devices:
If the device was unplugged, reading the vendor and device ID returns
0xffff, which is different from the cached value, so the device is
assumed to have been replaced even though it's actually been unplugged.
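
To illustrate the distinction, here is a minimal userspace sketch, not the
pciehp code: it compares a cached Vendor/Device ID dword (Vendor ID in the
low 16 bits) with a fresh read.  The enum and both functions are
hypothetical names.  It relies only on the fact that 0xffff is not a valid
Vendor ID, so an all-ones read means nothing answered the config request.

```c
#include <stdint.h>

/* Hypothetical classification of a post-resume ID read (not kernel API). */
enum id_check { ID_SAME, ID_REPLACED, ID_UNPLUGGED };

/* Overzealous variant: any difference, including all ones, counts as
 * a replacement. */
static enum id_check naive_check(uint32_t cached, uint32_t now)
{
	return now == cached ? ID_SAME : ID_REPLACED;
}

/* Refined variant: an all-ones read means no device responded, so it is
 * reported as an unplug rather than a replacement. */
static enum id_check refined_check(uint32_t cached, uint32_t now)
{
	if (now == 0xffffffffu)
		return ID_UNPLUGGED;
	return now == cached ? ID_SAME : ID_REPLACED;
}
```

With the Phison E12 from the lspci output above ([1987:5012], i.e. dword
0x50121987), naive_check(0x50121987, 0xffffffff) yields ID_REPLACED while
refined_check() yields ID_UNPLUGGED for the same read.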

The device replacement check runs in the ->resume_noirq phase.  Later on
in the ->resume phase, pciehp_resume() calls pciehp_check_presence() to
check for unplugged devices.  Commit 9d573d19547b inadvertently reacts
before pciehp_check_presence() gets a chance to react.  So that's something
that we should probably change.
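
One possible shape of that change, sketched as a userspace model rather
than a patch: the ->resume_noirq stand-in leaves all-ones reads alone so
that the later ->resume stand-in (playing the role of
pciehp_check_presence()) is the one that sees the unplug.  The struct,
enum and model_* functions are hypothetical; defer_all_ones toggles
between the current behaviour and the deferred one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which phase claimed the event (hypothetical, for illustration only). */
enum handled_by { HANDLED_NONE, HANDLED_NOIRQ_REPLACED, HANDLED_RESUME_UNPLUGGED };

struct port_model {
	uint32_t cached_id;   /* Vendor/Device ID dword saved before suspend */
	uint32_t current_id;  /* value a config read returns after resume */
	bool presence;        /* presence state seen after resume */
	enum handled_by handled;
};

/* Stand-in for the ->resume_noirq replacement check. */
static void model_resume_noirq(struct port_model *p, bool defer_all_ones)
{
	if (defer_all_ones && p->current_id == 0xffffffffu)
		return;  /* nothing answered: leave it to the presence check */
	if (p->current_id != p->cached_id)
		p->handled = HANDLED_NOIRQ_REPLACED;
}

/* Stand-in for the presence check done later in the ->resume phase. */
static void model_resume(struct port_model *p)
{
	if (p->handled == HANDLED_NONE && !p->presence)
		p->handled = HANDLED_RESUME_UNPLUGGED;
}

/* Runs the two phases in the order the PM core does: noirq first. */
static enum handled_by model_system_resume(struct port_model *p,
					   bool defer_all_ones)
{
	p->handled = HANDLED_NONE;
	model_resume_noirq(p, defer_all_ones);
	model_resume(p);
	return p->handled;
}
```

For an unplugged device (all-ones read, presence false), the current
ordering ends in HANDLED_NOIRQ_REPLACED, whereas deferring all-ones reads
lets the unplug reach HANDLED_RESUME_UNPLUGGED; genuine replacements are
still caught in the noirq phase.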

I'm not sure, though, why that would cause a hang.  But there is a known
issue that a deadlock may occur when hot-removing nested PCIe switches
(which is what you've got here).  Keith Busch recently rediscovered the
issue.  You may want to check whether the hang goes away with this patch
applied:

https://lore.kernel.org/all/20240612181625.3604512-2-kbusch@meta.com/

If it does go away then at least we know what the root cause is.

The patch is a bit hackish, but there's an ongoing effort to tackle the
problem more thoroughly:

https://lore.kernel.org/all/20240722151936.1452299-1-kbusch@meta.com/
https://lore.kernel.org/all/20240827192826.710031-1-kbusch@meta.com/

Thanks,

Lukas