Message-ID: <CAFv23Q=5KdqDHYxf9PVO=kq=VqP0LwRaHQ-KnY2taDEkZ9Fueg@mail.gmail.com>
Date: Fri, 27 Sep 2024 15:33:50 +0800
From: AceLan Kao <acelan.kao@...onical.com>
To: Lukas Wunner <lukas@...ner.de>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI: pciehp: Fix system hang on resume after hot-unplug
during suspend
On Thu, Sep 26, 2024 at 9:23 PM, Lukas Wunner <lukas@...ner.de> wrote:
>
> On Thu, Sep 26, 2024 at 08:59:09PM +0800, Chia-Lin Kao (AceLan) wrote:
> > Remove unnecessary pci_walk_bus() call in pciehp_resume_noirq(). This
> > fixes a system hang that occurs when resuming after a Thunderbolt dock
> > with attached thunderbolt storage is unplugged during system suspend.
> >
> > The PCI core already handles setting the disconnected state for devices
> > under a port during suspend/resume.
>
> Please explain in the commit message where the PCI core does that.
Hi Lukas,
I found my patch doesn't work.
Let me explain the new findings below.
>
> > The redundant bus walk was
> > interfering with proper hardware state detection during resume, causing
> > a system hang when hot-unplugging daisy-chained Thunderbolt devices.
>
> Please explain what "proper hardware state detection" means.
>
> Did you get a hung task stacktrace? If so, please include it in the
> commit message.
>
> If you're getting a system hang, it means there's a deadlock
> involving pci_bus_sem. I don't quite see how that could happen,
> so a more verbose explanation would be appreciated.
I have no good answer for you yet.
Even after enabling the kernel debugging and lock-debugging options, I
didn't get any message: the system simply hangs during resume, and
nothing can be logged.
Here is my setup:
ubuntu@...alhost:~$ lspci -tv
-[0000:00]-+-00.0  Intel Corporation Device 6400
           +-02.0  Intel Corporation Lunar Lake [Intel Graphics]
           +-04.0  Intel Corporation Device 641d
           +-05.0  Intel Corporation Device 645d
           +-07.0-[01-38]--
           +-07.2-[39-70]----00.0-[3a-70]--+-00.0-[3b]--
                                           +-01.0-[3c-4d]--
                                           +-02.0-[4e-5f]----00.0-[4f-50]----01.0-[50]----00.0  Phison Electronics Corporation E12 NVMe Controller
                                           +-03.0-[60-6f]--
                                           \-04.0-[70]--
This is the Dell WD22TB dock:
39:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0b26] (rev 03)
	Subsystem: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0000]
	Kernel driver in use: pcieport
This is the Thunderbolt storage connected to the dock:
50:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation E12 NVMe Controller [1987:5012] (rev 01)
	Subsystem: Phison Electronics Corporation E12 NVMe Controller [1987:5012]
	Kernel driver in use: nvme
	Kernel modules: nvme
While resuming, the dock (8086:0b26) resumes first, and I found the hang
occurs when its port runs through these two calls in pciehp_resume_noirq():

	if (pciehp_device_replaced(ctrl)) {
		pci_walk_bus(ctrl->pcie->port->subordinate,
			     pci_dev_set_disconnected, NULL);
		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
	}

and the port above the storage (1987:5012) then calls the same two functions.
Only one of the two devices, dock or storage, may enter the if statement;
otherwise the system hangs.
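For context, pci_walk_bus() descends the bus hierarchy below the bridge and
invokes a callback on every device, and pci_dev_set_disconnected() marks each
one as permanently gone. A userspace toy model of that walk (hypothetical
names, plain C, no kernel APIs -- just to illustrate the traversal, not the
actual locking that presumably causes the hang):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct pci_dev: each "device" may have children
 * on a subordinate bus that must also be marked disconnected. */
struct toy_dev {
	int disconnected;
	struct toy_dev *subordinate[4];	/* NULL-terminated child list */
};

/* Rough analogue of pci_dev_set_disconnected(): flag the device as gone. */
static int set_disconnected(struct toy_dev *dev, void *unused)
{
	(void)unused;
	dev->disconnected = 1;
	return 0;	/* 0 == keep walking, like the kernel callback contract */
}

/* Rough analogue of pci_walk_bus(): depth-first walk applying cb to
 * every device below the starting bus. */
static void walk_bus(struct toy_dev **bus,
		     int (*cb)(struct toy_dev *, void *), void *arg)
{
	for (size_t i = 0; bus[i]; i++) {
		cb(bus[i], arg);
		walk_bus(bus[i]->subordinate, cb, arg);
	}
}
```

In the real driver the walk runs under pci_bus_sem, which is why nesting or
repeating it from two hotplug ports on the same chain is where I suspect the
deadlock hides.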
To dig further, I found that masking out pciehp_request() eases the issue.
It may be that irq_wake_thread(), which pciehp_request() calls, gets stuck
somewhere. So I tried to avoid the irq_wake_thread() by replacing

	pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);

with

	atomic_or(PCI_EXP_SLTSTA_PDC, &ctrl->pending_events);

With this change, only the dock (8086:0b26) goes through the branch in
pciehp_resume_noirq(), and the Thunderbolt storage is released after
pci_power_up() fails.
Do you think this is a feasible solution?
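The difference between the two calls, as I understand it, is that
pciehp_request() both records the event and wakes the IRQ thread, while the
plain atomic_or() only records the bit in ctrl->pending_events for whenever
the thread next runs. A minimal userspace sketch of that bookkeeping (the
PCI_EXP_SLTSTA_PDC value is from include/uapi/linux/pci_regs.h; the wake
step is just a flag here, not a real irq_wake_thread()):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

#define PCI_EXP_SLTSTA_PDC	0x0008	/* Presence Detect Changed */

static atomic_uint pending_events;	/* stands in for ctrl->pending_events */
static bool irq_thread_woken;		/* stands in for irq_wake_thread() */

/* Rough analogue of pciehp_request(): queue the event AND wake the thread. */
static void request_event(unsigned int events)
{
	atomic_fetch_or(&pending_events, events);
	irq_thread_woken = true;
}

/* The proposed noirq-safe variant: queue the event only; it is picked up
 * the next time the IRQ thread runs for some other reason. */
static void queue_event_only(unsigned int events)
{
	atomic_fetch_or(&pending_events, events);
}
```

My assumption is that deferring the wake like this avoids whatever
irq_wake_thread() blocks on during the noirq resume phase, at the cost of
the event being handled later.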
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ff458e692fed..56bf23d55c41 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -332,7 +332,7 @@ static int pciehp_resume_noirq(struct pcie_device *dev)
 		ctrl_dbg(ctrl, "device replaced during system sleep\n");
 		pci_walk_bus(ctrl->pcie->port->subordinate,
 			     pci_dev_set_disconnected, NULL);
-		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
+		atomic_or(PCI_EXP_SLTSTA_PDC, &ctrl->pending_events);
 	}
 }
>
> Thanks,
>
> Lukas
>
>
> > Fixes: 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep")
> > Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@...onical.com>
> > ---
> > drivers/pci/hotplug/pciehp_core.c | 2 --
> > 1 file changed, 2 deletions(-)
> >
> > diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> > index ff458e692fed..c1c3f7e2bc43 100644
> > --- a/drivers/pci/hotplug/pciehp_core.c
> > +++ b/drivers/pci/hotplug/pciehp_core.c
> > @@ -330,8 +330,6 @@ static int pciehp_resume_noirq(struct pcie_device *dev)
> >  	 */
> >  	if (pciehp_device_replaced(ctrl)) {
> >  		ctrl_dbg(ctrl, "device replaced during system sleep\n");
> > -		pci_walk_bus(ctrl->pcie->port->subordinate,
> > -			     pci_dev_set_disconnected, NULL);
> >  		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
> >  	}
> >  }
> > --
> > 2.43.0