Message-ID: <20250226084404.GM3713119@black.fi.intel.com>
Date: Wed, 26 Feb 2025 10:44:04 +0200
From: Mika Westerberg <mika.westerberg@...ux.intel.com>
To: Kenneth Crudup <kenny@...ix.com>
Cc: Bjorn Helgaas <helgaas@...nel.org>, ilpo.jarvinen@...ux.intel.com,
Bjorn Helgaas <bhelgaas@...gle.com>,
Jian-Hong Pan <jhp@...lessos.org>, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org,
Niklāvs Koļesņikovs <pinkflames.linux@...il.com>,
Andreas Noever <andreas.noever@...il.com>,
Michael Jamet <michael.jamet@...el.com>,
Lukas Wunner <lukas@...ner.de>,
Yehezkel Bernat <YehezkelShB@...il.com>, linux-usb@...r.kernel.org
Subject: Re: diagnosing resume failures after disconnected USB4 drives (Was:
Re: PCI/ASPM: Fix L1SS saving (linus/master commit 7507eb3e7bfac))
Hi Kenneth,
On Fri, Feb 14, 2025 at 09:39:33AM -0800, Kenneth Crudup wrote:
>
> This is excellent news that you were able to reproduce it- I'd figured this
> regression would have been caught already (as I do remember this working
> before) and was worried it may have been specific to a particular piece of
> hardware (or software setup) on my system.
>
> I'll see what I can dig up on my end, but as I'm not expert in these
> subsystems I may not be able to diagnose anything until your return.
[Back now]
My git bisect ended up at this commit:
9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep")
Adding Lukas who is the expert.
My steps to reproduce on Intel Meteor Lake based reference system are:
1. Boot the system up, nothing connected.
2. Once up, connect Thunderbolt 4 dock and Thunderbolt 3 NVMe in a chain:
[Meteor Lake host] <--> [TB 4 dock] <--> [TB 3 NVMe]
3. Authorize PCIe tunnels (whatever your distro provides, my buildroot just
has the debugging tools so running 'tbauth -r 301')
4. Check that the PCIe topology matches the expected (lspci)
5. Enter s2idle:
# rtcwake -s 30 -mmem
6. Once it is suspended, unplug the cable between the host and the dock.
7. Wait for the resume to happen.
Expectation: The system wakes up fine, notices that the TB and PCIe devices
are gone, stays responsive and usable.
Actual result: Resume never completes.
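The before/after topology check in steps 4-7 could be scripted roughly as below. This is only a sketch, not what was run here: the helper names (pci_addresses, removed_devices, snapshot, main) are made up for illustration, and it assumes lspci and rtcwake are installed and that main() is run as root. On an affected kernel the resume never completes, so the second snapshot would not be reached.

```python
import subprocess


def pci_addresses(lspci_output: str) -> set[str]:
    """Return the PCI addresses (first column) found in plain `lspci` output."""
    return {line.split()[0] for line in lspci_output.splitlines() if line.strip()}


def removed_devices(before: str, after: str) -> list[str]:
    """Addresses that were present before suspend but are missing after resume."""
    return sorted(pci_addresses(before) - pci_addresses(after))


def snapshot() -> str:
    """Capture the current `lspci` output (assumes pciutils is installed)."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return result.stdout


def main() -> None:
    # Step 4: record the topology with the dock and NVMe attached.
    before = snapshot()
    # Step 5: enter s2idle for 30 seconds; unplug the cable while suspended.
    subprocess.run(["rtcwake", "-s", "30", "-m", "mem"], check=True)
    # Step 7: after a successful resume, list what disappeared.
    after = snapshot()
    for address in removed_devices(before, after):
        print("gone after resume:", address)
```

With the commit reverted, calling main() should print the addresses of the unplugged dock and NVMe devices; which addresses those are depends on the system.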
I added "no_console_suspend" to the command line and then did sysrq-w to
get the list of blocked tasks. I've attached it just in case it is needed.
If I revert the above commit the issue is gone. Now I'm not sure if this is
exactly the same issue that you are seeing, but nevertheless this is a fairly
normal use case, so definitely something we should get fixed.
Lukas, if you need any more information let me know. I can reproduce this
easily.
> I also saw some DRM/connected fixes posted to Linus' master so maybe one of
> them corrects this new display-crash issue (I'm not home on my big monitor
> to be able to test yet).
>
> -Kenny
>
> On 2/14/25 08:29, Mika Westerberg wrote:
> > Hi,
> >
> > On Thu, Feb 13, 2025 at 11:19:35AM -0800, Kenneth Crudup wrote:
> > >
> > > On 2/13/25 05:59, Mika Westerberg wrote:
> > >
> > > > Hi,
> > >
> > > As Murphy's would have it, now my crashes are display-driver related (this
> > > is Xe, but I've also seen it with i915).
> > >
> > > Attached here just for the heck of it, but I'll be better testing the NVMe
> > > enclosure-related failures this weekend. Stay tuned!
> >
> > Okay, I checked quickly and no TB related crash there but I was actually
> > able to reproduce hang when I unplug the device chain during suspend. I did
> > not yet have time to look into it deeper. I'm sure this has been working
> > fine in the past as we tested all kinds of topologies including similar to
> > this.
> >
> > I will be out next week for vacation but will continue after that if the
> > problem is not already solved ;-)
> >
>
> --
> Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County
> CA
View attachment "6.14-hang-nvme.out" of type "text/plain" (17095 bytes)