Message-ID: <7b472880-32d0-4783-b9d2-3d4230403975@panix.com>
Date: Wed, 26 Feb 2025 07:31:37 -0800
From: Kenneth Crudup <kenny@...ix.com>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: Bjorn Helgaas <helgaas@...nel.org>, ilpo.jarvinen@...ux.intel.com,
Bjorn Helgaas <bhelgaas@...gle.com>, Jian-Hong Pan <jhp@...lessos.org>,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
Niklāvs Koļesņikovs <pinkflames.linux@...il.com>,
Andreas Noever <andreas.noever@...il.com>,
Michael Jamet <michael.jamet@...el.com>, Lukas Wunner <lukas@...ner.de>,
Yehezkel Bernat <YehezkelShB@...il.com>, linux-usb@...r.kernel.org,
Kenneth Crudup <kenny@...ix.com>
Subject: Re: diagnosing resume failures after disconnected USB4 drives (Was:
Re: PCI/ASPM: Fix L1SS saving (linus/master commit 7507eb3e7bfac))
Trying to do a "control" test before I try out your bisected commit and
Lukas' changes, but of course now I can't get it to fail (I'm on Linus'
master as of this morning, b5799106b4).
I'm using my portable USB4 dock (Plugable TBT4-HUB3C) this time (vs. my
CalDigit 4 dock) but the same ASMedia USB4-to-NVMe adapter as always; in
any case everything is PCIe so it shouldn't matter.
I don't normally use "tbauth" (I think that's all done for me via the
"boltctl" suite) but I grabbed the source from git, built it and ran it
anyway, for good measure.
I'll keep you updated; I'll be at my CalDigit dock soon enough if I
can't get any failures this morning.
-K
On 2/26/25 00:44, Mika Westerberg wrote:
> Hi Kenneth,
>
> On Fri, Feb 14, 2025 at 09:39:33AM -0800, Kenneth Crudup wrote:
>>
>> This is excellent news that you were able to reproduce it- I'd figured this
>> regression would have been caught already (as I do remember this working
>> before) and was worried it may have been specific to a particular piece of
>> hardware (or software setup) on my system.
>>
>> I'll see what I can dig up on my end, but as I'm not an expert in these
>> subsystems I may not be able to diagnose anything until your return.
>
> [Back now]
>
> My git bisect ended up at this commit:
>
> 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep")
>
> Adding Lukas who is the expert.
>
> My steps to reproduce on Intel Meteor Lake based reference system are:
>
> 1. Boot the system up, nothing connected.
> 2. Once up, connect Thunderbolt 4 dock and Thunderbolt 3 NVMe in a chain:
>
> [Meteor Lake host] <--> [TB 4 dock] <--> [TB 3 NVMe]
>
> 3. Authorize PCIe tunnels (whatever your distro provides, my buildroot just
> has the debugging tools so running 'tbauth -r 301')
>
> 4. Check that the PCIe topology matches what is expected (lspci)
>
> 5. Enter s2idle:
>
> # rtcwake -s 30 -mmem
>
> 6. Once it is suspended, unplug the cable between the host and the dock.
>
> 7. Wait for the resume to happen.
>
> Expectation: The system wakes up fine, notices that the TB and PCIe devices
> are gone, stays responsive and usable.
>
> Actual result: Resume never completes.
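
Steps 4 to 7 above could be scripted roughly as follows. This is only a sketch: the helper names snapshot_pci and missing_after and the /tmp paths are made up for illustration, and it assumes lspci (pciutils) and rtcwake (util-linux) are installed and that the PCIe tunnels have already been authorized. If the resume hangs as described, the second snapshot is never taken, which is itself the failure signal.

```shell
#!/bin/sh
# Hypothetical helpers for the suspend/unplug test; not part of any tool.

# Print one sorted line per PCI function, with domain and numeric IDs.
snapshot_pci() {
    lspci -D -nn | sort
}

# Print the lines present in file $1 but absent from file $2, i.e. the
# devices that disappeared across the suspend cycle. Both files must be
# sorted, which snapshot_pci already guarantees.
missing_after() {
    comm -23 "$1" "$2"
}

# Intended use, as root:
#   snapshot_pci > /tmp/pci.before
#   rtcwake -s 30 -m mem    # unplug the host<->dock cable while suspended
#   snapshot_pci > /tmp/pci.after
#   missing_after /tmp/pci.before /tmp/pci.after
```

On a good resume the last command should list the dock and NVMe functions that went away with the cable, and the shell should remain responsive.
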
>
> I added "no_console_suspend" to the command line and then did sysrq-w to
> get the list of blocked tasks. I've attached it just in case it is needed.
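
For anyone wanting to collect the same blocked-task list, a minimal sketch of the preparation follows. The helper name has_param is made up for illustration; /proc/cmdline, /proc/sys/kernel/sysrq and /proc/sysrq-trigger are the standard kernel interfaces. Whether the keyboard SysRq combination still works while the resume is hung will depend on the machine, so treat this as a starting point only.

```shell
#!/bin/sh
# has_param FILE WORD: succeed if WORD appears as a whole,
# space-separated parameter in FILE (hypothetical helper).
has_param() {
    tr ' ' '\n' < "$1" | grep -qx -- "$2"
}

# Intended use, as root, before suspending:
#   has_param /proc/cmdline no_console_suspend ||
#       echo "boot with no_console_suspend to keep console output" >&2
#   echo 1 > /proc/sys/kernel/sysrq   # allow the keyboard Alt+SysRq+W combo
#
# From a shell that is still responsive, the same dump of blocked
# (uninterruptible) tasks can be requested with:
#   echo w > /proc/sysrq-trigger
```
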
>
> If I revert the above commit the issue is gone. Now I'm not sure if this is
> exactly the same issue that you are seeing but nevertheless this is kind of
> normal use case so definitely something we should get fixed.
>
> Lukas, if you need any more information let me know. I can reproduce this
> easily.
>
>> I also saw some DRM/connected fixes posted to Linus' master so maybe one of
>> them corrects this new display-crash issue (I'm not home on my big monitor
>> to be able to test yet).
>>
>> -Kenny
>>
>> On 2/14/25 08:29, Mika Westerberg wrote:
>>> Hi,
>>>
>>> On Thu, Feb 13, 2025 at 11:19:35AM -0800, Kenneth Crudup wrote:
>>>>
>>>> On 2/13/25 05:59, Mika Westerberg wrote:
>>>>
>>>>> Hi,
>>>>
>>>> As Murphy's would have it, now my crashes are display-driver related (this
>>>> is Xe, but I've also seen it with i915).
>>>>
>>>> Attached here just for the heck of it, but I'll be better testing the NVMe
>>>> enclosure-related failures this weekend. Stay tuned!
>>>
>>> Okay, I checked quickly and there is no TB-related crash there, but I was
>>> actually able to reproduce a hang when I unplug the device chain during
>>> suspend. I did not yet have time to look into it deeper. I'm sure this has
>>> been working fine in the past as we tested all kinds of topologies,
>>> including ones similar to this.
>>>
>>> I will be out next week for vacation but will continue after that if the
>>> problem is not already solved ;-)
>>>
>>
>> --
>> Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County
>> CA
--
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange
County CA