Message-ID: <7b472880-32d0-4783-b9d2-3d4230403975@panix.com>
Date: Wed, 26 Feb 2025 07:31:37 -0800
From: Kenneth Crudup <kenny@...ix.com>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: Bjorn Helgaas <helgaas@...nel.org>, ilpo.jarvinen@...ux.intel.com,
 Bjorn Helgaas <bhelgaas@...gle.com>, Jian-Hong Pan <jhp@...lessos.org>,
 linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
 Niklāvs Koļesņikovs <pinkflames.linux@...il.com>,
 Andreas Noever <andreas.noever@...il.com>,
 Michael Jamet <michael.jamet@...el.com>, Lukas Wunner <lukas@...ner.de>,
 Yehezkel Bernat <YehezkelShB@...il.com>, linux-usb@...r.kernel.org,
 Kenneth Crudup <kenny@...ix.com>
Subject: Re: diagnosing resume failures after disconnected USB4 drives (Was:
 Re: PCI/ASPM: Fix L1SS saving (linus/master commit 7507eb3e7bfac))


I'm trying to do a "control" test before I try out your bisected commit and
Lukas' changes, but of course now I can't get it to fail (I'm on Linus'
master as of this morning, b5799106b4).

I'm using my portable USB4 dock (Plugable TBT4-HUB3C) this time (vs. my 
CalDigit 4 dock) but the same ASMedia USB4-to-NVMe adapter as always; in 
any case everything is PCIe so it shouldn't matter.

I don't normally use "tbauth" (I think that's all done for me via the
"boltctl" suite), but I grabbed the source from git, built it, and ran it
anyway, for good measure.

I'll keep you updated. I'll be at my CalDigit dock soon enough if I
can't get any failures this morning.
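
For step 4 of the quoted reproduction steps (checking the PCIe topology with
lspci), and for seeing which devices dropped off the bus after a resume, a
before/after comparison of lspci snapshots may help. The following is only a
minimal sketch and is not taken from anyone's actual setup in this thread:
the function name pci_missing and the snapshot file names are hypothetical,
and it assumes default lspci output where the first field of each line is
the device address.

```shell
#!/bin/sh
# Hypothetical helper: print PCI addresses that appear in the first lspci
# snapshot but not in the second one.
# Assumption: each line of a snapshot starts with the device address,
# as in default `lspci` output (e.g. "04:00.0 Non-Volatile memory ...").
pci_missing() {
    _before=$(mktemp) || return 1
    _after=$(mktemp) || { rm -f "$_before"; return 1; }
    # Keep only the address column and sort it, since comm needs sorted input.
    awk '{print $1}' "$1" | LC_ALL=C sort -u > "$_before"
    awk '{print $1}' "$2" | LC_ALL=C sort -u > "$_after"
    # -23 suppresses lines unique to the second file and lines common to both,
    # leaving only addresses that exist in the first snapshot alone.
    LC_ALL=C comm -23 "$_before" "$_after"
    rm -f "$_before" "$_after"
}

# Intended use around the suspend cycle (run as root, on real hardware):
#   lspci > before.txt
#   rtcwake -s 30 -mmem
#   lspci > after.txt
#   pci_missing before.txt after.txt
```

An empty result means nothing disappeared across the suspend cycle. In the
unplug scenario the dock and NVMe addresses would be expected in the output,
provided the resume completes at all.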

-K

On 2/26/25 00:44, Mika Westerberg wrote:
> Hi Kenneth,
> 
> On Fri, Feb 14, 2025 at 09:39:33AM -0800, Kenneth Crudup wrote:
>>
>> This is excellent news that you were able to reproduce it. I'd figured this
>> regression would have been caught already (as I do remember this working
>> before) and was worried it may have been specific to a particular piece of
>> hardware (or software setup) on my system.
>>
>> I'll see what I can dig up on my end, but as I'm no expert in these
>> subsystems I may not be able to diagnose anything until your return.
> 
> [Back now]
> 
> My git bisect ended up at this commit:
> 
>    9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep")
> 
> Adding Lukas who is the expert.
> 
> My steps to reproduce on Intel Meteor Lake based reference system are:
> 
> 1. Boot the system up, nothing connected.
> 2. Once up, connect Thunderbolt 4 dock and Thunderbolt 3 NVMe in a chain:
> 
>    [Meteor Lake host] <--> [TB 4 dock] <--> [TB 3 NVMe]
> 
> 3. Authorize PCIe tunnels (whatever your distro provides, my buildroot just
>      has the debugging tools so running 'tbauth -r 301')
> 
> 4. Check that the PCIe topology matches the expected (lspci)
> 
> 5. Enter s2idle:
> 
>    # rtcwake -s 30 -mmem
> 
> 6. Once it is suspended, unplug the cable between the host and the dock.
> 
> 7. Wait for the resume to happen.
> 
> Expectation: The system wakes up fine, notices that the TB and PCIe devices
> are gone, stays responsive and usable.
> 
> Actual result: Resume never completes.
> 
> I added "no_console_suspend" to the command line and then did sysrq-w to
> get a list of blocked tasks. I've attached it just in case it is needed.
> 
> If I revert the above commit the issue is gone. Now I'm not sure if this is
> exactly the same issue that you are seeing but nevertheless this is kind of
> normal use case so definitely something we should get fixed.
> 
> Lukas, if you need any more information let me know. I can reproduce this
> easily.
> 
>> I also saw some DRM/connected fixes posted to Linus' master so maybe one of
>> them corrects this new display-crash issue (I'm not home on my big monitor
>> to be able to test yet).
>>
>> -Kenny
>>
>> On 2/14/25 08:29, Mika Westerberg wrote:
>>> Hi,
>>>
>>> On Thu, Feb 13, 2025 at 11:19:35AM -0800, Kenneth Crudup wrote:
>>>>
>>>> On 2/13/25 05:59, Mika Westerberg wrote:
>>>>
>>>>> Hi,
>>>>
>>>> As Murphy's Law would have it, now my crashes are display-driver related (this
>>>> is Xe, but I've also seen it with i915).
>>>>
>>>> Attached here just for the heck of it, but I'll be better testing the NVMe
>>>> enclosure-related failures this weekend. Stay tuned!
>>>
>>> Okay, I checked quickly and no TB related crash there but I was actually
>>> able to reproduce hang when I unplug the device chain during suspend. I did
>>> not yet have time to look into it deeper. I'm sure this has been working
>>> fine in the past as we tested all kinds of topologies including similar to
>>> this.
>>>
>>> I will be out next week for vacation but will continue after that if the
>>> problem is not already solved ;-)
>>>
>>
>> -- 
>> Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County
>> CA

-- 
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange 
County CA

