Date: Sun, 5 May 2024 07:37:09 -0500
From: Mario Limonciello <superm1@...il.com>
To: Linux regressions mailing list <regressions@...ts.linux.dev>,
Micha Albert <kernel@...ha.zone>
Cc: "stable@...r.kernel.org" <stable@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Mario Limonciello <mario.limonciello@....com>
Subject: Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU
Disconnection from 6.8.7=>6.8.8
On 5/4/24 23:59, Linux regression tracking (Thorsten Leemhuis) wrote:
> [CCing Mario, who asked for the two suspected commits to be backported]
>
> On 05.05.24 03:12, Micha Albert wrote:
>>
>> I have an AMD Radeon 6600 XT GPU in a cheap Thunderbolt eGPU board.
>> In 6.8.7, this works as expected, and my Plymouth screen (including the
>> LUKS password prompt) shows on my 2 monitors connected to the GPU as
>> well as my main laptop screen. Upon entering the password, I'm put into
>> userspace as expected. However, upon upgrading to 6.8.8, I will be
>> greeted with the regular password prompt, but after entering my password
>> and waiting for it to be accepted, my eGPU will reset and not function.
>> I can tell that it resets since I can hear the click of my ATX power
>> supply turning off and on again, and the status LED of the eGPU board
>> goes from green to blue and back to green, all in less than a second.
>>
>> I talked to a friend, and we found out that the kernel parameter
>> thunderbolt.host_reset=false fixes the issue. He also thinks that
>> commits cc4c94 (59a54c upstream) and 11371c (ec8162 upstream) look
>> suspicious. I've attached the output of dmesg when the error was
>> occurring, since I'm still able to use my laptop normally when this
>> happens, just not with my eGPU and its connected displays.
>
> Thx for the report. Could you please test if 6.9-rc6 (or a later
> snapshot; or -rc7, which should be out in about ~18 hours) is affected
> as well? That would be really important to know.
>
> It would also be great if you could try reverting the two patches you
> mentioned and see if they are really what's causing this. There iirc are
> two more; maybe you might need to revert some or all of them in the
> order they were applied.
There are two other things that I think would help us understand this
issue.
1) Is it related to trusted device handling?
You can try applying the following patch to either 6.8.y or 6.9-rc:
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=iommu/fixes&id=0f91d0795741c12cee200667648669a91b568735
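2) Is it because you have amdgpu in your initramfs but not thunderbolt?
If so, there's very likely an ordering issue. In your dmesg, amdgpu posts
the GPU about 28 seconds before the thunderbolt bus type is registered: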
[ 2.325788] [drm] GPU posting now...
[ 30.360701] ACPI: bus type thunderbolt registered
Can you remove amdgpu from your initramfs and let it load after you pivot
to the real root filesystem? Does this still happen?
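To check point 2 from saved output, here is a rough Python sketch. It only covers two things: whether amdgpu and thunderbolt are present in the initramfs, and how far apart the two dmesg lines above are. It assumes you saved `dmesg` output and an initramfs file listing as plain text. A listing can come from, for example, `lsinitramfs` on initramfs-tools distros or `lsinitrd` on dracut-based ones, with one path per line. The helper names are hypothetical and not part of any existing tool.

```python
import re
from typing import Dict, List, Optional

# Matches the default dmesg prefix "[   12.345678] message" (padding optional).
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Markers taken from the two dmesg lines quoted in this thread.
GPU_MARKER = "[drm] GPU posting now..."
TBT_MARKER = "ACPI: bus type thunderbolt registered"


def first_timestamp(dmesg_text: str, marker: str) -> Optional[float]:
    """Return seconds-since-boot of the first dmesg line containing marker."""
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and marker in m.group(2):
            return float(m.group(1))
    return None


def modules_in_listing(listing: str, names: List[str]) -> Dict[str, bool]:
    """Report which modules appear in an initramfs listing (one path per line).

    Matches both uncompressed (name.ko) and compressed (name.ko.zst, .xz, ...) files.
    """
    found = {name: False for name in names}
    for path in listing.splitlines():
        base = path.strip().rsplit("/", 1)[-1]
        for name in names:
            if base == f"{name}.ko" or base.startswith(f"{name}.ko."):
                found[name] = True
    return found


if __name__ == "__main__":
    log = (
        "[    2.325788] [drm] GPU posting now...\n"
        "[   30.360701] ACPI: bus type thunderbolt registered\n"
    )
    gpu = first_timestamp(log, GPU_MARKER)
    tbt = first_timestamp(log, TBT_MARKER)
    if gpu is not None and tbt is not None and gpu < tbt:
        print(f"amdgpu posted the GPU {tbt - gpu:.2f} s before thunderbolt registered")

    # Hypothetical listing: amdgpu is packed into the initramfs, thunderbolt is not.
    listing = "usr/lib/modules/6.8.8/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.zst\n"
    print(modules_in_listing(listing, ["amdgpu", "thunderbolt"]))
```

If the listing shows amdgpu present and thunderbolt absent, and the GPU timestamp comes well before the thunderbolt one, that points to the ordering problem rather than to trusted device handling.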
>
> Ciao, Thorsten
>
> P.s.: To be sure the issue doesn't fall through the cracks unnoticed,
> I'm adding it to regzbot, the Linux kernel regression tracking bot:
>
> #regzbot ^introduced v6.8.7..v6.8.8
> #regzbot title thunderbolt: eGPU disconnected during boot
>