Message-ID: <4bbf68fa-6ca2-47a0-9966-6971dabd7a0f@amd.com>
Date: Fri, 20 Sep 2024 20:20:40 -0600
From: Alex Hung <alex.hung@....com>
To: Bob Gill <gillb5@...us.net>, "Dr. David Alan Gilbert"
<linux@...blig.org>, alexander.deucher@....com
Cc: linux-kernel@...r.kernel.org, regressions@...ts.linux.dev
Subject: Re: [REGRESSION] Re: AMDGPU 6.11.0 crash, 6.10.0 git bisect log
On 2024-09-20 18:20, Bob Gill wrote:
> Hi. Sorry for the late reply. My config has
> CONFIG_DEBUG_KERNEL_DC=y
>
> I will set it to # CONFIG_DEBUG_KERNEL_DC is not set
Hi Bob,
It seems the below change in a171cce57792 causes the hang when
CONFIG_DEBUG_KERNEL_DC is set.
--- a/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
+++ b/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
@@ -227,7 +227,8 @@ static void init_transmitter_control(struct bios_parser *bp)
 	uint8_t frev;
 	uint8_t crev = 0;
-	BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev);
+	if (!BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev))
+		BREAK_TO_DEBUGGER();
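For illustration only, here is a minimal user-space sketch of why checking the return value can change behaviour when CONFIG_DEBUG_KERNEL_DC is set. It is not kernel code: every sketch_* name is hypothetical, the build-time config option is modelled as a runtime flag, and it assumes (as described in this thread) that the debugger break only takes effect when the option is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Every name below is hypothetical; none of this is amdgpu code. */
enum sketch_outcome {
	SKETCH_CONTINUED,		/* init carries on as before */
	SKETCH_BROKE_TO_DEBUGGER	/* execution stops, waiting for a debugger */
};

/* Stand-in for the revision lookup: false means the table was not found. */
bool sketch_lookup_revision(bool table_present)
{
	return table_present;
}

/* Before the change: the lookup result is ignored. */
enum sketch_outcome sketch_init_before(bool table_present, bool debug_kernel_dc)
{
	(void)debug_kernel_dc;	/* the option has no effect on this path */
	(void)sketch_lookup_revision(table_present);
	return SKETCH_CONTINUED;
}

/*
 * After the change: a failed lookup reaches the debugger break.
 * debug_kernel_dc models CONFIG_DEBUG_KERNEL_DC, which in the kernel is
 * a build-time option rather than a runtime flag.
 */
enum sketch_outcome sketch_init_after(bool table_present, bool debug_kernel_dc)
{
	if (!sketch_lookup_revision(table_present) && debug_kernel_dc)
		return SKETCH_BROKE_TO_DEBUGGER;
	return SKETCH_CONTINUED;
}

/* Usage: the cases relevant to this report. */
void sketch_self_check(void)
{
	/* Old code never stops, even with the debug option and no table. */
	assert(sketch_init_before(false, true) == SKETCH_CONTINUED);
	/* New code stops only when the lookup fails AND the option is set. */
	assert(sketch_init_after(false, true) == SKETCH_BROKE_TO_DEBUGGER);
	assert(sketch_init_after(false, false) == SKETCH_CONTINUED);
	assert(sketch_init_after(true, true) == SKETCH_CONTINUED);
}
```

In this model, only the combination of a failed lookup and the debug option stops execution, which would be consistent with the hang going away either when the option is disabled or when the check is reverted.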
If you can help confirm that the following fixes the hang, I will prepare
a revert patch next week:
* Set CONFIG_DEBUG_KERNEL_DC and revert the above change, i.e.
--- a/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
+++ b/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
@@ -227,8 +227,7 @@ static void init_transmitter_control(struct bios_parser *bp)
 	uint8_t frev;
 	uint8_t crev = 0;
-	if (!BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev))
-		BREAK_TO_DEBUGGER();
+	BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev);
Thanks a lot
>
> also,
>
> cat /var/log/kern.log | grep VBIOS gives
>
> Sep 15 11:53:43 freedom kernel: [ 16.372684] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 15 13:58:04 freedom kernel: [ 16.705182] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 15 14:20:05 freedom kernel: [ 17.043288] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 15 14:38:23 freedom kernel: [ 16.625105] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 16 09:40:52 freedom kernel: [ 16.780135] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 16 09:52:39 freedom kernel: [ 15.764412] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 16 14:59:23 freedom kernel: [ 16.077181] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 16 19:03:50 freedom kernel: [ 16.613359] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 16 19:18:13 freedom kernel: [ 15.895630] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 16 22:01:53 freedom kernel: [ 15.768717] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 17 09:48:50 freedom kernel: [ 15.758361] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 17 10:31:23 freedom kernel: [ 15.762467] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 18 09:43:12 freedom kernel: [ 16.086531] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 19 09:32:07 freedom kernel: [ 16.034418] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 19 12:04:46 freedom kernel: [ 15.771447] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 19 13:54:41 freedom kernel: [ 15.791940] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 19 15:37:35 freedom kernel: [ 15.749058] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 19 17:25:04 freedom kernel: [ 16.449671] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 19 19:43:06 freedom kernel: [ 16.312367] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 19 21:31:28 freedom kernel: [ 15.864131] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 20 09:12:39 freedom kernel: [ 15.764786] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 20 11:31:36 freedom kernel: [ 17.332211] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 20 13:23:19 freedom kernel: [ 15.759616] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 20 13:45:07 freedom kernel: [ 16.557215] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 20 14:01:17 freedom kernel: [ 16.433437] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 20 14:24:14 freedom kernel: [ 15.770057] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 20 14:47:27 freedom kernel: [ 15.725150] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 20 15:02:31 freedom kernel: [ 16.591276] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
> Sep 20 15:19:44 freedom kernel: [ 15.863542] amdgpu 0000:04:00.0:
> amdgpu: Fetched VBIOS from ROM BAR
>
> So not quite what you were looking for, but when I run cat kern.log
> | grep Failed :
>
> Sep 15 11:53:43 freedom kernel: [ 25.730013] uvcvideo 1-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 15 13:58:04 freedom kernel: [ 26.025432] uvcvideo 2-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 15 14:38:23 freedom kernel: [ 25.883820] uvcvideo 2-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 16 09:40:52 freedom kernel: [ 27.204539] uvcvideo 1-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 16 14:41:22 freedom kernel: [ 28.985885] uvcvideo 2-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 16 19:03:50 freedom kernel: [ 26.510748] uvcvideo 1-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 17 09:48:50 freedom kernel: [ 25.682372] uvcvideo 2-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 17 10:31:23 freedom kernel: [ 25.547899] uvcvideo 1-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 18 09:43:12 freedom kernel: [ 26.243232] uvcvideo 1-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 19 09:32:07 freedom kernel: [ 25.267332] uvcvideo 2-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 19 12:04:46 freedom kernel: [ 25.269450] uvcvideo 2-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 19 15:37:35 freedom kernel: [ 25.494803] uvcvideo 1-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 19 19:43:06 freedom kernel: [ 26.288598] uvcvideo 2-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 20 09:12:39 freedom kernel: [ 25.291743] uvcvideo 2-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 20 13:23:19 freedom kernel: [ 25.884358] uvcvideo 1-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 20 14:24:14 freedom kernel: [ 25.312379] uvcvideo 2-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 20 14:47:27 freedom kernel: [ 25.352905] uvcvideo 2-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
> Sep 20 15:19:44 freedom kernel: [ 25.297893] uvcvideo 2-5.2:1.1:
> Failed to set UVC probe control : -32 (exp. 26).
>
>
> Hopefully this helps. Please mail me if you need more information. I
> have changed my .config and set
>
> # CONFIG_DEBUG_KERNEL_DC is not set
>
> I am attempting to build 6.11.0.
>
> Thanks,
>
> Bob
>
>
> On 2024-09-20 17:34, Alex Hung wrote:
>>
>>
>> On 2024-09-20 17:00, Dr. David Alan Gilbert wrote:
>>> * Bob Gill (gillb5@...us.net) wrote:
>>>> Hello. Kernel 6.11.0 crashes. 6.10.0 builds. Al Viro and Dr.
>>>> David Alan
>>>> Gilbert have been helpful, and asked that I
>>>>
>>>> post a git bisect log. The last log step seems odd, but the second
>>>> last
>>>> step "Remove useless function call" might be what broke.
>>>
>>> Thank you for doing this!
>>>
>>> My reading is that's fine, I think the next one:
>>>
>>> tree: git bisect bad
>>> [a171cce57792b0a6206d532050179a381ad74f8f] drm/amd/display: Check and
>>> log for
>>> function error codes
>>>
>>> or the one after it is the culprit?
>>>
>>> Adding the two Alex's from AMD back onto the thread.
>>> (Also added the [REGRESSION] marker the notes tell us to add)
>>
>> The commit triggers the debugger in case of errors.
>>
>> Is the config CONFIG_DEBUG_KERNEL_DC (Enable kgdb break in DC) enabled
>> in .config, i.e. can you check "grep CONFIG_DEBUG_KERNEL_DC .config"?
>>
>> If so, can you also try to disable it and check whether you can see
>> error messages "Failed to execute VBIOS command table" in kernel log?
>>
>>>
>>>> My hardware is old corei7 quad core/8 thread Tylersberg/Nehalem with
>>>> an AMD
>>>> RX 6500XT. That's the odd combination.
>>>>
>>>> Thanks in advance,
>>>
>>> Thanks again for the bisect.
>>>
>>> Dave
>>>
>>>> Bob
>>>>
>>>> Config: (.config)
>>>> /data/kernel/bobtest6.10-64
>>>>
>>>> Build line: (last command tells me the job is finished)
>>>> make menuconfig && make -j $(nproc) && make modules && make
>>>> modules_install
>>>> && make install && /data/music/pl.sh
>>>>
>>>> Rule 1: Do not modify ANYTHING in the source tree
>>>>
>>>> git bisect start
>>>> git bisect bad
>>>> git bisect good v6.10
>>>>
>>>> Bisecting: 11273 revisions left to test after this (roughly 14 steps)
>>>> [2c9b3512402ed192d1f43f4531fb5da947e72bd0] Merge tag 'for-linus' of
>>>> git://git.kernel.org/pub/scm/virt/kvm/kvm
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0+
>>>>
>>>> RESULT: boot 6.10.0+ fails
>>>> screen black for more than 2 minutes,
>>>> (caps lock key unresponsive, reset, power
>>>> buttons on computer case do nothing). Reset with power bar.
>>>>
>>>> tree: git bisect bad
>>>> Bisecting: 5677 revisions left to test after this (roughly 13 steps)
>>>> [280e36f0d5b997173d014c07484c03a7f7750668] nsfs: use cleanup guard
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0+
>>>>
>>>> RESULT: boot 6.10.0+ successful
>>>> tree: git bisect good
>>>> Bisecting: 2855 revisions left to test after this (roughly 12 steps)
>>>> [dde1a0e1625c08cf4f958348a83434b2ddecf449] Merge tag
>>>> 'x86-percpu-2024-07-17'
>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0+
>>>>
>>>> RESULT: boot 6.10.0+ fails
>>>> screen black for more than 2 minutes,
>>>> (caps lock key unresponsive, reset, power
>>>> buttons on computer case do nothing). Reset with power bar.
>>>>
>>>> tree: git bisect bad
>>>> Bisecting: 1478 revisions left to test after this (roughly 11 steps)
>>>> [32a120f52a4c0121bca8f2328d4680d283693d60] drm/i915/mtl: Skip PLL state
>>>> verification in TBT mode
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0-rc3+
>>>>
>>>> RESULT: boot 6.10.0-rc3+ successful
>>>> tree: git bisect good
>>>> Bisecting: 739 revisions left to test after this (roughly 10 steps)
>>>> [b6a343df46d69070a7073405e470e6348180ea34] drm/amdgpu: initialize GC IP
>>>> v11.5.2
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0-rc3+
>>>>
>>>> RESULT: boot 6.10.0-rc3+ fails
>>>> screen black for more than 2 minutes,
>>>> (caps lock key unresponsive, reset, power
>>>> buttons on computer case do nothing). Reset with power bar.
>>>>
>>>> tree: git bisect bad
>>>> Bisecting: 369 revisions left to test after this (roughly 9 steps)
>>>> [cf1d06ac53a15b83c0a63225606cfe175e33a8a0] accel/ivpu: Increase
>>>> autosuspend
>>>> delay to 100ms on 40xx
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0-rc1+
>>>>
>>>> RESULT: boot 6.10.0-rc1+ successful
>>>> tree: git bisect good
>>>> Bisecting: 184 revisions left to test after this (roughly 8 steps)
>>>> [0ca9f757a0e27a076395ec1b2002661bcf5c25e8] drm/amd/pm: powerplay: Add
>>>> `__counted_by` attribute for flexible arrays
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.9.0-rc5+
>>>>
>>>> RESULT: boot 6.9.0-rc5+ successful
>>>> tree: git bisect good
>>>> Bisecting: 92 revisions left to test after this (roughly 7 steps)
>>>> [9862ef7bae47b9292a38a0a1b30bff7f56d7815b] drm/amd/display: Use
>>>> periodic
>>>> detection for ipx/headless
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0-rc3+
>>>>
>>>> RESULT: boot 6.10.0-rc3+ fails
>>>> screen black for more than 2 minutes,
>>>> (caps lock key unresponsive, reset, power
>>>> buttons on computer case do nothing). Reset with power bar.
>>>>
>>>> tree: git bisect bad
>>>> Bisecting: 44 revisions left to test after this (roughly 6 steps)
>>>> [a78313bb206e0c456a989f380c4cbd8af8af7c76] Merge tag
>>>> 'drm-intel-gt-next-2024-06-12' of
>>>> https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0-rc3+
>>>>
>>>> RESULT: boot 6.10.0-rc3+ successful
>>>> tree: git bisect good
>>>> Bisecting: 22 revisions left to test after this (roughly 5 steps)
>>>> [51dbe0239b1fc7c435867ce28e5eb4394b6641e1] drm/amd/display: Fix
>>>> cursor size
>>>> issues
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0-rc3+
>>>>
>>>> RESULT: boot 6.10.0-rc3+ successful
>>>> tree: git bisect good
>>>> Bisecting: 11 revisions left to test after this (roughly 4 steps)
>>>> [871512e36f9c1c2cb4e62eb860ca0438800e4d63] drm/amd/display: Add
>>>> workaround
>>>> to restrict max frac urgent for DPM0
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0-rc3+
>>>>
>>>> RESULT: boot 6.10.0-rc3+ fails
>>>> screen black for more than 2 minutes,
>>>> (caps lock key unresponsive, reset, power
>>>> buttons on computer case do nothing). Reset with power bar.
>>>>
>>>> tree: git bisect bad
>>>> Bisecting: 5 revisions left to test after this (roughly 3 steps)
>>>> [5d93060d430b359e16e7c555c8f151ead1ac614b] drm/amd/display: Check HDCP
>>>> returned status
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0-rc3+
>>>>
>>>> RESULT: boot 6.10.0-rc3+ fails
>>>> screen black for more than 2 minutes,
>>>> (caps lock key unresponsive, reset, power
>>>> buttons on computer case do nothing). Reset with power bar.
>>>>
>>>> tree: git bisect bad
>>>> Bisecting: 2 revisions left to test after this (roughly 1 step)
>>>> [e094992bd1caa1fbd42221c7c305fc3b54172b5c] drm/amd/display: Remove
>>>> useless
>>>> function call
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0-rc3+
>>>>
>>>> RESULT: boot 6.10.0-rc3+ successful
>>>>
>>>> tree: git bisect good
>>>> [2c2ee1d1329881d8e6bb23c3b9f3b41df8a8055c] drm/amd/display: Check
>>>> and log
>>>> for function error codes
>>>>
>>>> latest kernel:
>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>> 6.10.0-rc3+
>>>>
>>>> RESULT: boot 6.10.0-rc3+ fails
>>>> screen black for more than 2 minutes,
>>>> (caps lock key unresponsive, reset, power
>>>> buttons on computer case do nothing). Reset with power bar.
>>>>
>>>> tree: git bisect bad
>>>> [a171cce57792b0a6206d532050179a381ad74f8f] drm/amd/display: Check
>>>> and log
>>>> for function error codes
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>