Message-ID: <609d594a-62e8-44ed-9cc2-585f9bf5ef70@telus.net>
Date: Fri, 20 Sep 2024 21:51:50 -0600
From: Bob Gill <gillb5@...us.net>
To: Alex Hung <alex.hung@....com>, "Dr. David Alan Gilbert"
<linux@...blig.org>, alexander.deucher@....com
Cc: linux-kernel@...r.kernel.org, regressions@...ts.linux.dev
Subject: Re: [REGRESSION] Re: AMDGPU 6.11.0 crash, 6.10.0 git bisect log
So the final change, with the 6.11.0 kernel:

CONFIG_DEBUG_KERNEL_DC=y

and, at about line 227 of
drivers/gpu/drm/amd/display/dc/bios/command_table2.c, the call reverted to:

BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev);

With that, the X server is working OK.
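
For anyone repeating this test, here is a minimal sketch of how to confirm which form of the call a source tree contains before building. The helper name transmitter_call_form is hypothetical, and it only recognises the two forms quoted in this thread:

```shell
# Report which form of the dig1transmittercontrol revision query a file has.
# Prints "checked" if the call is wrapped in an if (!...) test, "unchecked"
# if it appears only as a bare call, and "missing" if it is not found.
# transmitter_call_form is a hypothetical helper, not part of the kernel tree.
transmitter_call_form() {
    src="$1"    # path to command_table2.c (or any file to inspect)
    if grep -qF 'if (!BIOS_CMD_TABLE_REVISION(dig1transmittercontrol' "$src"; then
        echo checked
    elif grep -qF 'BIOS_CMD_TABLE_REVISION(dig1transmittercontrol' "$src"; then
        echo unchecked
    else
        echo missing
    fi
}
```

Run from the top of the kernel tree as:
transmitter_call_form drivers/gpu/drm/amd/display/dc/bios/command_table2.c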
Thanks,
Bob
On 2024-09-20 20:20, Alex Hung wrote:
>
>
> On 2024-09-20 18:20, Bob Gill wrote:
>> Hi. Sorry for the late reply. My config has
>> CONFIG_DEBUG_KERNEL_DC=y
>>
>> I will set it to # CONFIG_DEBUG_KERNEL_DC is not set
>
> Hi Bob,
>
> It seems the below change in a171cce57792 causes the hang when
> CONFIG_DEBUG_KERNEL_DC is set.
>
> --- a/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
> +++ b/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
> @@ -227,7 +227,8 @@ static void init_transmitter_control(struct bios_parser *bp)
>  	uint8_t frev;
>  	uint8_t crev = 0;
>
> -	BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev);
> +	if (!BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev))
> +		BREAK_TO_DEBUGGER();
>
> If you can help confirm that the following fixes the hang, I will prepare a
> revert patch next week:
>
> * Set CONFIG_DEBUG_KERNEL_DC and revert the above change, i.e.
>
> --- a/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
> +++ b/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
> @@ -227,8 +227,7 @@ static void init_transmitter_control(struct bios_parser *bp)
>  	uint8_t frev;
>  	uint8_t crev = 0;
>
> -	if (!BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev))
> -		BREAK_TO_DEBUGGER();
> +	BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev);
>
>
> Thanks a lot
>
>>
>> also,
>>
>> cat /var/log/kern.log | grep VBIOS gives
>>
>> Sep 15 11:53:43 freedom kernel: [ 16.372684] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 15 13:58:04 freedom kernel: [ 16.705182] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 15 14:20:05 freedom kernel: [ 17.043288] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 15 14:38:23 freedom kernel: [ 16.625105] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 09:40:52 freedom kernel: [ 16.780135] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 09:52:39 freedom kernel: [ 15.764412] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 14:59:23 freedom kernel: [ 16.077181] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 19:03:50 freedom kernel: [ 16.613359] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 19:18:13 freedom kernel: [ 15.895630] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 22:01:53 freedom kernel: [ 15.768717] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 17 09:48:50 freedom kernel: [ 15.758361] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 17 10:31:23 freedom kernel: [ 15.762467] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 18 09:43:12 freedom kernel: [ 16.086531] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 09:32:07 freedom kernel: [ 16.034418] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 12:04:46 freedom kernel: [ 15.771447] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 13:54:41 freedom kernel: [ 15.791940] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 15:37:35 freedom kernel: [ 15.749058] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 17:25:04 freedom kernel: [ 16.449671] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 19:43:06 freedom kernel: [ 16.312367] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 21:31:28 freedom kernel: [ 15.864131] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 09:12:39 freedom kernel: [ 15.764786] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 11:31:36 freedom kernel: [ 17.332211] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 13:23:19 freedom kernel: [ 15.759616] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 13:45:07 freedom kernel: [ 16.557215] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 14:01:17 freedom kernel: [ 16.433437] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 14:24:14 freedom kernel: [ 15.770057] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 14:47:27 freedom kernel: [ 15.725150] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 15:02:31 freedom kernel: [ 16.591276] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 15:19:44 freedom kernel: [ 15.863542] amdgpu 0000:04:00.0:
>> amdgpu: Fetched VBIOS from ROM BAR
>>
>> So not quite what you were looking for, but when I run
>> cat kern.log | grep Failed, I get:
>>
>> Sep 15 11:53:43 freedom kernel: [ 25.730013] uvcvideo 1-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 15 13:58:04 freedom kernel: [ 26.025432] uvcvideo 2-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 15 14:38:23 freedom kernel: [ 25.883820] uvcvideo 2-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 16 09:40:52 freedom kernel: [ 27.204539] uvcvideo 1-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 16 14:41:22 freedom kernel: [ 28.985885] uvcvideo 2-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 16 19:03:50 freedom kernel: [ 26.510748] uvcvideo 1-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 17 09:48:50 freedom kernel: [ 25.682372] uvcvideo 2-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 17 10:31:23 freedom kernel: [ 25.547899] uvcvideo 1-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 18 09:43:12 freedom kernel: [ 26.243232] uvcvideo 1-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 19 09:32:07 freedom kernel: [ 25.267332] uvcvideo 2-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 19 12:04:46 freedom kernel: [ 25.269450] uvcvideo 2-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 19 15:37:35 freedom kernel: [ 25.494803] uvcvideo 1-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 19 19:43:06 freedom kernel: [ 26.288598] uvcvideo 2-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 20 09:12:39 freedom kernel: [ 25.291743] uvcvideo 2-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 20 13:23:19 freedom kernel: [ 25.884358] uvcvideo 1-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 20 14:24:14 freedom kernel: [ 25.312379] uvcvideo 2-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 20 14:47:27 freedom kernel: [ 25.352905] uvcvideo 2-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 20 15:19:44 freedom kernel: [ 25.297893] uvcvideo 2-5.2:1.1:
>> Failed to set UVC probe control : -32 (exp. 26).
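
The uvcvideo lines above come from the USB video driver, not from amdgpu, so a bare "grep Failed" is too broad here. A narrower search for the exact message text Alex asked about ("Failed to execute VBIOS command table") avoids that noise. A minimal sketch, assuming the kernel log is a plain-text file such as /var/log/kern.log; the helper name vbios_cmd_failures is hypothetical:

```shell
# Print only the log lines that contain the fixed string Alex asked about.
# -F treats the pattern as a literal string rather than a regular expression.
# "|| true" keeps the exit status at 0 when nothing matches, so the helper
# is safe to call from scripts running under "set -e".
# vbios_cmd_failures is a hypothetical helper name.
vbios_cmd_failures() {
    logfile="$1"    # e.g. /var/log/kern.log
    grep -F 'Failed to execute VBIOS command table' "$logfile" || true
}
```

Usage: vbios_cmd_failures /var/log/kern.log
An empty result means the message was never logged in that file.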
>>
>>
>> Hopefully this helps. Please mail me if you need more information. I
>> have changed my .config and set
>>
>> # CONFIG_DEBUG_KERNEL_DC is not set
>>
>> I am attempting to build 6.11.0.
>>
>> Thanks,
>>
>> Bob
>>
>>
>> On 2024-09-20 17:34, Alex Hung wrote:
>>>
>>>
>>> On 2024-09-20 17:00, Dr. David Alan Gilbert wrote:
>>>> * Bob Gill (gillb5@...us.net) wrote:
>>>>> Hello. Kernel 6.11.0 crashes. 6.10.0 builds. Al Viro and Dr. David Alan
>>>>> Gilbert have been helpful, and asked that I post a git bisect log. The
>>>>> last log step seems odd, but the second-last step, "Remove useless
>>>>> function call", might be what broke.
>>>>
>>>> Thank you for doing this!
>>>>
>>>> My reading is that's fine, I think the next one:
>>>>
>>>> tree: git bisect bad
>>>> [a171cce57792b0a6206d532050179a381ad74f8f] drm/amd/display: Check
>>>> and log for
>>>> function error codes
>>>>
>>>> or the one after it is the culprit?
>>>>
>>>> Adding the two Alex's from AMD back onto the thread.
>>>> (Also added the [REGRESSION] marker the notes tell us to add)
>>>
>>> The commit triggers the debugger in case of errors.
>>>
>>> Is the config CONFIG_DEBUG_KERNEL_DC (Enable kgdb break in DC)
>>> enabled in .config, i.e. can you check "grep CONFIG_DEBUG_KERNEL_DC
>>> .config"?
>>>
>>> If so, can you also try to disable it and check whether you can see
>>> error messages "Failed to execute VBIOS command table" in kernel log?
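
A plain grep prints whichever line matches, which still has to be read as "=y" versus "is not set". A minimal sketch that turns the check into a single word; the helper name kconfig_state is hypothetical and is not an existing kernel script:

```shell
# Report how a kernel .config treats one option.
# Prints "enabled" (=y), "module" (=m), "disabled" (# ... is not set),
# or "absent" (the option is not mentioned at all).
# kconfig_state is a hypothetical helper name.
kconfig_state() {
    opt="$1"     # option name including the CONFIG_ prefix
    file="$2"    # path to the .config to inspect
    if grep -q "^${opt}=y\$" "$file"; then
        echo enabled
    elif grep -q "^${opt}=m\$" "$file"; then
        echo module
    elif grep -q "^# ${opt} is not set\$" "$file"; then
        echo disabled
    else
        echo absent
    fi
}
```

Usage: kconfig_state CONFIG_DEBUG_KERNEL_DC .config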
>>>
>>>>
>>>>> My hardware is an old Core i7 quad core/8 thread Tylersburg/Nehalem
>>>>> with an AMD RX 6500XT. That's the odd combination.
>>>>>
>>>>> Thanks in advance,
>>>>
>>>> Thanks again for the bisect.
>>>>
>>>> Dave
>>>>
>>>>> Bob
>>>>>
>>>>> Config: (.config)
>>>>> /data/kernel/bobtest6.10-64
>>>>>
>>>>> Build line: (last command tells me the job is finished)
>>>>> make menuconfig && make -j $(nproc) && make modules && make
>>>>> modules_install
>>>>> && make install && /data/music/pl.sh
>>>>>
>>>>> Rule 1: Do not modify ANYTHING in the source tree
>>>>>
>>>>> git bisect start
>>>>> git bisect bad
>>>>> git bisect good v6.10
>>>>>
>>>>> Bisecting: 11273 revisions left to test after this (roughly 14 steps)
>>>>> [2c9b3512402ed192d1f43f4531fb5da947e72bd0] Merge tag 'for-linus' of
>>>>> git://git.kernel.org/pub/scm/virt/kvm/kvm
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0+
>>>>>
>>>>> RESULT: boot 6.10.0+ fails
>>>>> screen black for more than 2 minutes,
>>>>> (caps lock key unresponsive, reset, power
>>>>> buttons on computer case do nothing). Reset with power
>>>>> bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 5677 revisions left to test after this (roughly 13 steps)
>>>>> [280e36f0d5b997173d014c07484c03a7f7750668] nsfs: use cleanup guard
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0+
>>>>>
>>>>> RESULT: boot 6.10.0+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 2855 revisions left to test after this (roughly 12 steps)
>>>>> [dde1a0e1625c08cf4f958348a83434b2ddecf449] Merge tag
>>>>> 'x86-percpu-2024-07-17'
>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0+
>>>>>
>>>>> RESULT: boot 6.10.0+ fails
>>>>> screen black for more than 2 minutes,
>>>>> (caps lock key unresponsive, reset, power
>>>>> buttons on computer case do nothing). Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 1478 revisions left to test after this (roughly 11 steps)
>>>>> [32a120f52a4c0121bca8f2328d4680d283693d60] drm/i915/mtl: Skip PLL
>>>>> state
>>>>> verification in TBT mode
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 739 revisions left to test after this (roughly 10 steps)
>>>>> [b6a343df46d69070a7073405e470e6348180ea34] drm/amdgpu: initialize
>>>>> GC IP
>>>>> v11.5.2
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ fails
>>>>> screen black for more than 2 minutes,
>>>>> (caps lock key unresponsive, reset, power
>>>>> buttons on computer case do nothing). Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 369 revisions left to test after this (roughly 9 steps)
>>>>> [cf1d06ac53a15b83c0a63225606cfe175e33a8a0] accel/ivpu: Increase
>>>>> autosuspend
>>>>> delay to 100ms on 40xx
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc1+
>>>>>
>>>>> RESULT: boot 6.10.0-rc1+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 184 revisions left to test after this (roughly 8 steps)
>>>>> [0ca9f757a0e27a076395ec1b2002661bcf5c25e8] drm/amd/pm: powerplay: Add
>>>>> `__counted_by` attribute for flexible arrays
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.9.0-rc5+
>>>>>
>>>>> RESULT: boot 6.9.0-rc5+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 92 revisions left to test after this (roughly 7 steps)
>>>>> [9862ef7bae47b9292a38a0a1b30bff7f56d7815b] drm/amd/display: Use
>>>>> periodic
>>>>> detection for ipx/headless
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ fails
>>>>> screen black for more than 2 minutes,
>>>>> (caps lock key unresponsive, reset, power
>>>>> buttons on computer case do nothing). Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 44 revisions left to test after this (roughly 6 steps)
>>>>> [a78313bb206e0c456a989f380c4cbd8af8af7c76] Merge tag
>>>>> 'drm-intel-gt-next-2024-06-12' of
>>>>> https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 22 revisions left to test after this (roughly 5 steps)
>>>>> [51dbe0239b1fc7c435867ce28e5eb4394b6641e1] drm/amd/display: Fix
>>>>> cursor size
>>>>> issues
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 11 revisions left to test after this (roughly 4 steps)
>>>>> [871512e36f9c1c2cb4e62eb860ca0438800e4d63] drm/amd/display: Add
>>>>> workaround
>>>>> to restrict max frac urgent for DPM0
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ fails
>>>>> screen black for more than 2 minutes,
>>>>> (caps lock key unresponsive, reset, power
>>>>> buttons on computer case do nothing). Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 5 revisions left to test after this (roughly 3 steps)
>>>>> [5d93060d430b359e16e7c555c8f151ead1ac614b] drm/amd/display: Check
>>>>> HDCP
>>>>> returned status
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ fails
>>>>> screen black for more than 2 minutes,
>>>>> (caps lock key unresponsive, reset, power
>>>>> buttons on computer case do nothing). Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 2 revisions left to test after this (roughly 1 step)
>>>>> [e094992bd1caa1fbd42221c7c305fc3b54172b5c] drm/amd/display: Remove
>>>>> useless
>>>>> function call
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ successful
>>>>>
>>>>> tree: git bisect good
>>>>> [2c2ee1d1329881d8e6bb23c3b9f3b41df8a8055c] drm/amd/display: Check
>>>>> and log
>>>>> for function error codes
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ fails
>>>>> screen black for more than 2 minutes,
>>>>> (caps lock key unresponsive, reset, power
>>>>> buttons on computer case do nothing). Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> [a171cce57792b0a6206d532050179a381ad74f8f] drm/amd/display: Check
>>>>> and log
>>>>> for function error codes
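
The "roughly N steps" figures in the bisect log above follow from halving the remaining range at each test. A minimal sketch of that arithmetic: it counts how many doublings are needed to cover the remaining revisions. This is an approximation that happens to match every figure quoted in the log, not git's internal formula, and bisect_steps is a hypothetical helper name:

```shell
# Estimate the bisect steps left for a given number of remaining revisions:
# the smallest k such that 2^k >= n.
# This approximates git's "roughly N steps" output; it is not git's own code.
bisect_steps() {
    n="$1"      # revisions left to test
    steps=0
    span=1      # how many revisions "steps" halvings can distinguish
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}
```

For example, "bisect_steps 11273" prints 14 and "bisect_steps 5" prints 3, matching the first and second-to-last "Bisecting:" lines in the log.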
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>