Message-ID: <609d594a-62e8-44ed-9cc2-585f9bf5ef70@telus.net>
Date: Fri, 20 Sep 2024 21:51:50 -0600
From: Bob Gill <gillb5@...us.net>
To: Alex Hung <alex.hung@....com>, "Dr. David Alan Gilbert"
 <linux@...blig.org>, alexander.deucher@....com
Cc: linux-kernel@...r.kernel.org, regressions@...ts.linux.dev
Subject: Re: [REGRESSION] Re: AMDGPU 6.11.0 crash, 6.10.0 git bisect log

So the final state, as requested:

CONFIG_DEBUG_KERNEL_DC=y

and the line (at about line 227) of drivers/gpu/drm/amd/display/dc/bios/command_table2.c
reverted to:

BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev);

With that, the 6.11.0 kernel boots

and the X server is working OK.

Thanks,

Bob
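
For anyone repeating the two checks discussed in this thread (is CONFIG_DEBUG_KERNEL_DC set, and does the kernel log contain "Failed to execute VBIOS command table"), here is a minimal shell sketch. The helper names kconfig_state and count_matches are hypothetical, and the demo runs against throwaway files rather than a real .config or /var/log/kern.log; point the helpers at your own files to use them.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the kernel tree.

# Print "y", "m", or "unset" for a Kconfig option in a .config-style file.
kconfig_state() {
    if grep -q "^$1=y\$" "$2"; then
        echo y
    elif grep -q "^$1=m\$" "$2"; then
        echo m
    else
        echo unset
    fi
}

# Count log lines containing a fixed string (-F disables regex matching).
count_matches() {
    grep -c -F -- "$1" "$2"
}

# Demo data in a temporary directory, so no real files are touched.
workdir=$(mktemp -d)
printf '%s\n' 'CONFIG_DRM_AMDGPU=m' 'CONFIG_DEBUG_KERNEL_DC=y' > "$workdir/config"
printf '%s\n' \
    'amdgpu 0000:04:00.0: amdgpu: Fetched VBIOS from ROM BAR' \
    'uvcvideo 1-5.2:1.1: Failed to set UVC probe control : -32 (exp. 26).' \
    > "$workdir/kern.log"

dc_state=$(kconfig_state CONFIG_DEBUG_KERNEL_DC "$workdir/config")
vbios_failures=$(count_matches 'Failed to execute VBIOS command table' "$workdir/kern.log")

echo "CONFIG_DEBUG_KERNEL_DC: $dc_state"
echo "VBIOS command table failures logged: $vbios_failures"

rm -rf "$workdir"
```

Matching on the full message text rather than just "Failed" avoids unrelated hits such as the uvcvideo lines quoted further down.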

On 2024-09-20 20:20, Alex Hung wrote:
>
>
> On 2024-09-20 18:20, Bob Gill wrote:
>> Hi.  Sorry for the late reply.  My config has
>> CONFIG_DEBUG_KERNEL_DC=y
>>
>> I will set it to # CONFIG_DEBUG_KERNEL_DC is not set
>
> Hi Bob,
>
> It seems the below change in a171cce57792 causes the hang when 
> CONFIG_DEBUG_KERNEL_DC is set.
>
> --- a/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
> +++ b/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
> @@ -227,7 +227,8 @@ static void init_transmitter_control(struct 
> bios_parser *bp)
>         uint8_t frev;
>         uint8_t crev = 0;
>
> -       BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev);
> +       if (!BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev))
> +               BREAK_TO_DEBUGGER();
>
> If you can help confirm that the following fixes the hang, I will prepare a 
> revert patch next week:
>
> * Set CONFIG_DEBUG_KERNEL_DC and revert the above change, i.e.
>
> --- a/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
> +++ b/drivers/gpu/drm/amd/display/dc/bios/command_table2.c
> @@ -227,8 +227,7 @@ static void init_transmitter_control(struct 
> bios_parser *bp)
>         uint8_t frev;
>         uint8_t crev = 0;
>
> -       if (!BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev))
> -               BREAK_TO_DEBUGGER();
> +       BIOS_CMD_TABLE_REVISION(dig1transmittercontrol, frev, crev);
>
>
> Thanks a lot
>
>>
>> also,
>>
>> cat /var/log/kern.log | grep VBIOS       gives
>>
>> Sep 15 11:53:43 freedom kernel: [   16.372684] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 15 13:58:04 freedom kernel: [   16.705182] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 15 14:20:05 freedom kernel: [   17.043288] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 15 14:38:23 freedom kernel: [   16.625105] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 09:40:52 freedom kernel: [   16.780135] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 09:52:39 freedom kernel: [   15.764412] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 14:59:23 freedom kernel: [   16.077181] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 19:03:50 freedom kernel: [   16.613359] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 19:18:13 freedom kernel: [   15.895630] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 16 22:01:53 freedom kernel: [   15.768717] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 17 09:48:50 freedom kernel: [   15.758361] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 17 10:31:23 freedom kernel: [   15.762467] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 18 09:43:12 freedom kernel: [   16.086531] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 09:32:07 freedom kernel: [   16.034418] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 12:04:46 freedom kernel: [   15.771447] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 13:54:41 freedom kernel: [   15.791940] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 15:37:35 freedom kernel: [   15.749058] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 17:25:04 freedom kernel: [   16.449671] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 19:43:06 freedom kernel: [   16.312367] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 19 21:31:28 freedom kernel: [   15.864131] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 09:12:39 freedom kernel: [   15.764786] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 11:31:36 freedom kernel: [   17.332211] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 13:23:19 freedom kernel: [   15.759616] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 13:45:07 freedom kernel: [   16.557215] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 14:01:17 freedom kernel: [   16.433437] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 14:24:14 freedom kernel: [   15.770057] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 14:47:27 freedom kernel: [   15.725150] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 15:02:31 freedom kernel: [   16.591276] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>> Sep 20 15:19:44 freedom kernel: [   15.863542] amdgpu 0000:04:00.0: 
>> amdgpu: Fetched VBIOS from ROM BAR
>>
>> So not quite what you were looking for, but when I run cat 
>> kern.log | grep Failed :
>>
>> Sep 15 11:53:43 freedom kernel: [   25.730013] uvcvideo 1-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 15 13:58:04 freedom kernel: [   26.025432] uvcvideo 2-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 15 14:38:23 freedom kernel: [   25.883820] uvcvideo 2-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 16 09:40:52 freedom kernel: [   27.204539] uvcvideo 1-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 16 14:41:22 freedom kernel: [   28.985885] uvcvideo 2-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 16 19:03:50 freedom kernel: [   26.510748] uvcvideo 1-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 17 09:48:50 freedom kernel: [   25.682372] uvcvideo 2-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 17 10:31:23 freedom kernel: [   25.547899] uvcvideo 1-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 18 09:43:12 freedom kernel: [   26.243232] uvcvideo 1-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 19 09:32:07 freedom kernel: [   25.267332] uvcvideo 2-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 19 12:04:46 freedom kernel: [   25.269450] uvcvideo 2-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 19 15:37:35 freedom kernel: [   25.494803] uvcvideo 1-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 19 19:43:06 freedom kernel: [   26.288598] uvcvideo 2-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 20 09:12:39 freedom kernel: [   25.291743] uvcvideo 2-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 20 13:23:19 freedom kernel: [   25.884358] uvcvideo 1-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 20 14:24:14 freedom kernel: [   25.312379] uvcvideo 2-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 20 14:47:27 freedom kernel: [   25.352905] uvcvideo 2-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>> Sep 20 15:19:44 freedom kernel: [   25.297893] uvcvideo 2-5.2:1.1: 
>> Failed to set UVC probe control : -32 (exp. 26).
>>
>>
>> Hopefully this helps.  Please mail me if you need more information.  I 
>> have changed my .config and set
>>
>> # CONFIG_DEBUG_KERNEL_DC is not set
>>
>> I am attempting to build 6.11.0.
>>
>> Thanks,
>>
>> Bob
>>
>>
>> On 2024-09-20 17:34, Alex Hung wrote:
>>>
>>>
>>> On 2024-09-20 17:00, Dr. David Alan Gilbert wrote:
>>>> * Bob Gill (gillb5@...us.net) wrote:
>>>>> Hello.  Kernel 6.11.0 crashes. 6.10.0 builds.  Al Viro and Dr. 
>>>>> David Alan
>>>>> Gilbert have been helpful, and asked that I
>>>>>
>>>>> post a git bisect log.  The last log step seems odd, but the 
>>>>> second last
>>>>> step "Remove useless function call" might be what broke.
>>>>
>>>> Thank you for doing this!
>>>>
>>>> My reading is that's fine, I think the next one:
>>>>
>>>> tree: git bisect bad
>>>> [a171cce57792b0a6206d532050179a381ad74f8f] drm/amd/display: Check 
>>>> and log for
>>>> function error codes
>>>>
>>>> or the one after it is the culprit?
>>>>
>>>> Adding the two Alex's from AMD back onto the thread.
>>>> (Also added the [REGRESSION] marker the notes tell us to add)
>>>
>>> The commit triggers the debugger in case of errors.
>>>
>>> Is the config CONFIG_DEBUG_KERNEL_DC (Enable kgdb break in DC) 
>>> enabled in .config, i.e. can you check "grep CONFIG_DEBUG_KERNEL_DC 
>>> .config"?
>>>
>>> If so, can you also try to disable it and check whether you can see 
>>> error messages "Failed to execute VBIOS command table" in kernel log?
>>>
>>>>
>>>>> My hardware is an old Core i7 quad-core/8-thread Tylersburg/Nehalem 
>>>>> with an AMD
>>>>> RX 6500XT.  That's the odd combination.
>>>>>
>>>>> Thanks in advance,
>>>>
>>>> Thanks again for the bisect.
>>>>
>>>> Dave
>>>>
>>>>> Bob
>>>>>
>>>>> Config:  (.config)
>>>>> /data/kernel/bobtest6.10-64
>>>>>
>>>>> Build line: (last command tells me the job is finished)
>>>>> make menuconfig && make -j $(nproc) && make modules && make 
>>>>> modules_install
>>>>> && make install && /data/music/pl.sh
>>>>>
>>>>> Rule 1: Do not modify ANYTHING in the source tree
>>>>>
>>>>> git bisect start
>>>>> git bisect bad
>>>>> git bisect good v6.10
>>>>>
>>>>> Bisecting: 11273 revisions left to test after this (roughly 14 steps)
>>>>> [2c9b3512402ed192d1f43f4531fb5da947e72bd0] Merge tag 'for-linus' of
>>>>> git://git.kernel.org/pub/scm/virt/kvm/kvm
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0+
>>>>>
>>>>> RESULT:  boot 6.10.0+ fails
>>>>>           screen black for more than 2 minutes,
>>>>>           (caps lock key unresponsive, reset, power
>>>>>           buttons on computer case do nothing).  Reset with power 
>>>>> bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 5677 revisions left to test after this (roughly 13 steps)
>>>>> [280e36f0d5b997173d014c07484c03a7f7750668] nsfs: use cleanup guard
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0+
>>>>>
>>>>> RESULT: boot 6.10.0+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 2855 revisions left to test after this (roughly 12 steps)
>>>>> [dde1a0e1625c08cf4f958348a83434b2ddecf449] Merge tag 
>>>>> 'x86-percpu-2024-07-17'
>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0+
>>>>>
>>>>> RESULT: boot 6.10.0+ fails
>>>>>          screen black for more than 2 minutes,
>>>>>          (caps lock key unresponsive, reset, power
>>>>>          buttons on computer case do nothing).  Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 1478 revisions left to test after this (roughly 11 steps)
>>>>> [32a120f52a4c0121bca8f2328d4680d283693d60] drm/i915/mtl: Skip PLL 
>>>>> state
>>>>> verification in TBT mode
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 739 revisions left to test after this (roughly 10 steps)
>>>>> [b6a343df46d69070a7073405e470e6348180ea34] drm/amdgpu: initialize 
>>>>> GC IP
>>>>> v11.5.2
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ fails
>>>>>          screen black for more than 2 minutes,
>>>>>          (caps lock key unresponsive, reset, power
>>>>>          buttons on computer case do nothing).  Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 369 revisions left to test after this (roughly 9 steps)
>>>>> [cf1d06ac53a15b83c0a63225606cfe175e33a8a0] accel/ivpu: Increase 
>>>>> autosuspend
>>>>> delay to 100ms on 40xx
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc1+
>>>>>
>>>>> RESULT: boot 6.10.0-rc1+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 184 revisions left to test after this (roughly 8 steps)
>>>>> [0ca9f757a0e27a076395ec1b2002661bcf5c25e8] drm/amd/pm: powerplay: Add
>>>>> `__counted_by` attribute for flexible arrays
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.9.0-rc5+
>>>>>
>>>>> RESULT: boot 6.9.0-rc5+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 92 revisions left to test after this (roughly 7 steps)
>>>>> [9862ef7bae47b9292a38a0a1b30bff7f56d7815b] drm/amd/display: Use 
>>>>> periodic
>>>>> detection for ipx/headless
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ fails
>>>>>          screen black for more than 2 minutes,
>>>>>          (caps lock key unresponsive, reset, power
>>>>>          buttons on computer case do nothing).  Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 44 revisions left to test after this (roughly 6 steps)
>>>>> [a78313bb206e0c456a989f380c4cbd8af8af7c76] Merge tag
>>>>> 'drm-intel-gt-next-2024-06-12' of
>>>>> https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 22 revisions left to test after this (roughly 5 steps)
>>>>> [51dbe0239b1fc7c435867ce28e5eb4394b6641e1] drm/amd/display: Fix 
>>>>> cursor size
>>>>> issues
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ successful
>>>>> tree: git bisect good
>>>>> Bisecting: 11 revisions left to test after this (roughly 4 steps)
>>>>> [871512e36f9c1c2cb4e62eb860ca0438800e4d63] drm/amd/display: Add 
>>>>> workaround
>>>>> to restrict max frac urgent for DPM0
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ fails
>>>>>          screen black for more than 2 minutes,
>>>>>          (caps lock key unresponsive, reset, power
>>>>>          buttons on computer case do nothing).  Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 5 revisions left to test after this (roughly 3 steps)
>>>>> [5d93060d430b359e16e7c555c8f151ead1ac614b] drm/amd/display: Check 
>>>>> HDCP
>>>>> returned status
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ fails
>>>>>          screen black for more than 2 minutes,
>>>>>          (caps lock key unresponsive, reset, power
>>>>>          buttons on computer case do nothing).  Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> Bisecting: 2 revisions left to test after this (roughly 1 step)
>>>>> [e094992bd1caa1fbd42221c7c305fc3b54172b5c] drm/amd/display: Remove 
>>>>> useless
>>>>> function call
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ successful
>>>>>
>>>>> tree: git bisect good
>>>>> [2c2ee1d1329881d8e6bb23c3b9f3b41df8a8055c] drm/amd/display: Check 
>>>>> and log
>>>>> for function error codes
>>>>>
>>>>> latest kernel:
>>>>> ls -alt /lib/modules | head -2 | tail -1 | tr -s " " | cut -d' ' -f9
>>>>> 6.10.0-rc3+
>>>>>
>>>>> RESULT: boot 6.10.0-rc3+ fails
>>>>>          screen black for more than 2 minutes,
>>>>>          (caps lock key unresponsive, reset, power
>>>>>          buttons on computer case do nothing).  Reset with power bar.
>>>>>
>>>>> tree: git bisect bad
>>>>> [a171cce57792b0a6206d532050179a381ad74f8f] drm/amd/display: Check 
>>>>> and log
>>>>> for function error codes
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
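
The bisect above was done by hand, booting each kernel and marking it good or bad. For reference, the same mark-and-halve procedure can be sketched on a toy repository using git bisect run, which only applies when the good/bad test can be scripted (a boot hang like the one in this thread generally cannot be, without extra tooling). Everything below is an assumption for illustration: the temporary repository, the file state.txt, the "BUG" marker and the commit names are made up.

```shell
#!/bin/sh
# Toy demonstration of git bisect on a throwaway repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Create 8 commits; commit 6 introduces the (hypothetical) regression.
i=1
while [ "$i" -le 8 ]; do
    if [ "$i" -eq 6 ]; then
        echo "BUG" >> state.txt
    else
        echo "change $i" >> state.txt
    fi
    git add state.txt
    git -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "commit $i"
    i=$((i + 1))
done

# Bad = HEAD, good = the root commit.
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null

# The test command exits 0 for "good" and 1 for "bad":
# "! grep -q BUG" fails exactly when the marker is present.
git bisect run sh -c '! grep -q BUG state.txt' >/dev/null

# refs/bisect/bad now points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"

git bisect reset >/dev/null 2>&1
cd / && rm -rf "$repo"
```

In the manual workflow the "test command" is the human booting the kernel; each answer is fed back with git bisect good or git bisect bad, exactly as in the log above.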
