Message-ID: <0cdd7c33-91f7-4aac-be50-11f63d0bdae9@alu.unizg.hr>
Date: Thu, 25 Jan 2024 19:02:54 +0100
From: Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
To: "Ma, Jun" <majun@....com>, linux-kernel@...r.kernel.org,
amd-gfx@...ts.freedesktop.org
Cc: Sathishkumar S <sathishkumar.sundararaju@....com>,
Lijo Lazar <lijo.lazar@....com>,
Srinivasan Shanmugam <srinivasan.shanmugam@....com>,
Guchun Chen <guchun.chen@....com>, Lang Yu <Lang.Yu@....com>,
Felix Kuehling <Felix.Kuehling@....com>, "Pan, Xinhui" <Xinhui.Pan@....com>,
dri-devel@...ts.freedesktop.org, Marek Olšák
<marek.olsak@....com>, Boyuan Zhang <boyuan.zhang@....com>,
Daniel Vetter <daniel@...ll.ch>, David Francis <David.Francis@....com>,
Alex Deucher <alexander.deucher@....com>, David Airlie <airlied@...il.com>,
Christian König <christian.koenig@....com>
Subject: Re: BUG [RESEND][NEW BUG]: kernel NULL pointer dereference, address:
0000000000000008
Hi Ma Jun,
Greetings again.
So, I just tested the recommended patch and the issue with the graphical login
screen was successfully resolved.
Thank you very much for your prompt reviews and recommended patches.
God bless.
Best regards,
Mirsad Todorovac
On 1/25/24 10:29, Mirsad Todorovac wrote:
> Hi Ma Jun,
>
> Copy that. This appears to be the exact problem, and thank you for
> reviewing the bug report on such short notice.
>
> I apologise for the wrong assertion.
>
> So the patch you sent only exposed a different bug. Without the patch that bug does not manifest, but the NULL pointer dereference occurs instead.
>
> Of course, removing your patch and bringing back the NULL pointer
> dereference would not help; a proper fix is required.
>
> Thanks again.
>
> Best regards,
> Mirsad Todorovac
>
> On 1/25/2024 8:38 AM, Ma, Jun wrote:
>> Hi Mirsad,
>>
>>
>> On 1/25/2024 1:48 AM, Mirsad Todorovac wrote:
>>> Hi, Ma Jun,
>>>
>>> Normally, I would reply under the quoted text, but I will adjust to your convention.
>>>
>>> I have just discovered that your patch causes the Ubuntu 22.04 LTS GNOME XWayland session
>>> to hang after typing the password and pressing ENTER at the graphical login screen (tested several times).
>>>
>> This problem is not caused by my patch.
>> Based on your syslog, it looks more like a scheduling issue.
>> I just saw a similar problem; please refer to the link below:
>> https://gitlab.freedesktop.org/drm/amd/-/issues/3124
>>
>> Regards,
>> Ma Jun
>>> After that, I was not even able to log in from another box over ssh; the session would
>>> hang (this happened on the first and second tries; the third time it worked, when I connected
>>> before attempting to log in on the XWayland console).
>>>
>>> You might find the syslog and dmesg of the freeze useful; they are at this link (each was over 100K):
>>>
>>> https://magrf.grf.hr/~mtodorov/linux/bugreports/6.7.0/amdgpu/6.7.0-xway-09721-g61da593f4458/
>>>
>>> The exact applied patch was this:
>>>
>>> marvin@...iant:~/linux/kernel/linux_torvalds$ git diff
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> index 73f6d7e72c73..6ef333df9adf 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> @@ -3996,16 +3996,13 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev)
>>> if (!amdgpu_sriov_vf(adev)) {
>>> snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
>>> - err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, fw_name);
>>> - /* don't check this. There are apparently firmwares in the wild with
>>> - * incorrect size in the header
>>> - */
>>> - if (err == -ENODEV)
>>> - goto out;
>>> + err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev);
>>> if (err)
>>> - dev_dbg(adev->dev,
>>> - "gfx10: amdgpu_ucode_request() failed \"%s\"\n",
>>> - fw_name);
>>> + goto out;
>>> +
>>> + /* don't validate this firmware. There are apparently firmwares
>>> + * in the wild with incorrect size in the header
>>> + */
>>> rlc_hdr = (const struct rlc_firmware_header_v2_0 *)adev->gfx.rlc_fw->data;
>>> version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
>>> version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
>>> marvin@...iant:~/linux/kernel/linux_torvalds$ uname -rms
>>> Linux 6.7.0-xway-09721-g61da593f4458 x86_64
>>> marvin@...iant:~/linux/kernel/linux_torvalds$
>>>
>>> So, there seems to be a problem with the way the patch affects XWayland.
>>>
>>> I checked the exact commit multiple times, with and without the diff.
>>>
>>> Hope this helps, because I am not familiar with the amdgpu driver.
>>>
>>> Best regards,
>>> Mirsad Todorovac
>>>
>>> On 1/22/24 09:34, Ma, Jun wrote:
>>>> This is perhaps similar to the problem I encountered earlier. You can
>>>> try the following patch:
>>>>
>>>> https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html
>>>>
>>>> Regards,
>>>> Ma Jun
>>>>
>>>> On 1/21/2024 3:54 AM, Mirsad Todorovac wrote:
>>>>> Hi,
>>>>>
>>>>> The last email did not reach most of the recipients because of a banned .xz attachment.
>>>>>
>>>>> As the .config is also too big to send inline or uncompressed, I will omit it in this
>>>>> attempt. In the meantime, I had some success in decoding the stack trace, though sadly
>>>>> it is not complete.
>>>>>
>>>>> I don't think this Oops is deterministic, but I am working on a reproducer.
>>>>>
>>>>> The platform is Ubuntu 22.04 LTS.
>>>>>
>>>>> Complete list of hardware and .config is available here:
>>>>>
>>>>> https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v02-nokcsan-09928-g052d534373b7/
>>>>>
>>>>> Best regards,
>>>>> Mirsad
>>>>>
>>>>> -------------------------------------------------------------------------------------------
>>>>> kernel: [ 5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>>> kernel: [ 5.576707] #PF: supervisor read access in kernel mode
>>>>> kernel: [ 5.576710] #PF: error_code(0x0000) - not-present page
>>>>> kernel: [ 5.576712] PGD 0 P4D 0
>>>>> kernel: [ 5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>>> kernel: [ 5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>>>> kernel: [ 5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>>>> kernel: [ 5.576726] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>>>> kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>>> All code
>>>>> ========
>>>>> 0: 8d 55 a8 lea -0x58(%rbp),%edx
>>>>> 3: 4c 89 ff mov %r15,%rdi
>>>>> 6: e8 e4 83 ec ff call 0xffffffffffec83ef
>>>>> b: 41 89 c2 mov %eax,%r10d
>>>>> e: 83 f8 ed cmp $0xffffffed,%eax
>>>>> 11: 0f 84 b3 fd ff ff je 0xfffffffffffffdca
>>>>> 17: 85 c0 test %eax,%eax
>>>>> 19: 74 05 je 0x20
>>>>> 1b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
>>>>> 20: 49 8b 87 08 87 01 00 mov 0x18708(%r15),%rax
>>>>> 27: 4c 89 ff mov %r15,%rdi
>>>>> 2a:* 48 8b 40 08 mov 0x8(%rax),%rax <-- trapping instruction
>>>>> 2e: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>>>> 32: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>>>> 36: e8 e4 42 fb ff call 0xfffffffffffb431f
>>>>> 3b: 41 89 c2 mov %eax,%r10d
>>>>> 3e: 85 c0 test %eax,%eax
>>>>>
>>>>> Code starting with the faulting instruction
>>>>> ===========================================
>>>>> 0: 48 8b 40 08 mov 0x8(%rax),%rax
>>>>> 4: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>>>> 8: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>>>> c: e8 e4 42 fb ff call 0xfffffffffffb42f5
>>>>> 11: 41 89 c2 mov %eax,%r10d
>>>>> 14: 85 c0 test %eax,%eax
>>>>> kernel: [ 5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>>> kernel: [ 5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>>> kernel: [ 5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>>> kernel: [ 5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>>> kernel: [ 5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>>> kernel: [ 5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>>> kernel: [ 5.576895] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>>> kernel: [ 5.576898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> kernel: [ 5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>>> kernel: [ 5.576903] PKRU: 55555554
>>>>> kernel: [ 5.576905] Call Trace:
>>>>> kernel: [ 5.576907] <TASK>
>>>>> kernel: [ 5.576909] ? show_regs (arch/x86/kernel/dumpstack.c:479)
>>>>> kernel: [ 5.576914] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
>>>>> kernel: [ 5.576917] ? page_fault_oops (arch/x86/mm/fault.c:707)
>>>>> kernel: [ 5.576921] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>>> kernel: [ 5.576925] ? crypto_alloc_tfmmem.isra.0 (crypto/api.c:497)
>>>>> kernel: [ 5.576930] ? do_user_addr_fault (arch/x86/mm/fault.c:1264)
>>>>> kernel: [ 5.576934] ? exc_page_fault (./arch/x86/include/asm/paravirt.h:693 arch/x86/mm/fault.c:1515 arch/x86/mm/fault.c:1563)
>>>>> kernel: [ 5.576937] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
>>>>> kernel: [ 5.576942] ? gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>>>> kernel: [ 5.577056] amdgpu_device_init (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2465 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4042) amdgpu
>>>>> kernel: [ 5.577158] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>>> kernel: [ 5.577161] ? pci_bus_read_config_word (drivers/pci/access.c:67 (discriminator 2))
>>>>> kernel: [ 5.577166] ? pci_read_config_word (drivers/pci/access.c:563)
>>>>> kernel: [ 5.577168] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>>> kernel: [ 5.577171] ? do_pci_enable_device (drivers/pci/pci.c:1975 drivers/pci/pci.c:1949)
>>>>> kernel: [ 5.577176] amdgpu_driver_load_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:146) amdgpu
>>>>> kernel: [ 5.577275] amdgpu_pci_probe (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2237) amdgpu
>>>>> kernel: [ 5.577373] local_pci_probe (drivers/pci/pci-driver.c:324)
>>>>> kernel: [ 5.577377] pci_device_probe (drivers/pci/pci-driver.c:392 drivers/pci/pci-driver.c:417 drivers/pci/pci-driver.c:460)
>>>>> kernel: [ 5.577381] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
>>>>> kernel: [ 5.577386] __driver_probe_device (drivers/base/dd.c:800)
>>>>> kernel: [ 5.577389] driver_probe_device (drivers/base/dd.c:830)
>>>>> kernel: [ 5.577392] __driver_attach (drivers/base/dd.c:1217)
>>>>> kernel: [ 5.577396] ? __pfx___driver_attach (drivers/base/dd.c:1157)
>>>>> kernel: [ 5.577399] bus_for_each_dev (drivers/base/bus.c:368)
>>>>> kernel: [ 5.577402] driver_attach (drivers/base/dd.c:1234)
>>>>> kernel: [ 5.577405] bus_add_driver (drivers/base/bus.c:674)
>>>>> kernel: [ 5.577409] driver_register (drivers/base/driver.c:246)
>>>>> kernel: [ 5.577411] ? __pfx_amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2497) amdgpu
>>>>> kernel: [ 5.577521] __pci_register_driver (drivers/pci/pci-driver.c:1456)
>>>>> kernel: [ 5.577524] amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2805) amdgpu
>>>>> kernel: [ 5.577628] do_one_initcall (init/main.c:1236)
>>>>> kernel: [ 5.577632] ? kmalloc_trace (mm/slub.c:3816 mm/slub.c:3860 mm/slub.c:4007)
>>>>> kernel: [ 5.577637] do_init_module (kernel/module/main.c:2533)
>>>>> kernel: [ 5.577640] load_module (kernel/module/main.c:2984)
>>>>> kernel: [ 5.577647] init_module_from_file (kernel/module/main.c:3151)
>>>>> kernel: [ 5.577649] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>>> kernel: [ 5.577652] ? init_module_from_file (kernel/module/main.c:3151)
>>>>> kernel: [ 5.577657] idempotent_init_module (kernel/module/main.c:3168)
>>>>> kernel: [ 5.577661] __x64_sys_finit_module (./include/linux/file.h:45 kernel/module/main.c:3190 kernel/module/main.c:3172 kernel/module/main.c:3172)
>>>>> kernel: [ 5.577664] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
>>>>> kernel: [ 5.577668] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>>> kernel: [ 5.577671] ? ksys_mmap_pgoff (mm/mmap.c:1428)
>>>>> kernel: [ 5.577675] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>>> kernel: [ 5.577678] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>>> kernel: [ 5.577681] ? syscall_exit_to_user_mode (kernel/entry/common.c:215)
>>>>> kernel: [ 5.577684] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>>> kernel: [ 5.577687] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>>>> kernel: [ 5.577689] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>>> kernel: [ 5.577692] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>>>> kernel: [ 5.577695] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>>> kernel: [ 5.577698] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>>>> kernel: [ 5.577700] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>>> kernel: [ 5.577703] ? sysvec_call_function (arch/x86/kernel/smp.c:253 (discriminator 69))
>>>>> kernel: [ 5.577707] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
>>>>> kernel: [ 5.577709] RIP: 0033:0x7fdaa331e88d
>>>>> kernel: [ 5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>>>> All code
>>>>> ========
>>>>> 0: 5b pop %rbx
>>>>> 1: 41 5c pop %r12
>>>>> 3: c3 ret
>>>>> 4: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
>>>>> b: 00 00
>>>>> d: f3 0f 1e fa endbr64
>>>>> 11: 48 89 f8 mov %rdi,%rax
>>>>> 14: 48 89 f7 mov %rsi,%rdi
>>>>> 17: 48 89 d6 mov %rdx,%rsi
>>>>> 1a: 48 89 ca mov %rcx,%rdx
>>>>> 1d: 4d 89 c2 mov %r8,%r10
>>>>> 20: 4d 89 c8 mov %r9,%r8
>>>>> 23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
>>>>> 28: 0f 05 syscall
>>>>> 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
>>>>> 30: 73 01 jae 0x33
>>>>> 32: c3 ret
>>>>> 33: 48 8b 0d 73 b5 0f 00 mov 0xfb573(%rip),%rcx # 0xfb5ad
>>>>> 3a: f7 d8 neg %eax
>>>>> 3c: 64 89 01 mov %eax,%fs:(%rcx)
>>>>> 3f: 48 rex.W
>>>>>
>>>>> Code starting with the faulting instruction
>>>>> ===========================================
>>>>> 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
>>>>> 6: 73 01 jae 0x9
>>>>> 8: c3 ret
>>>>> 9: 48 8b 0d 73 b5 0f 00 mov 0xfb573(%rip),%rcx # 0xfb583
>>>>> 10: f7 d8 neg %eax
>>>>> 12: 64 89 01 mov %eax,%fs:(%rcx)
>>>>> 15: 48 rex.W
>>>>> kernel: [ 5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>>> kernel: [ 5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>>>> kernel: [ 5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>>>> kernel: [ 5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>>>> kernel: [ 5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>>>> kernel: [ 5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>>>> kernel: [ 5.577748] </TASK>
>>>>> kernel: [ 5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>>>>> kernel: [ 5.577817] CR2: 0000000000000008
>>>>> kernel: [ 5.577820] ---[ end trace 0000000000000000 ]---
>>>>> kernel: [ 5.914230] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>>>> kernel: [ 5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>>> All code
>>>>> ========
>>>>> 0: 8d 55 a8 lea -0x58(%rbp),%edx
>>>>> 3: 4c 89 ff mov %r15,%rdi
>>>>> 6: e8 e4 83 ec ff call 0xffffffffffec83ef
>>>>> b: 41 89 c2 mov %eax,%r10d
>>>>> e: 83 f8 ed cmp $0xffffffed,%eax
>>>>> 11: 0f 84 b3 fd ff ff je 0xfffffffffffffdca
>>>>> 17: 85 c0 test %eax,%eax
>>>>> 19: 74 05 je 0x20
>>>>> 1b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
>>>>> 20: 49 8b 87 08 87 01 00 mov 0x18708(%r15),%rax
>>>>> 27: 4c 89 ff mov %r15,%rdi
>>>>> 2a:* 48 8b 40 08 mov 0x8(%rax),%rax <-- trapping instruction
>>>>> 2e: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>>>> 32: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>>>> 36: e8 e4 42 fb ff call 0xfffffffffffb431f
>>>>> 3b: 41 89 c2 mov %eax,%r10d
>>>>> 3e: 85 c0 test %eax,%eax
>>>>>
>>>>> Code starting with the faulting instruction
>>>>> ===========================================
>>>>> 0: 48 8b 40 08 mov 0x8(%rax),%rax
>>>>> 4: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>>>> 8: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>>>> c: e8 e4 42 fb ff call 0xfffffffffffb42f5
>>>>> 11: 41 89 c2 mov %eax,%r10d
>>>>> 14: 85 c0 test %eax,%eax
>>>>> rsyslogd: rsyslogd's groupid changed to 111
>>>>> kernel: [ 5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>>> kernel: [ 5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>>> kernel: [ 5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>>> kernel: [ 5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>>> kernel: [ 5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>>> kernel: [ 5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>>> kernel: [ 5.914410] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>>> kernel: [ 5.914414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> kernel: [ 5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>>> kernel: [ 5.914419] PKRU: 55555554
>>>>>
>>>>> Best regards,
>>>>> Mirsad
>>>>>
>>>>> On 1/18/24 18:23, Mirsad Todorovac wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Unfortunately, I was not able to reboot into this kernel again to do the stack decode, but I thought
>>>>>> that any information about the NULL pointer dereference is better than no info.
>>>>>>
>>>>>> The system is Ubuntu 23.10 Mantic with an AMD Navi 23 [Radeon RX 6600/6600 XT/6600M]
>>>>>> graphics card.
>>>>>>
>>>>>> Please find the config and the hw listing attached.
>>>>>>
>>>>>> Best regards,
>>>>>> Mirsad
>>>>>
>>>>>
>>>>>
>>>>>> kernel: [ 5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>>>> kernel: [ 5.576707] #PF: supervisor read access in kernel mode
>>>>>> kernel: [ 5.576710] #PF: error_code(0x0000) - not-present page
>>>>>> kernel: [ 5.576712] PGD 0 P4D 0
>>>>>> kernel: [ 5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>>>> kernel: [ 5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>>>>> kernel: [ 5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>>>>> kernel: [ 5.576726] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>>>> kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>>>> kernel: [ 5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>>>> kernel: [ 5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>>>> kernel: [ 5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>>>> kernel: [ 5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>>>> kernel: [ 5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>>>> kernel: [ 5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>>>> kernel: [ 5.576895] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>>>> kernel: [ 5.576898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> kernel: [ 5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>>>> kernel: [ 5.576903] PKRU: 55555554
>>>>>> kernel: [ 5.576905] Call Trace:
>>>>>> kernel: [ 5.576907] <TASK>
>>>>>> kernel: [ 5.576909] ? show_regs+0x72/0x90
>>>>>> kernel: [ 5.576914] ? __die+0x25/0x80
>>>>>> kernel: [ 5.576917] ? page_fault_oops+0x154/0x4c0
>>>>>> kernel: [ 5.576921] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>>> kernel: [ 5.576925] ? crypto_alloc_tfmmem.isra.0+0x35/0x70
>>>>>> kernel: [ 5.576930] ? do_user_addr_fault+0x30e/0x6e0
>>>>>> kernel: [ 5.576934] ? exc_page_fault+0x84/0x1b0
>>>>>> kernel: [ 5.576937] ? asm_exc_page_fault+0x27/0x30
>>>>>> kernel: [ 5.576942] ? gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>>>> kernel: [ 5.577056] amdgpu_device_init+0xefa/0x2de0 [amdgpu]
>>>>>> kernel: [ 5.577158] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>>> kernel: [ 5.577161] ? pci_bus_read_config_word+0x47/0x90
>>>>>> kernel: [ 5.577166] ? pci_read_config_word+0x27/0x60
>>>>>> kernel: [ 5.577168] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>>> kernel: [ 5.577171] ? do_pci_enable_device+0xe1/0x110
>>>>>> kernel: [ 5.577176] amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
>>>>>> kernel: [ 5.577275] amdgpu_pci_probe+0x1a8/0x5e0 [amdgpu]
>>>>>> kernel: [ 5.577373] local_pci_probe+0x48/0xb0
>>>>>> kernel: [ 5.577377] pci_device_probe+0xc8/0x290
>>>>>> kernel: [ 5.577381] really_probe+0x1d2/0x440
>>>>>> kernel: [ 5.577386] __driver_probe_device+0x8a/0x190
>>>>>> kernel: [ 5.577389] driver_probe_device+0x23/0xd0
>>>>>> kernel: [ 5.577392] __driver_attach+0x10f/0x220
>>>>>> kernel: [ 5.577396] ? __pfx___driver_attach+0x10/0x10
>>>>>> kernel: [ 5.577399] bus_for_each_dev+0x7a/0xe0
>>>>>> kernel: [ 5.577402] driver_attach+0x1e/0x30
>>>>>> kernel: [ 5.577405] bus_add_driver+0x127/0x240
>>>>>> kernel: [ 5.577409] driver_register+0x64/0x140
>>>>>> kernel: [ 5.577411] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
>>>>>> kernel: [ 5.577521] __pci_register_driver+0x68/0x80
>>>>>> kernel: [ 5.577524] amdgpu_init+0x69/0xff0 [amdgpu]
>>>>>> kernel: [ 5.577628] do_one_initcall+0x46/0x330
>>>>>> kernel: [ 5.577632] ? kmalloc_trace+0x136/0x370
>>>>>> kernel: [ 5.577637] do_init_module+0x6a/0x280
>>>>>> kernel: [ 5.577640] load_module+0x2419/0x2500
>>>>>> kernel: [ 5.577647] init_module_from_file+0x9c/0xf0
>>>>>> kernel: [ 5.577649] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>>> kernel: [ 5.577652] ? init_module_from_file+0x9c/0xf0
>>>>>> kernel: [ 5.577657] idempotent_init_module+0x184/0x240
>>>>>> kernel: [ 5.577661] __x64_sys_finit_module+0x64/0xd0
>>>>>> kernel: [ 5.577664] do_syscall_64+0x76/0x140
>>>>>> kernel: [ 5.577668] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>>> kernel: [ 5.577671] ? ksys_mmap_pgoff+0x123/0x270
>>>>>> kernel: [ 5.577675] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>>> kernel: [ 5.577678] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>>> kernel: [ 5.577681] ? syscall_exit_to_user_mode+0x97/0x1e0
>>>>>> kernel: [ 5.577684] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>>> kernel: [ 5.577687] ? do_syscall_64+0x85/0x140
>>>>>> kernel: [ 5.577689] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>>> kernel: [ 5.577692] ? do_syscall_64+0x85/0x140
>>>>>> kernel: [ 5.577695] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>>> kernel: [ 5.577698] ? do_syscall_64+0x85/0x140
>>>>>> kernel: [ 5.577700] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>>> kernel: [ 5.577703] ? sysvec_call_function+0x4e/0xb0
>>>>>> kernel: [ 5.577707] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>>>>>> kernel: [ 5.577709] RIP: 0033:0x7fdaa331e88d
>>>>>> kernel: [ 5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>>>>> kernel: [ 5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>>>> kernel: [ 5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>>>>> kernel: [ 5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>>>>> kernel: [ 5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>>>>> kernel: [ 5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>>>>> kernel: [ 5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>>>>> kernel: [ 5.577748] </TASK>
>>>>>> kernel: [ 5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi
>>>>>> gpio_amdpt
>>>>>> kernel: [ 5.577817] CR2: 0000000000000008
>>>>>> kernel: [ 5.577820] ---[ end trace 0000000000000000 ]---
>>>>>> kernel: [ 5.914230] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>>>> kernel: [ 5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>>>> rsyslogd: rsyslogd's groupid changed to 111
>>>>>> kernel: [ 5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>>>> kernel: [ 5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>>>> kernel: [ 5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>>>> kernel: [ 5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>>>> kernel: [ 5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>>>> kernel: [ 5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>>>> kernel: [ 5.914410] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>>>> kernel: [ 5.914414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> kernel: [ 5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>>>> kernel: [ 5.914419] PKRU: 55555554
>>
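For readers following the register dump quoted above, here is a minimal sketch in C of one way the first oops can be read. It is not the amdgpu code and not the patch from this thread: `fw_blob`, `load_and_validate`, `old_flow`, `new_flow` and `as_s32` are hypothetical stand-ins. It rests on three observations from the log: the disassembly compares the call's return value with 0xffffffed (-19, ENODEV on Linux), R10 holds 0xffffffea (-22, EINVAL on Linux) at the time of the fault, and the faulting load reads offset 0x8 from a zero RAX. That is consistent with, though it does not prove, a loader that failed with -EINVAL and left the firmware pointer NULL while the caller only returned early on -ENODEV.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a firmware descriptor: a size followed by a data pointer.
 * With this layout on x86-64 the data pointer sits at offset 8, which
 * matches a fault address of 0x8 when the descriptor pointer is NULL. */
struct fw_blob {
    size_t size;
    const uint8_t *data;
};

enum outcome { OUTCOME_OK, OUTCOME_BAILED, OUTCOME_NULL_DEREF };

/* Reinterpret a 32-bit register value as a signed return code. */
static int32_t as_s32(uint32_t v)
{
    return v >= 0x80000000u ? -(int32_t)(~v) - 1 : (int32_t)v;
}

/* Hypothetical loader (assumption): on any failure it returns the
 * negative errno and leaves *fw set to NULL. */
static int load_and_validate(const struct fw_blob **fw, int simulated_err)
{
    static const uint8_t bytes[4] = { 2, 0, 1, 0 };
    static const struct fw_blob blob = { sizeof(bytes), bytes };

    *fw = simulated_err ? NULL : &blob;
    return simulated_err;
}

/* Shape of the removed lines in the quoted diff: only -ENODEV leaves
 * early, so any other error falls through to the header read. The
 * would-be NULL dereference is reported instead of performed. */
static enum outcome old_flow(int simulated_err)
{
    const struct fw_blob *fw = NULL;
    int err = load_and_validate(&fw, simulated_err);

    if (err == -ENODEV)
        return OUTCOME_BAILED;
    return fw ? OUTCOME_OK : OUTCOME_NULL_DEREF;
}

/* Shape of the added lines in the quoted diff: every error leaves
 * early, so the pointer is only used after a successful load. */
static enum outcome new_flow(int simulated_err)
{
    const struct fw_blob *fw = NULL;
    int err = load_and_validate(&fw, simulated_err);

    if (err)
        return OUTCOME_BAILED;
    assert(fw != NULL);
    return OUTCOME_OK;
}
```

With these stand-ins, `old_flow(-EINVAL)` reports `OUTCOME_NULL_DEREF` while `new_flow(-EINVAL)` reports `OUTCOME_BAILED`, which is the difference the `if (err) goto out;` hunk in the quoted diff makes. The sketch says nothing about the later login hang, which the thread attributes to a separate issue.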