Message-ID: <748505e3-7577-47bd-b502-e6ff357cf9b2@alu.unizg.hr>
Date: Thu, 25 Jan 2024 10:29:21 +0100
From: Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
To: "Ma, Jun" <majun@....com>, linux-kernel@...r.kernel.org,
amd-gfx@...ts.freedesktop.org
Cc: Sathishkumar S <sathishkumar.sundararaju@....com>,
Lijo Lazar <lijo.lazar@....com>,
Srinivasan Shanmugam <srinivasan.shanmugam@....com>,
Guchun Chen <guchun.chen@....com>, Lang Yu <Lang.Yu@....com>,
Felix Kuehling <Felix.Kuehling@....com>, "Pan, Xinhui" <Xinhui.Pan@....com>,
dri-devel@...ts.freedesktop.org, Marek Olšák
<marek.olsak@....com>, Boyuan Zhang <boyuan.zhang@....com>,
Daniel Vetter <daniel@...ll.ch>, David Francis <David.Francis@....com>,
Alex Deucher <alexander.deucher@....com>, David Airlie <airlied@...il.com>,
Christian König <christian.koenig@....com>
Subject: Re: BUG [RESEND][NEW BUG]: kernel NULL pointer dereference, address:
0000000000000008
Hi Ma Jun,
Copy that. This appears to be exactly the problem; thank you for
reviewing the bug report at such short notice.
I apologise for the wrong assertion.
Your patch merely exposed another bug, which does not manifest
without the patch (the NULL pointer dereference happens instead).
Of course, removing your patch and going back to the NULL pointer
dereference is not an option; a proper fix is required.
Thanks again.
Best regards,
Mirsad Todorovac
On 1/25/2024 8:38 AM, Ma, Jun wrote:
> Hi Mirsad,
>
>
> On 1/25/2024 1:48 AM, Mirsad Todorovac wrote:
>> Hi, Ma Jun,
>>
>> Normally, I would reply under the quoted text, but I will adjust to your convention.
>>
>> I have just discovered that your patch causes the Ubuntu 22.04 LTS GNOME XWayland session
>> to hang after typing the password and pressing ENTER on the graphical login screen (tested several times).
>>
> This problem is not caused by my patch.
> Based on your syslog, it looks more like a scheduling issue.
> I just saw a similar problem; please refer to the link below:
> https://gitlab.freedesktop.org/drm/amd/-/issues/3124
>
> Regards,
> Ma Jun
>> After that, I was not even able to log in from another box via ssh, or the session would
>> block (it blocked the first and the second time; the third time it passed, after I had connected
>> via ssh before attempting to log in on the XWayland console).
>>
>> You might find the syslog and dmesg of the freeze useful; they are available at this link (they were over 100K):
>>
>> https://magrf.grf.hr/~mtodorov/linux/bugreports/6.7.0/amdgpu/6.7.0-xway-09721-g61da593f4458/
>>
>> The exact applied patch was this:
>>
>> marvin@...iant:~/linux/kernel/linux_torvalds$ git diff
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index 73f6d7e72c73..6ef333df9adf 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -3996,16 +3996,13 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev)
>>
>> if (!amdgpu_sriov_vf(adev)) {
>> snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
>> - err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, fw_name);
>> - /* don't check this. There are apparently firmwares in the wild with
>> - * incorrect size in the header
>> - */
>> - if (err == -ENODEV)
>> - goto out;
>> + err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev);
>> if (err)
>> - dev_dbg(adev->dev,
>> - "gfx10: amdgpu_ucode_request() failed \"%s\"\n",
>> - fw_name);
>> + goto out;
>> +
>> + /* don't validate this firmware. There are apparently firmwares
>> + * in the wild with incorrect size in the header
>> + */
>> rlc_hdr = (const struct rlc_firmware_header_v2_0 *)adev->gfx.rlc_fw->data;
>> version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
>> version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
>> marvin@...iant:~/linux/kernel/linux_torvalds$ uname -rms
>> Linux 6.7.0-xway-09721-g61da593f4458 x86_64
>> marvin@...iant:~/linux/kernel/linux_torvalds$
>>
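The diff above changes what happens when the RLC firmware fails the helper's header check. The following is a minimal userspace C sketch of the two control flows, not driver code: `struct mock_firmware`, `mock_request()`, `mock_request_and_validate()` and the two `*_flow_derefs_null()` functions are hypothetical stand-ins. It assumes, consistent with R10 = 0xffffffea (-EINVAL) and RAX = 0 in the Oops quoted further down, that the request-and-validate helper releases the firmware and leaves the pointer NULL when validation fails. In the old flow only -ENODEV jumps to `out`, so the next statement reads `rlc_fw->data` through NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct firmware. */
struct mock_firmware {
	size_t size;
	const uint8_t *data;
};

/* Hypothetical outcomes of loading the RLC firmware file. */
enum mock_result { LOAD_MISSING, LOAD_BAD_HEADER, LOAD_OK };

static const uint8_t blob[16];
static const struct mock_firmware loaded_fw = { sizeof(blob), blob };

/* Models request_firmware(): on failure *fw is NULL and an errno is returned. */
int mock_request(const struct mock_firmware **fw, enum mock_result r)
{
	if (r == LOAD_MISSING) {
		*fw = NULL;
		return -ENOENT;
	}
	*fw = &loaded_fw;	/* a file with a bad header still loads */
	return 0;
}

/* Models the request-and-validate helper, ASSUMING it maps request
 * failures to -ENODEV and, on a bad header, releases the firmware,
 * sets *fw to NULL and returns -EINVAL. */
int mock_request_and_validate(const struct mock_firmware **fw, enum mock_result r)
{
	if (mock_request(fw, r))
		return -ENODEV;
	if (r == LOAD_BAD_HEADER) {
		*fw = NULL;	/* firmware released */
		return -EINVAL;
	}
	return 0;
}

/* Old flow: only -ENODEV jumps to "out"; other errors are merely logged.
 * Returns 1 if the following read of fw->data would hit NULL. */
int old_flow_derefs_null(enum mock_result r)
{
	const struct mock_firmware *fw;
	int err = mock_request_and_validate(&fw, r);

	if (err == -ENODEV)
		return 0;	/* goto out */
	/* err == -EINVAL falls through to rlc_hdr = fw->data */
	return fw == NULL;
}

/* Patched flow: plain request, any error jumps to "out", no validation. */
int new_flow_derefs_null(enum mock_result r)
{
	const struct mock_firmware *fw;

	if (mock_request(&fw, r))
		return 0;	/* goto out */
	return fw == NULL;
}
```

With `LOAD_BAD_HEADER`, `old_flow_derefs_null()` returns 1 (the old code would dereference NULL), while `new_flow_derefs_null()` returns 0, because the patched code never validates the header and bails out on any request error.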
>> So, there seems to be a problem with the way the patch affects XWayland.
>>
>> I checked the exact commit multiple times, with and without the diff.
>>
>> I hope this helps, as I am not familiar with the amdgpu driver.
>>
>> Best regards,
>> Mirsad Todorovac
>>
>> On 1/22/24 09:34, Ma, Jun wrote:
>>> This may be similar to the problem I encountered earlier; you can
>>> try the following patch:
>>>
>>> https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html
>>>
>>> Regards,
>>> Ma Jun
>>>
>>> On 1/21/2024 3:54 AM, Mirsad Todorovac wrote:
>>>> Hi,
>>>>
>>>> The last email did not reach most of the recipients due to the banned .xz attachment.
>>>>
>>>> As the .config is too big to send inline or uncompressed, I will omit it in this
>>>> attempt. In the meantime, I have had some success decoding the stack trace, though sadly
>>>> not completely.
>>>>
>>>> I don't think this Oops is deterministic, but I am working on a reproducer.
>>>>
>>>> The platform is Ubuntu 22.04 LTS.
>>>>
>>>> The complete hardware list and .config are available here:
>>>>
>>>> https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v02-nokcsan-09928-g052d534373b7/
>>>>
>>>> Best regards,
>>>> Mirsad
>>>>
>>>> -------------------------------------------------------------------------------------------
>>>> kernel: [ 5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>> kernel: [ 5.576707] #PF: supervisor read access in kernel mode
>>>> kernel: [ 5.576710] #PF: error_code(0x0000) - not-present page
>>>> kernel: [ 5.576712] PGD 0 P4D 0
>>>> kernel: [ 5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>> kernel: [ 5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>>> kernel: [ 5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>>> kernel: [ 5.576726] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>>> kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>> All code
>>>> ========
>>>> 0: 8d 55 a8 lea -0x58(%rbp),%edx
>>>> 3: 4c 89 ff mov %r15,%rdi
>>>> 6: e8 e4 83 ec ff call 0xffffffffffec83ef
>>>> b: 41 89 c2 mov %eax,%r10d
>>>> e: 83 f8 ed cmp $0xffffffed,%eax
>>>> 11: 0f 84 b3 fd ff ff je 0xfffffffffffffdca
>>>> 17: 85 c0 test %eax,%eax
>>>> 19: 74 05 je 0x20
>>>> 1b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
>>>> 20: 49 8b 87 08 87 01 00 mov 0x18708(%r15),%rax
>>>> 27: 4c 89 ff mov %r15,%rdi
>>>> 2a:* 48 8b 40 08 mov 0x8(%rax),%rax <-- trapping instruction
>>>> 2e: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>>> 32: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>>> 36: e8 e4 42 fb ff call 0xfffffffffffb431f
>>>> 3b: 41 89 c2 mov %eax,%r10d
>>>> 3e: 85 c0 test %eax,%eax
>>>>
>>>> Code starting with the faulting instruction
>>>> ===========================================
>>>> 0: 48 8b 40 08 mov 0x8(%rax),%rax
>>>> 4: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>>> 8: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>>> c: e8 e4 42 fb ff call 0xfffffffffffb42f5
>>>> 11: 41 89 c2 mov %eax,%r10d
>>>> 14: 85 c0 test %eax,%eax
>>>> kernel: [ 5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>> kernel: [ 5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>> kernel: [ 5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>> kernel: [ 5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>> kernel: [ 5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>> kernel: [ 5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>> kernel: [ 5.576895] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>> kernel: [ 5.576898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> kernel: [ 5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>> kernel: [ 5.576903] PKRU: 55555554
>>>> kernel: [ 5.576905] Call Trace:
>>>> kernel: [ 5.576907] <TASK>
>>>> kernel: [ 5.576909] ? show_regs (arch/x86/kernel/dumpstack.c:479)
>>>> kernel: [ 5.576914] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
>>>> kernel: [ 5.576917] ? page_fault_oops (arch/x86/mm/fault.c:707)
>>>> kernel: [ 5.576921] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [ 5.576925] ? crypto_alloc_tfmmem.isra.0 (crypto/api.c:497)
>>>> kernel: [ 5.576930] ? do_user_addr_fault (arch/x86/mm/fault.c:1264)
>>>> kernel: [ 5.576934] ? exc_page_fault (./arch/x86/include/asm/paravirt.h:693 arch/x86/mm/fault.c:1515 arch/x86/mm/fault.c:1563)
>>>> kernel: [ 5.576937] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
>>>> kernel: [ 5.576942] ? gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>>> kernel: [ 5.577056] amdgpu_device_init (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2465 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4042) amdgpu
>>>> kernel: [ 5.577158] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [ 5.577161] ? pci_bus_read_config_word (drivers/pci/access.c:67 (discriminator 2))
>>>> kernel: [ 5.577166] ? pci_read_config_word (drivers/pci/access.c:563)
>>>> kernel: [ 5.577168] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [ 5.577171] ? do_pci_enable_device (drivers/pci/pci.c:1975 drivers/pci/pci.c:1949)
>>>> kernel: [ 5.577176] amdgpu_driver_load_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:146) amdgpu
>>>> kernel: [ 5.577275] amdgpu_pci_probe (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2237) amdgpu
>>>> kernel: [ 5.577373] local_pci_probe (drivers/pci/pci-driver.c:324)
>>>> kernel: [ 5.577377] pci_device_probe (drivers/pci/pci-driver.c:392 drivers/pci/pci-driver.c:417 drivers/pci/pci-driver.c:460)
>>>> kernel: [ 5.577381] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
>>>> kernel: [ 5.577386] __driver_probe_device (drivers/base/dd.c:800)
>>>> kernel: [ 5.577389] driver_probe_device (drivers/base/dd.c:830)
>>>> kernel: [ 5.577392] __driver_attach (drivers/base/dd.c:1217)
>>>> kernel: [ 5.577396] ? __pfx___driver_attach (drivers/base/dd.c:1157)
>>>> kernel: [ 5.577399] bus_for_each_dev (drivers/base/bus.c:368)
>>>> kernel: [ 5.577402] driver_attach (drivers/base/dd.c:1234)
>>>> kernel: [ 5.577405] bus_add_driver (drivers/base/bus.c:674)
>>>> kernel: [ 5.577409] driver_register (drivers/base/driver.c:246)
>>>> kernel: [ 5.577411] ? __pfx_amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2497) amdgpu
>>>> kernel: [ 5.577521] __pci_register_driver (drivers/pci/pci-driver.c:1456)
>>>> kernel: [ 5.577524] amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2805) amdgpu
>>>> kernel: [ 5.577628] do_one_initcall (init/main.c:1236)
>>>> kernel: [ 5.577632] ? kmalloc_trace (mm/slub.c:3816 mm/slub.c:3860 mm/slub.c:4007)
>>>> kernel: [ 5.577637] do_init_module (kernel/module/main.c:2533)
>>>> kernel: [ 5.577640] load_module (kernel/module/main.c:2984)
>>>> kernel: [ 5.577647] init_module_from_file (kernel/module/main.c:3151)
>>>> kernel: [ 5.577649] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [ 5.577652] ? init_module_from_file (kernel/module/main.c:3151)
>>>> kernel: [ 5.577657] idempotent_init_module (kernel/module/main.c:3168)
>>>> kernel: [ 5.577661] __x64_sys_finit_module (./include/linux/file.h:45 kernel/module/main.c:3190 kernel/module/main.c:3172 kernel/module/main.c:3172)
>>>> kernel: [ 5.577664] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
>>>> kernel: [ 5.577668] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [ 5.577671] ? ksys_mmap_pgoff (mm/mmap.c:1428)
>>>> kernel: [ 5.577675] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [ 5.577678] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [ 5.577681] ? syscall_exit_to_user_mode (kernel/entry/common.c:215)
>>>> kernel: [ 5.577684] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [ 5.577687] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>>> kernel: [ 5.577689] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [ 5.577692] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>>> kernel: [ 5.577695] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [ 5.577698] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>>> kernel: [ 5.577700] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>>> kernel: [ 5.577703] ? sysvec_call_function (arch/x86/kernel/smp.c:253 (discriminator 69))
>>>> kernel: [ 5.577707] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
>>>> kernel: [ 5.577709] RIP: 0033:0x7fdaa331e88d
>>>> kernel: [ 5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>>> All code
>>>> ========
>>>> 0: 5b pop %rbx
>>>> 1: 41 5c pop %r12
>>>> 3: c3 ret
>>>> 4: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
>>>> b: 00 00
>>>> d: f3 0f 1e fa endbr64
>>>> 11: 48 89 f8 mov %rdi,%rax
>>>> 14: 48 89 f7 mov %rsi,%rdi
>>>> 17: 48 89 d6 mov %rdx,%rsi
>>>> 1a: 48 89 ca mov %rcx,%rdx
>>>> 1d: 4d 89 c2 mov %r8,%r10
>>>> 20: 4d 89 c8 mov %r9,%r8
>>>> 23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
>>>> 28: 0f 05 syscall
>>>> 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
>>>> 30: 73 01 jae 0x33
>>>> 32: c3 ret
>>>> 33: 48 8b 0d 73 b5 0f 00 mov 0xfb573(%rip),%rcx # 0xfb5ad
>>>> 3a: f7 d8 neg %eax
>>>> 3c: 64 89 01 mov %eax,%fs:(%rcx)
>>>> 3f: 48 rex.W
>>>>
>>>> Code starting with the faulting instruction
>>>> ===========================================
>>>> 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
>>>> 6: 73 01 jae 0x9
>>>> 8: c3 ret
>>>> 9: 48 8b 0d 73 b5 0f 00 mov 0xfb573(%rip),%rcx # 0xfb583
>>>> 10: f7 d8 neg %eax
>>>> 12: 64 89 01 mov %eax,%fs:(%rcx)
>>>> 15: 48 rex.W
>>>> kernel: [ 5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>> kernel: [ 5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>>> kernel: [ 5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>>> kernel: [ 5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>>> kernel: [ 5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>>> kernel: [ 5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>>> kernel: [ 5.577748] </TASK>
>>>> kernel: [ 5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>>>> kernel: [ 5.577817] CR2: 0000000000000008
>>>> kernel: [ 5.577820] ---[ end trace 0000000000000000 ]---
>>>> kernel: [ 5.914230] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>>> kernel: [ 5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>> All code
>>>> ========
>>>> 0: 8d 55 a8 lea -0x58(%rbp),%edx
>>>> 3: 4c 89 ff mov %r15,%rdi
>>>> 6: e8 e4 83 ec ff call 0xffffffffffec83ef
>>>> b: 41 89 c2 mov %eax,%r10d
>>>> e: 83 f8 ed cmp $0xffffffed,%eax
>>>> 11: 0f 84 b3 fd ff ff je 0xfffffffffffffdca
>>>> 17: 85 c0 test %eax,%eax
>>>> 19: 74 05 je 0x20
>>>> 1b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
>>>> 20: 49 8b 87 08 87 01 00 mov 0x18708(%r15),%rax
>>>> 27: 4c 89 ff mov %r15,%rdi
>>>> 2a:* 48 8b 40 08 mov 0x8(%rax),%rax <-- trapping instruction
>>>> 2e: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>>> 32: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>>> 36: e8 e4 42 fb ff call 0xfffffffffffb431f
>>>> 3b: 41 89 c2 mov %eax,%r10d
>>>> 3e: 85 c0 test %eax,%eax
>>>>
>>>> Code starting with the faulting instruction
>>>> ===========================================
>>>> 0: 48 8b 40 08 mov 0x8(%rax),%rax
>>>> 4: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>>> 8: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>>> c: e8 e4 42 fb ff call 0xfffffffffffb42f5
>>>> 11: 41 89 c2 mov %eax,%r10d
>>>> 14: 85 c0 test %eax,%eax
>>>> rsyslogd: rsyslogd's groupid changed to 111
>>>> kernel: [ 5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>> kernel: [ 5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>> kernel: [ 5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>> kernel: [ 5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>> kernel: [ 5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>> kernel: [ 5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>> kernel: [ 5.914410] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>> kernel: [ 5.914414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> kernel: [ 5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>> kernel: [ 5.914419] PKRU: 55555554
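The fault address and error codes in this trace can be checked with a little arithmetic. The sketch below assumes x86-64 (LP64) and that the kernel's `struct firmware` starts with `size_t size; const u8 *data;`; `struct firmware_mirror` and both helper functions are hypothetical. Reading `->data` through a NULL pointer faults at 0 + 8 = 0x8, matching CR2; `cmp $0xffffffed,%eax` tests for -19 (-ENODEV), and R10 = 0x00000000ffffffea, copied from %eax right after the call, is -22 (-EINVAL).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of the first members of the kernel's struct firmware. */
struct firmware_mirror {
	size_t size;
	const uint8_t *data;
};

/* On LP64 the pointer follows an 8-byte size_t, so ->data sits at
 * offset 8: the displacement in "mov 0x8(%rax),%rax". */
static_assert(offsetof(struct firmware_mirror, data) == 8,
	      "expected ->data at offset 8 (LP64)");

/* Reinterpret the low 32 bits of a register or immediate as a signed
 * int, i.e. a negative errno (two's complement, as on x86-64). */
int low32_as_errno(uint64_t reg)
{
	return (int32_t)(uint32_t)reg;
}

/* Address touched when ->data is read through a NULL struct pointer:
 * base 0 plus the member offset, which is what CR2 reports. */
uintptr_t null_member_fault_addr(void)
{
	return (uintptr_t)offsetof(struct firmware_mirror, data);
}
```

Since RAX (the `rlc_fw` pointer loaded from `0x18708(%r15)`) is 0, the access lands exactly at the member offset, which is why a NULL `struct firmware *` shows up as address 0000000000000008 rather than 0.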
>>>>
>>>> Best regards,
>>>> Mirsad
>>>>
>>>> On 1/18/24 18:23, Mirsad Todorovac wrote:
>>>>> Hi,
>>>>>
>>>>> Unfortunately, I was not able to reboot into this kernel again to decode the stack trace, but I thought
>>>>> that any information about the NULL pointer dereference is better than none.
>>>>>
>>>>> The system is Ubuntu 23.10 Mantic with an AMD Navi 23 [Radeon RX 6600/6600 XT/6600M]
>>>>> graphics card.
>>>>>
>>>>> Please find the config and the hw listing attached.
>>>>>
>>>>> Best regards,
>>>>> Mirsad
>>>>
>>>>
>>>>
>>>>> kernel: [ 5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>>> kernel: [ 5.576707] #PF: supervisor read access in kernel mode
>>>>> kernel: [ 5.576710] #PF: error_code(0x0000) - not-present page
>>>>> kernel: [ 5.576712] PGD 0 P4D 0
>>>>> kernel: [ 5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>>> kernel: [ 5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>>>> kernel: [ 5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>>>> kernel: [ 5.576726] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>>> kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>>> kernel: [ 5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>>> kernel: [ 5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>>> kernel: [ 5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>>> kernel: [ 5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>>> kernel: [ 5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>>> kernel: [ 5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>>> kernel: [ 5.576895] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>>> kernel: [ 5.576898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> kernel: [ 5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>>> kernel: [ 5.576903] PKRU: 55555554
>>>>> kernel: [ 5.576905] Call Trace:
>>>>> kernel: [ 5.576907] <TASK>
>>>>> kernel: [ 5.576909] ? show_regs+0x72/0x90
>>>>> kernel: [ 5.576914] ? __die+0x25/0x80
>>>>> kernel: [ 5.576917] ? page_fault_oops+0x154/0x4c0
>>>>> kernel: [ 5.576921] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [ 5.576925] ? crypto_alloc_tfmmem.isra.0+0x35/0x70
>>>>> kernel: [ 5.576930] ? do_user_addr_fault+0x30e/0x6e0
>>>>> kernel: [ 5.576934] ? exc_page_fault+0x84/0x1b0
>>>>> kernel: [ 5.576937] ? asm_exc_page_fault+0x27/0x30
>>>>> kernel: [ 5.576942] ? gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>>> kernel: [ 5.577056] amdgpu_device_init+0xefa/0x2de0 [amdgpu]
>>>>> kernel: [ 5.577158] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [ 5.577161] ? pci_bus_read_config_word+0x47/0x90
>>>>> kernel: [ 5.577166] ? pci_read_config_word+0x27/0x60
>>>>> kernel: [ 5.577168] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [ 5.577171] ? do_pci_enable_device+0xe1/0x110
>>>>> kernel: [ 5.577176] amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
>>>>> kernel: [ 5.577275] amdgpu_pci_probe+0x1a8/0x5e0 [amdgpu]
>>>>> kernel: [ 5.577373] local_pci_probe+0x48/0xb0
>>>>> kernel: [ 5.577377] pci_device_probe+0xc8/0x290
>>>>> kernel: [ 5.577381] really_probe+0x1d2/0x440
>>>>> kernel: [ 5.577386] __driver_probe_device+0x8a/0x190
>>>>> kernel: [ 5.577389] driver_probe_device+0x23/0xd0
>>>>> kernel: [ 5.577392] __driver_attach+0x10f/0x220
>>>>> kernel: [ 5.577396] ? __pfx___driver_attach+0x10/0x10
>>>>> kernel: [ 5.577399] bus_for_each_dev+0x7a/0xe0
>>>>> kernel: [ 5.577402] driver_attach+0x1e/0x30
>>>>> kernel: [ 5.577405] bus_add_driver+0x127/0x240
>>>>> kernel: [ 5.577409] driver_register+0x64/0x140
>>>>> kernel: [ 5.577411] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
>>>>> kernel: [ 5.577521] __pci_register_driver+0x68/0x80
>>>>> kernel: [ 5.577524] amdgpu_init+0x69/0xff0 [amdgpu]
>>>>> kernel: [ 5.577628] do_one_initcall+0x46/0x330
>>>>> kernel: [ 5.577632] ? kmalloc_trace+0x136/0x370
>>>>> kernel: [ 5.577637] do_init_module+0x6a/0x280
>>>>> kernel: [ 5.577640] load_module+0x2419/0x2500
>>>>> kernel: [ 5.577647] init_module_from_file+0x9c/0xf0
>>>>> kernel: [ 5.577649] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [ 5.577652] ? init_module_from_file+0x9c/0xf0
>>>>> kernel: [ 5.577657] idempotent_init_module+0x184/0x240
>>>>> kernel: [ 5.577661] __x64_sys_finit_module+0x64/0xd0
>>>>> kernel: [ 5.577664] do_syscall_64+0x76/0x140
>>>>> kernel: [ 5.577668] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [ 5.577671] ? ksys_mmap_pgoff+0x123/0x270
>>>>> kernel: [ 5.577675] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [ 5.577678] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [ 5.577681] ? syscall_exit_to_user_mode+0x97/0x1e0
>>>>> kernel: [ 5.577684] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [ 5.577687] ? do_syscall_64+0x85/0x140
>>>>> kernel: [ 5.577689] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [ 5.577692] ? do_syscall_64+0x85/0x140
>>>>> kernel: [ 5.577695] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [ 5.577698] ? do_syscall_64+0x85/0x140
>>>>> kernel: [ 5.577700] ? srso_alias_return_thunk+0x5/0xfbef5
>>>>> kernel: [ 5.577703] ? sysvec_call_function+0x4e/0xb0
>>>>> kernel: [ 5.577707] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>>>>> kernel: [ 5.577709] RIP: 0033:0x7fdaa331e88d
>>>>> kernel: [ 5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>>>> kernel: [ 5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>>> kernel: [ 5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>>>> kernel: [ 5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>>>> kernel: [ 5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>>>> kernel: [ 5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>>>> kernel: [ 5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>>>> kernel: [ 5.577748] </TASK>
>>>>> kernel: [ 5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>>>>> kernel: [ 5.577817] CR2: 0000000000000008
>>>>> kernel: [ 5.577820] ---[ end trace 0000000000000000 ]---
>>>>> kernel: [ 5.914230] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>>> kernel: [ 5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>>> rsyslogd: rsyslogd's groupid changed to 111
>>>>> kernel: [ 5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>>> kernel: [ 5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>>> kernel: [ 5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>>> kernel: [ 5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>>> kernel: [ 5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>>> kernel: [ 5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>>> kernel: [ 5.914410] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>>> kernel: [ 5.914414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> kernel: [ 5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>>> kernel: [ 5.914419] PKRU: 55555554
>