Message-ID: <ea7a6f91-c3fc-456f-9289-58dfbae6e091@amd.com>
Date: Thu, 25 Jan 2024 15:38:10 +0800
From: "Ma, Jun" <majun@....com>
To: Mirsad Todorovac <mirsad.todorovac@....unizg.hr>,
linux-kernel@...r.kernel.org, amd-gfx@...ts.freedesktop.org
Cc: majun@....com, Sathishkumar S <sathishkumar.sundararaju@....com>,
Lijo Lazar <lijo.lazar@....com>,
Srinivasan Shanmugam <srinivasan.shanmugam@....com>,
Guchun Chen <guchun.chen@....com>, Lang Yu <Lang.Yu@....com>,
Felix Kuehling <Felix.Kuehling@....com>, "Pan, Xinhui" <Xinhui.Pan@....com>,
dri-devel@...ts.freedesktop.org, Marek Olšák
<marek.olsak@....com>, Boyuan Zhang <boyuan.zhang@....com>,
Daniel Vetter <daniel@...ll.ch>, David Francis <David.Francis@....com>,
Alex Deucher <alexander.deucher@....com>, David Airlie <airlied@...il.com>,
Christian König <christian.koenig@....com>
Subject: Re: BUG [RESEND][NEW BUG]: kernel NULL pointer dereference, address:
0000000000000008
Hi Mirsad,
On 1/25/2024 1:48 AM, Mirsad Todorovac wrote:
> Hi, Ma Jun,
>
> Normally, I would reply under the quoted text, but I will adjust to your convention.
>
> I have just discovered that your patch causes the Ubuntu 22.04 LTS GNOME XWayland session
> to hang after typing the password and pressing ENTER at the graphical login screen (tested several times).
>
This problem is not caused by my patch.
Based on your syslog, it looks more like a scheduling issue.
I just saw a similar problem; please refer to the link below:
https://gitlab.freedesktop.org/drm/amd/-/issues/3124
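
For readers following the thread, here is a minimal user-space sketch of
the failure mode the patch targets. It is not kernel code: struct fake_fw
and fake_ucode_request() are hypothetical stand-ins, and the sketch assumes
that the request helper clears the firmware pointer and returns -EINVAL
when validation fails. That assumption is consistent with the register dump
quoted below (R10 = 0xffffffea is -22 as a 32-bit value, the compare
immediate 0xffffffed is -19, and RAX is 0 at the trapping load), but it is
not confirmed anywhere in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a firmware blob; not the kernel's struct firmware. */
struct fake_fw {
	const unsigned char *data;
};

/*
 * Hypothetical helper modelling the ASSUMED behaviour: validation fails,
 * the blob is dropped, *fw is set to NULL and -EINVAL is returned.
 */
int fake_ucode_request(const struct fake_fw **fw)
{
	*fw = NULL;
	return -EINVAL;
}

/* Old flow: only -ENODEV aborts. Returns 1 if a NULL fw would be dereferenced. */
int old_flow_derefs_null(void)
{
	const struct fake_fw *fw;
	int err = fake_ucode_request(&fw);

	if (err == -ENODEV)
		return 0;	/* bailed out early */
	return fw == NULL;	/* next step would read fw->data */
}

/* Patched flow: any error aborts before fw is touched. */
int patched_flow_derefs_null(void)
{
	const struct fake_fw *fw;
	int err = fake_ucode_request(&fw);

	if (err)
		return 0;	/* bailed out early */
	return fw == NULL;
}

/* Interpret the low 32 bits of a register value as a signed errno. */
int reg_to_errno(uint64_t reg)
{
	return (int)(int32_t)(uint32_t)reg;
}
```

On Linux, reg_to_errno(0xffffffea) gives -22 (-EINVAL) and
reg_to_errno(0xffffffed) gives -19 (-ENODEV). Under the stated assumption,
only the old flow goes on to the NULL dereference.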
Regards,
Ma Jun
> After that, I was not even able to log in from another box with ssh, or the session would
> block (tested once, and again a second time; the third time it worked because I connected before
> attempting to log in on the XWayland console).
>
> You might find the syslog and dmesg of the freeze useful; they are at this link (they were over 100K):
>
> https://magrf.grf.hr/~mtodorov/linux/bugreports/6.7.0/amdgpu/6.7.0-xway-09721-g61da593f4458/
>
> The exact applied patch was this:
>
> marvin@...iant:~/linux/kernel/linux_torvalds$ git diff
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 73f6d7e72c73..6ef333df9adf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -3996,16 +3996,13 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev)
>
> if (!amdgpu_sriov_vf(adev)) {
> snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
> - err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, fw_name);
> - /* don't check this. There are apparently firmwares in the wild with
> - * incorrect size in the header
> - */
> - if (err == -ENODEV)
> - goto out;
> + err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev);
> if (err)
> - dev_dbg(adev->dev,
> - "gfx10: amdgpu_ucode_request() failed \"%s\"\n",
> - fw_name);
> + goto out;
> +
> + /* don't validate this firmware. There are apparently firmwares
> + * in the wild with incorrect size in the header
> + */
> rlc_hdr = (const struct rlc_firmware_header_v2_0 *)adev->gfx.rlc_fw->data;
> version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
> version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
> marvin@...iant:~/linux/kernel/linux_torvalds$ uname -rms
> Linux 6.7.0-xway-09721-g61da593f4458 x86_64
> marvin@...iant:~/linux/kernel/linux_torvalds$
>
> So, there seems to be a problem with the way the patch affects XWayland.
>
> I checked the exact commit multiple times, with and without the diff.
>
> Hope this helps, because I am not familiar with the amdgpu driver.
>
> Best regards,
> Mirsad Todorovac
>
> On 1/22/24 09:34, Ma, Jun wrote:
>> This is perhaps similar to the problem I encountered earlier; you can
>> try the following patch:
>>
>> https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html
>>
>> Regards,
>> Ma Jun
>>
>> On 1/21/2024 3:54 AM, Mirsad Todorovac wrote:
>>> Hi,
>>>
>>> The last email did not reach most of the recipients because of the banned .xz attachment.
>>>
>>> As the .config is too big to send inline or uncompressed, I will omit it in this
>>> attempt. In the meantime, I had some success in decoding the stack trace, but sadly the
>>> decode is not complete.
>>>
>>> I don't think this Oops is deterministic, but I am working on a reproducer.
>>>
>>> The platform is Ubuntu 22.04 LTS.
>>>
>>> Complete list of hardware and .config is available here:
>>>
>>> https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v02-nokcsan-09928-g052d534373b7/
>>>
>>> Best regards,
>>> Mirsad
>>>
>>> -------------------------------------------------------------------------------------------
>>> kernel: [ 5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> kernel: [ 5.576707] #PF: supervisor read access in kernel mode
>>> kernel: [ 5.576710] #PF: error_code(0x0000) - not-present page
>>> kernel: [ 5.576712] PGD 0 P4D 0
>>> kernel: [ 5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>> kernel: [ 5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>> kernel: [ 5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>> kernel: [ 5.576726] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>> kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>> All code
>>> ========
>>> 0: 8d 55 a8 lea -0x58(%rbp),%edx
>>> 3: 4c 89 ff mov %r15,%rdi
>>> 6: e8 e4 83 ec ff call 0xffffffffffec83ef
>>> b: 41 89 c2 mov %eax,%r10d
>>> e: 83 f8 ed cmp $0xffffffed,%eax
>>> 11: 0f 84 b3 fd ff ff je 0xfffffffffffffdca
>>> 17: 85 c0 test %eax,%eax
>>> 19: 74 05 je 0x20
>>> 1b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
>>> 20: 49 8b 87 08 87 01 00 mov 0x18708(%r15),%rax
>>> 27: 4c 89 ff mov %r15,%rdi
>>> 2a:* 48 8b 40 08 mov 0x8(%rax),%rax <-- trapping instruction
>>> 2e: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>> 32: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>> 36: e8 e4 42 fb ff call 0xfffffffffffb431f
>>> 3b: 41 89 c2 mov %eax,%r10d
>>> 3e: 85 c0 test %eax,%eax
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>> 0: 48 8b 40 08 mov 0x8(%rax),%rax
>>> 4: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>> 8: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>> c: e8 e4 42 fb ff call 0xfffffffffffb42f5
>>> 11: 41 89 c2 mov %eax,%r10d
>>> 14: 85 c0 test %eax,%eax
>>> kernel: [ 5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>> kernel: [ 5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>> kernel: [ 5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>> kernel: [ 5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>> kernel: [ 5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>> kernel: [ 5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>> kernel: [ 5.576895] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>> kernel: [ 5.576898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> kernel: [ 5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>> kernel: [ 5.576903] PKRU: 55555554
>>> kernel: [ 5.576905] Call Trace:
>>> kernel: [ 5.576907] <TASK>
>>> kernel: [ 5.576909] ? show_regs (arch/x86/kernel/dumpstack.c:479)
>>> kernel: [ 5.576914] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
>>> kernel: [ 5.576917] ? page_fault_oops (arch/x86/mm/fault.c:707)
>>> kernel: [ 5.576921] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [ 5.576925] ? crypto_alloc_tfmmem.isra.0 (crypto/api.c:497)
>>> kernel: [ 5.576930] ? do_user_addr_fault (arch/x86/mm/fault.c:1264)
>>> kernel: [ 5.576934] ? exc_page_fault (./arch/x86/include/asm/paravirt.h:693 arch/x86/mm/fault.c:1515 arch/x86/mm/fault.c:1563)
>>> kernel: [ 5.576937] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
>>> kernel: [ 5.576942] ? gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>> kernel: [ 5.577056] amdgpu_device_init (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2465 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4042) amdgpu
>>> kernel: [ 5.577158] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [ 5.577161] ? pci_bus_read_config_word (drivers/pci/access.c:67 (discriminator 2))
>>> kernel: [ 5.577166] ? pci_read_config_word (drivers/pci/access.c:563)
>>> kernel: [ 5.577168] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [ 5.577171] ? do_pci_enable_device (drivers/pci/pci.c:1975 drivers/pci/pci.c:1949)
>>> kernel: [ 5.577176] amdgpu_driver_load_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:146) amdgpu
>>> kernel: [ 5.577275] amdgpu_pci_probe (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2237) amdgpu
>>> kernel: [ 5.577373] local_pci_probe (drivers/pci/pci-driver.c:324)
>>> kernel: [ 5.577377] pci_device_probe (drivers/pci/pci-driver.c:392 drivers/pci/pci-driver.c:417 drivers/pci/pci-driver.c:460)
>>> kernel: [ 5.577381] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
>>> kernel: [ 5.577386] __driver_probe_device (drivers/base/dd.c:800)
>>> kernel: [ 5.577389] driver_probe_device (drivers/base/dd.c:830)
>>> kernel: [ 5.577392] __driver_attach (drivers/base/dd.c:1217)
>>> kernel: [ 5.577396] ? __pfx___driver_attach (drivers/base/dd.c:1157)
>>> kernel: [ 5.577399] bus_for_each_dev (drivers/base/bus.c:368)
>>> kernel: [ 5.577402] driver_attach (drivers/base/dd.c:1234)
>>> kernel: [ 5.577405] bus_add_driver (drivers/base/bus.c:674)
>>> kernel: [ 5.577409] driver_register (drivers/base/driver.c:246)
>>> kernel: [ 5.577411] ? __pfx_amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2497) amdgpu
>>> kernel: [ 5.577521] __pci_register_driver (drivers/pci/pci-driver.c:1456)
>>> kernel: [ 5.577524] amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2805) amdgpu
>>> kernel: [ 5.577628] do_one_initcall (init/main.c:1236)
>>> kernel: [ 5.577632] ? kmalloc_trace (mm/slub.c:3816 mm/slub.c:3860 mm/slub.c:4007)
>>> kernel: [ 5.577637] do_init_module (kernel/module/main.c:2533)
>>> kernel: [ 5.577640] load_module (kernel/module/main.c:2984)
>>> kernel: [ 5.577647] init_module_from_file (kernel/module/main.c:3151)
>>> kernel: [ 5.577649] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [ 5.577652] ? init_module_from_file (kernel/module/main.c:3151)
>>> kernel: [ 5.577657] idempotent_init_module (kernel/module/main.c:3168)
>>> kernel: [ 5.577661] __x64_sys_finit_module (./include/linux/file.h:45 kernel/module/main.c:3190 kernel/module/main.c:3172 kernel/module/main.c:3172)
>>> kernel: [ 5.577664] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
>>> kernel: [ 5.577668] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [ 5.577671] ? ksys_mmap_pgoff (mm/mmap.c:1428)
>>> kernel: [ 5.577675] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [ 5.577678] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [ 5.577681] ? syscall_exit_to_user_mode (kernel/entry/common.c:215)
>>> kernel: [ 5.577684] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [ 5.577687] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>> kernel: [ 5.577689] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [ 5.577692] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>> kernel: [ 5.577695] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [ 5.577698] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>>> kernel: [ 5.577700] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>>> kernel: [ 5.577703] ? sysvec_call_function (arch/x86/kernel/smp.c:253 (discriminator 69))
>>> kernel: [ 5.577707] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
>>> kernel: [ 5.577709] RIP: 0033:0x7fdaa331e88d
>>> kernel: [ 5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>> All code
>>> ========
>>> 0: 5b pop %rbx
>>> 1: 41 5c pop %r12
>>> 3: c3 ret
>>> 4: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
>>> b: 00 00
>>> d: f3 0f 1e fa endbr64
>>> 11: 48 89 f8 mov %rdi,%rax
>>> 14: 48 89 f7 mov %rsi,%rdi
>>> 17: 48 89 d6 mov %rdx,%rsi
>>> 1a: 48 89 ca mov %rcx,%rdx
>>> 1d: 4d 89 c2 mov %r8,%r10
>>> 20: 4d 89 c8 mov %r9,%r8
>>> 23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
>>> 28: 0f 05 syscall
>>> 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
>>> 30: 73 01 jae 0x33
>>> 32: c3 ret
>>> 33: 48 8b 0d 73 b5 0f 00 mov 0xfb573(%rip),%rcx # 0xfb5ad
>>> 3a: f7 d8 neg %eax
>>> 3c: 64 89 01 mov %eax,%fs:(%rcx)
>>> 3f: 48 rex.W
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>> 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
>>> 6: 73 01 jae 0x9
>>> 8: c3 ret
>>> 9: 48 8b 0d 73 b5 0f 00 mov 0xfb573(%rip),%rcx # 0xfb583
>>> 10: f7 d8 neg %eax
>>> 12: 64 89 01 mov %eax,%fs:(%rcx)
>>> 15: 48 rex.W
>>> kernel: [ 5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>> kernel: [ 5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>> kernel: [ 5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>> kernel: [ 5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>> kernel: [ 5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>> kernel: [ 5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>> kernel: [ 5.577748] </TASK>
>>> kernel: [ 5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>>> kernel: [ 5.577817] CR2: 0000000000000008
>>> kernel: [ 5.577820] ---[ end trace 0000000000000000 ]---
>>> kernel: [ 5.914230] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>>> kernel: [ 5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>> All code
>>> ========
>>> 0: 8d 55 a8 lea -0x58(%rbp),%edx
>>> 3: 4c 89 ff mov %r15,%rdi
>>> 6: e8 e4 83 ec ff call 0xffffffffffec83ef
>>> b: 41 89 c2 mov %eax,%r10d
>>> e: 83 f8 ed cmp $0xffffffed,%eax
>>> 11: 0f 84 b3 fd ff ff je 0xfffffffffffffdca
>>> 17: 85 c0 test %eax,%eax
>>> 19: 74 05 je 0x20
>>> 1b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
>>> 20: 49 8b 87 08 87 01 00 mov 0x18708(%r15),%rax
>>> 27: 4c 89 ff mov %r15,%rdi
>>> 2a:* 48 8b 40 08 mov 0x8(%rax),%rax <-- trapping instruction
>>> 2e: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>> 32: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>> 36: e8 e4 42 fb ff call 0xfffffffffffb431f
>>> 3b: 41 89 c2 mov %eax,%r10d
>>> 3e: 85 c0 test %eax,%eax
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>> 0: 48 8b 40 08 mov 0x8(%rax),%rax
>>> 4: 0f b7 50 0a movzwl 0xa(%rax),%edx
>>> 8: 0f b7 70 08 movzwl 0x8(%rax),%esi
>>> c: e8 e4 42 fb ff call 0xfffffffffffb42f5
>>> 11: 41 89 c2 mov %eax,%r10d
>>> 14: 85 c0 test %eax,%eax
>>> rsyslogd: rsyslogd's groupid changed to 111
>>> kernel: [ 5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>> kernel: [ 5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>> kernel: [ 5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>> kernel: [ 5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>> kernel: [ 5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>> kernel: [ 5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>> kernel: [ 5.914410] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>> kernel: [ 5.914414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> kernel: [ 5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>> kernel: [ 5.914419] PKRU: 55555554
>>>
>>> Best regards,
>>> Mirsad
>>>
>>> On 1/18/24 18:23, Mirsad Todorovac wrote:
>>>> Hi,
>>>>
>>>> Unfortunately, I was not able to reboot in this kernel again to do the stack decode, but I thought
>>>> that any information about the NULL pointer dereference is better than no info.
>>>>
>>>> The system is Ubuntu 23.10 Mantic with AMD product: Navi 23 [Radeon RX 6600/6600 XT/6600M]
>>>> graphics card.
>>>>
>>>> Please find the config and the hw listing attached.
>>>>
>>>> Best regards,
>>>> Mirsad
>>>
>>>
>>>
>>>> kernel: [ 5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>> kernel: [ 5.576707] #PF: supervisor read access in kernel mode
>>>> kernel: [ 5.576710] #PF: error_code(0x0000) - not-present page
>>>> kernel: [ 5.576712] PGD 0 P4D 0
>>>> kernel: [ 5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>> kernel: [ 5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>>> kernel: [ 5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>>> kernel: [ 5.576726] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>> kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>> kernel: [ 5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>> kernel: [ 5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>> kernel: [ 5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>> kernel: [ 5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>> kernel: [ 5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>> kernel: [ 5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>> kernel: [ 5.576895] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>> kernel: [ 5.576898] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> kernel: [ 5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>> kernel: [ 5.576903] PKRU: 55555554
>>>> kernel: [ 5.576905] Call Trace:
>>>> kernel: [ 5.576907] <TASK>
>>>> kernel: [ 5.576909] ? show_regs+0x72/0x90
>>>> kernel: [ 5.576914] ? __die+0x25/0x80
>>>> kernel: [ 5.576917] ? page_fault_oops+0x154/0x4c0
>>>> kernel: [ 5.576921] ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [ 5.576925] ? crypto_alloc_tfmmem.isra.0+0x35/0x70
>>>> kernel: [ 5.576930] ? do_user_addr_fault+0x30e/0x6e0
>>>> kernel: [ 5.576934] ? exc_page_fault+0x84/0x1b0
>>>> kernel: [ 5.576937] ? asm_exc_page_fault+0x27/0x30
>>>> kernel: [ 5.576942] ? gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>> kernel: [ 5.577056] amdgpu_device_init+0xefa/0x2de0 [amdgpu]
>>>> kernel: [ 5.577158] ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [ 5.577161] ? pci_bus_read_config_word+0x47/0x90
>>>> kernel: [ 5.577166] ? pci_read_config_word+0x27/0x60
>>>> kernel: [ 5.577168] ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [ 5.577171] ? do_pci_enable_device+0xe1/0x110
>>>> kernel: [ 5.577176] amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
>>>> kernel: [ 5.577275] amdgpu_pci_probe+0x1a8/0x5e0 [amdgpu]
>>>> kernel: [ 5.577373] local_pci_probe+0x48/0xb0
>>>> kernel: [ 5.577377] pci_device_probe+0xc8/0x290
>>>> kernel: [ 5.577381] really_probe+0x1d2/0x440
>>>> kernel: [ 5.577386] __driver_probe_device+0x8a/0x190
>>>> kernel: [ 5.577389] driver_probe_device+0x23/0xd0
>>>> kernel: [ 5.577392] __driver_attach+0x10f/0x220
>>>> kernel: [ 5.577396] ? __pfx___driver_attach+0x10/0x10
>>>> kernel: [ 5.577399] bus_for_each_dev+0x7a/0xe0
>>>> kernel: [ 5.577402] driver_attach+0x1e/0x30
>>>> kernel: [ 5.577405] bus_add_driver+0x127/0x240
>>>> kernel: [ 5.577409] driver_register+0x64/0x140
>>>> kernel: [ 5.577411] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
>>>> kernel: [ 5.577521] __pci_register_driver+0x68/0x80
>>>> kernel: [ 5.577524] amdgpu_init+0x69/0xff0 [amdgpu]
>>>> kernel: [ 5.577628] do_one_initcall+0x46/0x330
>>>> kernel: [ 5.577632] ? kmalloc_trace+0x136/0x370
>>>> kernel: [ 5.577637] do_init_module+0x6a/0x280
>>>> kernel: [ 5.577640] load_module+0x2419/0x2500
>>>> kernel: [ 5.577647] init_module_from_file+0x9c/0xf0
>>>> kernel: [ 5.577649] ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [ 5.577652] ? init_module_from_file+0x9c/0xf0
>>>> kernel: [ 5.577657] idempotent_init_module+0x184/0x240
>>>> kernel: [ 5.577661] __x64_sys_finit_module+0x64/0xd0
>>>> kernel: [ 5.577664] do_syscall_64+0x76/0x140
>>>> kernel: [ 5.577668] ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [ 5.577671] ? ksys_mmap_pgoff+0x123/0x270
>>>> kernel: [ 5.577675] ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [ 5.577678] ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [ 5.577681] ? syscall_exit_to_user_mode+0x97/0x1e0
>>>> kernel: [ 5.577684] ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [ 5.577687] ? do_syscall_64+0x85/0x140
>>>> kernel: [ 5.577689] ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [ 5.577692] ? do_syscall_64+0x85/0x140
>>>> kernel: [ 5.577695] ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [ 5.577698] ? do_syscall_64+0x85/0x140
>>>> kernel: [ 5.577700] ? srso_alias_return_thunk+0x5/0xfbef5
>>>> kernel: [ 5.577703] ? sysvec_call_function+0x4e/0xb0
>>>> kernel: [ 5.577707] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>>>> kernel: [ 5.577709] RIP: 0033:0x7fdaa331e88d
>>>> kernel: [ 5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>>> kernel: [ 5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>> kernel: [ 5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>>> kernel: [ 5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>>> kernel: [ 5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>>> kernel: [ 5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>>> kernel: [ 5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>>> kernel: [ 5.577748] </TASK>
>>>> kernel: [ 5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>>>> kernel: [ 5.577817] CR2: 0000000000000008
>>>> kernel: [ 5.577820] ---[ end trace 0000000000000000 ]---
>>>> kernel: [ 5.914230] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>>> kernel: [ 5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>>> rsyslogd: rsyslogd's groupid changed to 111
>>>> kernel: [ 5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>>> kernel: [ 5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>>> kernel: [ 5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>>> kernel: [ 5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>>> kernel: [ 5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>>> kernel: [ 5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>>> kernel: [ 5.914410] FS: 00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>>> kernel: [ 5.914414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> kernel: [ 5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>>> kernel: [ 5.914419] PKRU: 55555554