Message-ID: <a08baedd-75f8-4099-b465-0db9001bd719@amd.com>
Date: Thu, 4 Jan 2024 11:02:32 -0500
From: Harry Wentland <harry.wentland@....com>
To: Christian König <christian.koenig@....com>,
Olliver Schinagl <oliver@...inagl.nl>, dri-devel@...ts.freedesktop.org,
linux-kernel <linux-kernel@...r.kernel.org>,
Alex Deucher <Alexander.Deucher@....com>
Cc: Huang Rui <ray.huang@....com>, David Airlie <airlied@...il.com>,
Daniel Vetter <daniel@...ll.ch>, Sumit Semwal <sumit.semwal@...aro.org>
Subject: Re: DRM TTM stack trace dump on ancient hardware
Oland is DCE 6 and won't default to DC.
Harry
On 2024-01-04 10:51, Christian König wrote:
> Hi Olliver,
>
> Well, as long as you don't explicitly disable support for the older hardware generations, the R7 250 is still supported and should work perfectly fine.
>
> What you see here is basically a reference counting issue, most likely in the display code.
>
> Question to Alex and Harry: is CIK already using DC or the classic display code? If it's DC, then it looks like we are either missing an unpin of a BO or failing to grab a reference to a BO.
>
> Regards,
> Christian.
>
> Am 04.01.24 um 16:38 schrieb Olliver Schinagl:
>> Sorry for just dumping this here, but for those who think this is important: I just rebooted after a weird btrfs crash (remounted r/o, no data loss it seems), probably into a new kernel, and got the following dump. Things 'seem' to work fine, however. I don't even know how or where to google for this one.
>>
>>
>> My graphics card is, I think, the R7 250, or some old beast like that, and I also know I probably shouldn't be using amdgpu on this old-timer?
>>
>> Linux 6.6.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Tue, 02 Jan 2024 02:28:28 +0000 x86_64 GNU/Linux
>> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland XT [Radeon HD 8670 / R5 340X OEM / R7 250/350/350X OEM]
>> model name : AMD FX(tm)-8350 Eight-Core Processor
>>
>> [ 0.000000] Command line: BOOT_IMAGE=/arch_root/boot/vmlinuz-linux root=UUID=d rw rootflags=subvol=arch_root radeon.audio=1 radeon.si_support=0 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 LANG=en_US.UTF-8 ivrs_ioapic=9@...0:00:14.0 ivrs_ioapic=10@...0:00:00.2 noibrs noibpb nopti mitigations=off
>> [ 0.091847] Kernel command line: BOOT_IMAGE=/arch_root/boot/vmlinuz-linux root=UUID=d rw rootflags=subvol=arch_root radeon.audio=1 radeon.si_support=0 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 LANG=en_US.UTF-8 ivrs_ioapic=9@...0:00:14.0 ivrs_ioapic=10@...0:00:00.2 noibrs noibpb nopti mitigations=off
>> [ 1.490484] [drm] radeon kernel modesetting enabled.
>> [ 1.490565] radeon 0000:01:00.0: SI support disabled by module param
>> [ 4.627771] [drm] amdgpu kernel modesetting enabled.
>> [ 4.627955] amdgpu: Virtual CRAT table created for CPU
>> [ 4.627967] amdgpu: Topology: Add CPU node
>> [ 4.650039] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
>> [ 4.650042] amdgpu: ATOM BIOS: 113-C6620600-S01
>> [ 4.650054] kfd kfd: amdgpu: OLAND not supported in kfd
>> [ 4.678004] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
>> [ 4.678007] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
>> [ 4.678010] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
>> [ 4.678715] amdgpu 0000:01:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
>> [ 4.678718] amdgpu 0000:01:00.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
>> [ 4.678878] [drm] amdgpu: 2048M of VRAM memory ready
>> [ 4.678880] [drm] amdgpu: 11487M of GTT memory ready.
>> [ 4.679218] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400400000).
>> [ 4.680506] [drm] amdgpu: dpm initialized
>> [ 4.680527] [drm] AMDGPU Display Connectors
>> [ 5.209956] amdgpu 0000:01:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 6, active_cu_number 6
>> [ 5.521572] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:01:00.0 on minor 1
>> [ 5.670853] fbcon: amdgpudrmfb (fb0) is primary device
>> [ 5.731643] amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device
>>
>> But kernel dumps like this are usually not a good thing (tm).
>>
>> [ 32.161704] ------------[ cut here ]------------
>> [ 32.161708] WARNING: CPU: 0 PID: 603 at drivers/gpu/drm/ttm/ttm_bo.c:326 ttm_bo_release+0x292/0x2e0 [ttm]
>> [ 32.161726] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter br_netfilter bridge rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device overlay 8021q garp mrp stp llc cmac algif_hash algif_skcipher af_alg bnep it87 hwmon_vid edac_mce_amd kvm_amd ccp snd_hda_codec_realtek kvm snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel btusb irqbypass snd_intel_dspcfg btrtl crct10dif_pclmul snd_intel_sdw_acpi btintel crc32_pclmul btbcm polyval_clmulni snd_hda_codec eeepc_wmi btmtk polyval_generic gf128mul asus_wmi bluetooth snd_hda_core ledtrig_audio ghash_clmulni_intel r8169 sparse_keymap sha512_ssse3 snd_hwdep sha1_ssse3 platform_profile snd_pcm i8042 ecdh_generic aesni_intel serio realtek sp5100_tco snd_timer crypto_simd mdio_devres wmi_bmof rfkill cryptd pcspkr acpi_cpufreq k10temp fam15h_power i2c_piix4 snd crc16 soundcore libphy joydev mousedev mac_hid vfat fat sg
>> crypto_user fuse dm_mod loop nfnetlink ip_tables
>> [ 32.161780] x_tables usbhid amdgpu drm_exec amdxcp drm_buddy gpu_sched btrfs radeon blake2b_generic libcrc32c crc32c_generic xor raid6_pq drm_ttm_helper ttm video nvme i2c_algo_bit drm_suballoc_helper crc32c_intel nvme_core sha256_ssse3 drm_display_helper nvme_common xhci_pci cec xhci_pci_renesas wmi uas usb_storage
>> [ 32.161800] CPU: 0 PID: 603 Comm: Xorg Not tainted 6.6.9-arch1-1 #1 e215ab44d1af91c0f0e686ff953f296051be417c
>> [ 32.161803] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97, BIOS 1605 10/25/2012
>> [ 32.161804] RIP: 0010:ttm_bo_release+0x292/0x2e0 [ttm]
>> [ 32.161816] Code: 49 8b b4 24 40 08 00 00 48 83 c4 38 48 8d 53 30 bf 40 01 00 00 5b 5d 41 5c 41 5d 41 5e e9 26 29 68 d1 4c 89 e7 e9 5b fe ff ff <0f> 0b 48 83 7b 20 00 0f 84 a6 fd ff ff 0f 0b e9 9f fd ff ff c7 43
>> [ 32.161818] RSP: 0018:ffffb02cc0cdbc18 EFLAGS: 00010202
>> [ 32.161820] RAX: 0000000000000000 RBX: ffff9291c073fdd0 RCX: 0000000000400033
>> [ 32.161821] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9291c073fdd0
>> [ 32.161823] RBP: ffff9291c073fc58 R08: 0000000000000000 R09: 0000000000400033
>> [ 32.161824] R10: ffff9291622bb780 R11: 0000000000000000 R12: ffff92914c98eee0
>> [ 32.161825] R13: 0000000000000001 R14: ffff92917835c848 R15: ffff9291c6418788
>> [ 32.161826] FS: 00007f0691a205c0(0000) GS:ffff929627c00000(0000) knlGS:0000000000000000
>> [ 32.161828] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 32.161829] CR2: 00007f0690d78c2c CR3: 00000002df620000 CR4: 00000000000406f0
>> [ 32.161831] Call Trace:
>> [ 32.161832] <TASK>
>> [ 32.161833] ? ttm_bo_release+0x292/0x2e0 [ttm d3be1c6b438b7d4abed1793b797fc6e1ac6a8908]
>> [ 32.161844] ? __warn+0x81/0x130
>> [ 32.161849] ? ttm_bo_release+0x292/0x2e0 [ttm d3be1c6b438b7d4abed1793b797fc6e1ac6a8908]
>> [ 32.161861] ? report_bug+0x171/0x1a0
>> [ 32.161866] ? handle_bug+0x3c/0x80
>> [ 32.161868] ? exc_invalid_op+0x17/0x70
>> [ 32.161870] ? asm_exc_invalid_op+0x1a/0x20
>> [ 32.161875] ? ttm_bo_release+0x292/0x2e0 [ttm d3be1c6b438b7d4abed1793b797fc6e1ac6a8908]
>> [ 32.161887] amdgpu_bo_unref+0x1e/0x30 [amdgpu 2f3ce605d8443bb7ca6dfe278dd999d24fdac211]
>> [ 32.162520] amdgpu_gem_object_free+0x34/0x60 [amdgpu 2f3ce605d8443bb7ca6dfe278dd999d24fdac211]
>> [ 32.162978] drm_gem_object_release_handle+0x54/0x60
>> [ 32.162984] ? __pfx_drm_gem_object_release_handle+0x10/0x10
>> [ 32.162987] idr_for_each+0x71/0xf0
>> [ 32.162991] drm_gem_release+0x20/0x30
>> [ 32.162995] drm_file_free+0x1f8/0x270
>> [ 32.162999] drm_release+0x74/0xf0
>> [ 32.163002] __fput+0xea/0x290
>> [ 32.163007] task_work_run+0x5a/0x90
>> [ 32.163011] do_exit+0x377/0xb20
>> [ 32.163014] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
>> [ 32.163019] do_group_exit+0x31/0x80
>> [ 32.163022] __x64_sys_exit_group+0x18/0x20
>> [ 32.163025] do_syscall_64+0x5d/0x90
>> [ 32.163029] ? __count_memcg_events+0x42/0x90
>> [ 32.163033] ? count_memcg_events.constprop.0+0x1a/0x30
>> [ 32.163037] ? handle_mm_fault+0xa2/0x360
>> [ 32.163040] ? do_user_addr_fault+0x30f/0x660
>> [ 32.163043] ? exc_page_fault+0x7f/0x180
>> [ 32.163045] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>> [ 32.163048] RIP: 0033:0x7f069171ce2d
>> [ 32.163074] Code: Unable to access opcode bytes at 0x7f069171ce03.
>> [ 32.163075] RSP: 002b:00007fff0d3ba328 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
>> [ 32.163077] RAX: ffffffffffffffda RBX: 00007f069181cfa8 RCX: 00007f069171ce2d
>> [ 32.163079] RDX: 00000000000000e7 RSI: fffffffffffffd08 RDI: 0000000000000000
>> [ 32.163080] RBP: 0000000000000883 R08: 0000000562be99f3 R09: 0000000000000000
>> [ 32.163081] R10: 0000562be99f3690 R11: 0000000000000202 R12: 0000000000000000
>> [ 32.163082] R13: 0000000000000000 R14: 00007f069181b680 R15: 00007f069181cfc0
>> [ 32.163085] </TASK>
>> [ 32.163086] ---[ end trace 0000000000000000 ]---
>>
>>
>> Thanks,
>>
>> Olliver
>>
>
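
The two failure modes Christian describes (a missing unpin, or a missing reference) can be illustrated with a toy model. This is only a sketch under stated assumptions: it assumes the warning in the trace fires because a buffer object reaches its final release while still pinned, which the thread suggests but does not confirm. The struct and every toy_bo_* function below are hypothetical and are not the TTM or amdgpu API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a buffer object; NOT struct ttm_buffer_object. */
struct toy_bo {
    int refcount;   /* number of holders keeping the object alive */
    int pin_count;  /* number of users that need it to stay in place */
    bool released;  /* set once the last reference has been dropped */
    bool warned;    /* set if the object was released while still pinned */
};

void toy_bo_init(struct toy_bo *bo)
{
    bo->refcount = 1;   /* the creator holds the first reference */
    bo->pin_count = 0;
    bo->released = false;
    bo->warned = false;
}

void toy_bo_get(struct toy_bo *bo)   { bo->refcount++; }
void toy_bo_pin(struct toy_bo *bo)   { bo->pin_count++; }
void toy_bo_unpin(struct toy_bo *bo) { bo->pin_count--; }

/* Drop one reference; the final put releases the object. */
void toy_bo_put(struct toy_bo *bo)
{
    if (--bo->refcount > 0)
        return;

    /* Releasing a pinned object means pin/unpin or get/put calls were
     * unbalanced somewhere, so flag it the way a kernel WARN would. */
    if (bo->pin_count != 0) {
        bo->warned = true;
        fprintf(stderr, "toy warning: released with pin_count=%d\n",
                bo->pin_count);
    }
    bo->released = true;
}
```

In this model, init, pin, unpin, put releases cleanly. Init, pin, put triggers the warning, and that can mean either that an unpin was skipped or that the pinning user never took its own reference with toy_bo_get(), so the owner's put became the last one. With the extra reference held, the owner's put neither releases the object nor warns.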