Message-ID: <a08baedd-75f8-4099-b465-0db9001bd719@amd.com>
Date: Thu, 4 Jan 2024 11:02:32 -0500
From: Harry Wentland <harry.wentland@....com>
To: Christian König <christian.koenig@....com>,
 Olliver Schinagl <oliver@...inagl.nl>, dri-devel@...ts.freedesktop.org,
 linux-kernel <linux-kernel@...r.kernel.org>,
 Alex Deucher <Alexander.Deucher@....com>
Cc: Huang Rui <ray.huang@....com>, David Airlie <airlied@...il.com>,
 Daniel Vetter <daniel@...ll.ch>, Sumit Semwal <sumit.semwal@...aro.org>
Subject: Re: DRM TTM stack trace dump on ancient hardware

Oland is DCE 6 and won't default to DC.

Harry
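
As a rough illustration of the point above, the sketch below pulls the driver module parameters out of an excerpt of the kernel command line quoted further down. It shows that amdgpu is told to claim SI/CIK hardware but that nothing on the command line asks for DC. The helper name module_params is made up for this example, and treating an explicit amdgpu.dc parameter as the way to opt a DCE 6 part into DC is an assumption that this thread does not confirm.

```python
# Illustrative only: extract "<module>.<param>=<value>" tokens from a
# kernel command line. module_params() is a hypothetical helper name.
def module_params(cmdline):
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        module, dot, name = key.partition(".")
        # Keep only tokens whose key has the form "module.param".
        if sep and dot and module and name:
            params.setdefault(module, {})[name] = value
    return params


# Excerpt of the command line from the report (other options omitted).
CMDLINE = (
    "radeon.audio=1 radeon.si_support=0 radeon.cik_support=0 "
    "amdgpu.si_support=1 amdgpu.cik_support=1 mitigations=off"
)

params = module_params(CMDLINE)
amdgpu = params.get("amdgpu", {})

print("amdgpu claims SI:", amdgpu.get("si_support") == "1")
# Assumption: an explicit amdgpu.dc setting would be needed to select DC here.
print("amdgpu.dc explicitly set:", "dc" in amdgpu)
```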

On 2024-01-04 10:51, Christian König wrote:
> Hi Olliver,
>
> Well, as long as you don't explicitly disable support for the older hardware generations, the R7 250 is still supported and should still work perfectly fine.
>
> What you see here is basically some reference counting issue, most likely in the display code.
>
> Question to Alex and Harry: is CIK already using DC or the classic display code? If it's DC, then it looks like we are either missing an unpin of a BO or failing to grab a reference to a BO.
>
> Regards,
> Christian.
>
> Am 04.01.24 um 16:38 schrieb Olliver Schinagl:
>> Sorry for just dumping this here, but for those who think this is important: I just rebooted after a weird btrfs crash (remounted r/o, no data loss it seems), probably into a new kernel, and got the following dump. Things 'seem' to work fine, however. I don't even know how or where to google for this one.
>>
>>
>> My graphics card is, I think, the R7 250, or some old beast like that, and I also know I probably shouldn't be using amdgpu on this old-timer?
>>
>> Linux 6.6.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Tue, 02 Jan 2024 02:28:28 +0000 x86_64 GNU/Linux
>> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland XT [Radeon HD 8670 / R5 340X OEM / R7 250/350/350X OEM]
>> model name    : AMD FX(tm)-8350 Eight-Core Processor
>>
>> [    0.000000] Command line: BOOT_IMAGE=/arch_root/boot/vmlinuz-linux root=UUID=d rw rootflags=subvol=arch_root radeon.audio=1 radeon.si_support=0 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 LANG=en_US.UTF-8 ivrs_ioapic=9@...0:00:14.0 ivrs_ioapic=10@...0:00:00.2 noibrs noibpb nopti mitigations=off
>> [    0.091847] Kernel command line: BOOT_IMAGE=/arch_root/boot/vmlinuz-linux root=UUID=d rw rootflags=subvol=arch_root radeon.audio=1 radeon.si_support=0 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 LANG=en_US.UTF-8 ivrs_ioapic=9@...0:00:14.0 ivrs_ioapic=10@...0:00:00.2 noibrs noibpb nopti mitigations=off
>> [    1.490484] [drm] radeon kernel modesetting enabled.
>> [    1.490565] radeon 0000:01:00.0: SI support disabled by module param
>> [    4.627771] [drm] amdgpu kernel modesetting enabled.
>> [    4.627955] amdgpu: Virtual CRAT table created for CPU
>> [    4.627967] amdgpu: Topology: Add CPU node
>> [    4.650039] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
>> [    4.650042] amdgpu: ATOM BIOS: 113-C6620600-S01
>> [    4.650054] kfd kfd: amdgpu: OLAND  not supported in kfd
>> [    4.678004] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
>> [    4.678007] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
>> [    4.678010] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
>> [    4.678715] amdgpu 0000:01:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
>> [    4.678718] amdgpu 0000:01:00.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
>> [    4.678878] [drm] amdgpu: 2048M of VRAM memory ready
>> [    4.678880] [drm] amdgpu: 11487M of GTT memory ready.
>> [    4.679218] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400400000).
>> [    4.680506] [drm] amdgpu: dpm initialized
>> [    4.680527] [drm] AMDGPU Display Connectors
>> [    5.209956] amdgpu 0000:01:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 6, active_cu_number 6
>> [    5.521572] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:01:00.0 on minor 1
>> [    5.670853] fbcon: amdgpudrmfb (fb0) is primary device
>> [    5.731643] amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device
>>
>> But kernel dumps like this are usually not a good thing (tm).
>>
>> [   32.161704] ------------[ cut here ]------------
>> [   32.161708] WARNING: CPU: 0 PID: 603 at drivers/gpu/drm/ttm/ttm_bo.c:326 ttm_bo_release+0x292/0x2e0 [ttm]
>> [   32.161726] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter br_netfilter bridge rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device overlay 8021q garp mrp stp llc cmac algif_hash algif_skcipher af_alg bnep it87 hwmon_vid edac_mce_amd kvm_amd ccp snd_hda_codec_realtek kvm snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel btusb irqbypass snd_intel_dspcfg btrtl crct10dif_pclmul snd_intel_sdw_acpi btintel crc32_pclmul btbcm polyval_clmulni snd_hda_codec eeepc_wmi btmtk polyval_generic gf128mul asus_wmi bluetooth snd_hda_core ledtrig_audio ghash_clmulni_intel r8169 sparse_keymap sha512_ssse3 snd_hwdep sha1_ssse3 platform_profile snd_pcm i8042 ecdh_generic aesni_intel serio realtek sp5100_tco snd_timer crypto_simd mdio_devres wmi_bmof rfkill cryptd pcspkr acpi_cpufreq k10temp fam15h_power i2c_piix4 snd crc16 soundcore libphy joydev mousedev mac_hid vfat fat sg crypto_user fuse dm_mod loop nfnetlink ip_tables
>> [   32.161780]  x_tables usbhid amdgpu drm_exec amdxcp drm_buddy gpu_sched btrfs radeon blake2b_generic libcrc32c crc32c_generic xor raid6_pq drm_ttm_helper ttm video nvme i2c_algo_bit drm_suballoc_helper crc32c_intel nvme_core sha256_ssse3 drm_display_helper nvme_common xhci_pci cec xhci_pci_renesas wmi uas usb_storage
>> [   32.161800] CPU: 0 PID: 603 Comm: Xorg Not tainted 6.6.9-arch1-1 #1 e215ab44d1af91c0f0e686ff953f296051be417c
>> [   32.161803] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97, BIOS 1605 10/25/2012
>> [   32.161804] RIP: 0010:ttm_bo_release+0x292/0x2e0 [ttm]
>> [   32.161816] Code: 49 8b b4 24 40 08 00 00 48 83 c4 38 48 8d 53 30 bf 40 01 00 00 5b 5d 41 5c 41 5d 41 5e e9 26 29 68 d1 4c 89 e7 e9 5b fe ff ff <0f> 0b 48 83 7b 20 00 0f 84 a6 fd ff ff 0f 0b e9 9f fd ff ff c7 43
>> [   32.161818] RSP: 0018:ffffb02cc0cdbc18 EFLAGS: 00010202
>> [   32.161820] RAX: 0000000000000000 RBX: ffff9291c073fdd0 RCX: 0000000000400033
>> [   32.161821] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9291c073fdd0
>> [   32.161823] RBP: ffff9291c073fc58 R08: 0000000000000000 R09: 0000000000400033
>> [   32.161824] R10: ffff9291622bb780 R11: 0000000000000000 R12: ffff92914c98eee0
>> [   32.161825] R13: 0000000000000001 R14: ffff92917835c848 R15: ffff9291c6418788
>> [   32.161826] FS:  00007f0691a205c0(0000) GS:ffff929627c00000(0000) knlGS:0000000000000000
>> [   32.161828] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   32.161829] CR2: 00007f0690d78c2c CR3: 00000002df620000 CR4: 00000000000406f0
>> [   32.161831] Call Trace:
>> [   32.161832]  <TASK>
>> [   32.161833]  ? ttm_bo_release+0x292/0x2e0 [ttm d3be1c6b438b7d4abed1793b797fc6e1ac6a8908]
>> [   32.161844]  ? __warn+0x81/0x130
>> [   32.161849]  ? ttm_bo_release+0x292/0x2e0 [ttm d3be1c6b438b7d4abed1793b797fc6e1ac6a8908]
>> [   32.161861]  ? report_bug+0x171/0x1a0
>> [   32.161866]  ? handle_bug+0x3c/0x80
>> [   32.161868]  ? exc_invalid_op+0x17/0x70
>> [   32.161870]  ? asm_exc_invalid_op+0x1a/0x20
>> [   32.161875]  ? ttm_bo_release+0x292/0x2e0 [ttm d3be1c6b438b7d4abed1793b797fc6e1ac6a8908]
>> [   32.161887]  amdgpu_bo_unref+0x1e/0x30 [amdgpu 2f3ce605d8443bb7ca6dfe278dd999d24fdac211]
>> [   32.162520]  amdgpu_gem_object_free+0x34/0x60 [amdgpu 2f3ce605d8443bb7ca6dfe278dd999d24fdac211]
>> [   32.162978]  drm_gem_object_release_handle+0x54/0x60
>> [   32.162984]  ? __pfx_drm_gem_object_release_handle+0x10/0x10
>> [   32.162987]  idr_for_each+0x71/0xf0
>> [   32.162991]  drm_gem_release+0x20/0x30
>> [   32.162995]  drm_file_free+0x1f8/0x270
>> [   32.162999]  drm_release+0x74/0xf0
>> [   32.163002]  __fput+0xea/0x290
>> [   32.163007]  task_work_run+0x5a/0x90
>> [   32.163011]  do_exit+0x377/0xb20
>> [   32.163014]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
>> [   32.163019]  do_group_exit+0x31/0x80
>> [   32.163022]  __x64_sys_exit_group+0x18/0x20
>> [   32.163025]  do_syscall_64+0x5d/0x90
>> [   32.163029]  ? __count_memcg_events+0x42/0x90
>> [   32.163033]  ? count_memcg_events.constprop.0+0x1a/0x30
>> [   32.163037]  ? handle_mm_fault+0xa2/0x360
>> [   32.163040]  ? do_user_addr_fault+0x30f/0x660
>> [   32.163043]  ? exc_page_fault+0x7f/0x180
>> [   32.163045]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>> [   32.163048] RIP: 0033:0x7f069171ce2d
>> [   32.163074] Code: Unable to access opcode bytes at 0x7f069171ce03.
>> [   32.163075] RSP: 002b:00007fff0d3ba328 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
>> [   32.163077] RAX: ffffffffffffffda RBX: 00007f069181cfa8 RCX: 00007f069171ce2d
>> [   32.163079] RDX: 00000000000000e7 RSI: fffffffffffffd08 RDI: 0000000000000000
>> [   32.163080] RBP: 0000000000000883 R08: 0000000562be99f3 R09: 0000000000000000
>> [   32.163081] R10: 0000562be99f3690 R11: 0000000000000202 R12: 0000000000000000
>> [   32.163082] R13: 0000000000000000 R14: 00007f069181b680 R15: 00007f069181cfc0
>> [   32.163085]  </TASK>
>> [   32.163086] ---[ end trace 0000000000000000 ]---
>>
>>
>> Thanks,
>>
>> Olliver
>>
>
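
The two failure modes Christian mentions (a missing unpin, or a missing reference) can be sketched with a toy model. This is not the TTM code: ToyBO, its methods, and the scenario names are invented for illustration, and it assumes, as Christian's wording suggests but the thread does not confirm, that the warning in ttm_bo_release fires because the buffer object is still pinned when its last reference is dropped.

```python
import warnings


class ToyBO:
    """Toy stand-in for a buffer object; NOT the real TTM structure."""

    def __init__(self):
        self.refcount = 1      # reference owned by the userspace GEM handle
        self.pin_count = 0
        self.released = False

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self._release()

    def pin(self):
        self.pin_count += 1

    def unpin(self):
        self.pin_count -= 1

    def _release(self):
        # Assumed analogue of the check that produced the WARNING above.
        if self.pin_count:
            warnings.warn("BO released while still pinned", RuntimeWarning)
        self.released = True


def run(scenario):
    """Return (number of warnings, whether the BO was released)."""
    bo = ToyBO()
    if scenario == "pin_with_ref":
        bo.get()               # display code takes its own reference
    bo.pin()                   # display code pins the BO for scanout
    if scenario == "unpin_before_close":
        bo.unpin()             # display code unpins before the handle goes away
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        bo.put()               # process exits, GEM handle is released
    return len(caught), bo.released


for name in ("buggy", "unpin_before_close", "pin_with_ref"):
    print(name, run(name))
```

In the "buggy" scenario the only reference belongs to the handle, so closing it releases a buffer that is still pinned and the toy warning fires. Either unpinning first or holding a separate reference for the duration of the pin avoids that in this model.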

