Message-ID: <f9cef3e4-bcc1-7d87-6663-82f4d84396e1@gmail.com>
Date:   Thu, 2 Jan 2020 21:54:05 +0100
From:   A L <crimsoncottage@...il.com>
To:     linux-kernel@...r.kernel.org
Subject: AMDGPU crash on 5.4.7 on AMD Athlon 3000G APU

Dear all,

There seems to be a regression between kernel 5.4.6 and 5.4.7. When I 
change from kernel 5.4.6 to kernel 5.4.7 I can no longer load the AMDGPU 
driver. The kernel immediately crashes with the following stack trace 
and errors. The system has to be hard reset to boot again.

[  320.086318] [drm] amdgpu kernel modesetting enabled.
[  320.086382] Parsing CRAT table with 1 nodes
[  320.086388] Creating topology SYSFS entries
[  320.086425] Topology: Add APU node [0x0:0x0]
[  320.086427] Finished initializing topology
[  320.086545] amdgpu 0000:06:00.0: remove_conflicting_pci_framebuffers: 
bar 0: 0xe0000000 -> 0xefffffff
[  320.086549] amdgpu 0000:06:00.0: remove_conflicting_pci_framebuffers: 
bar 2: 0xf0000000 -> 0xf01fffff
[  320.086552] amdgpu 0000:06:00.0: remove_conflicting_pci_framebuffers: 
bar 5: 0xfce00000 -> 0xfce7ffff
[  320.086554] checking generic (e0000000 7f0000) vs hw (e0000000 10000000)
[  320.086557] fb0: switching to amdgpudrmfb from VESA VGA
[  320.086647] Console: switching to colour dummy device 80x25
[  320.086673] amdgpu 0000:06:00.0: vgaarb: deactivate vga console
[  320.086839] [drm] initializing kernel modesetting (RAVEN 
0x1002:0x15D8 0x1002:0x15D8 0xCC).
[  320.086850] [drm] register mmio base: 0xFCE00000
[  320.086851] [drm] register mmio size: 524288
[  320.086868] [drm] add ip block number 0 <soc15_common>
[  320.086869] [drm] add ip block number 1 <gmc_v9_0>
[  320.086869] [drm] add ip block number 2 <vega10_ih>
[  320.086870] [drm] add ip block number 3 <psp>
[  320.086870] [drm] add ip block number 4 <gfx_v9_0>
[  320.086871] [drm] add ip block number 5 <sdma_v4_0>
[  320.086871] [drm] add ip block number 6 <powerplay>
[  320.086872] [drm] add ip block number 7 <dm>
[  320.086873] [drm] add ip block number 8 <vcn_v1_0>
[  320.112116] [drm] BIOS signature incorrect 0 0
[  320.112142] ATOM BIOS: 113-RAVEN2-115
[  320.112773] [drm] VCN decode is enabled in VM mode
[  320.112774] [drm] VCN encode is enabled in VM mode
[  320.112774] [drm] VCN jpeg decode is enabled in VM mode
[  320.112807] [drm] vm size is 262144 GB, 3 levels, block size is 
9-bit, fragment size is 9-bit
[  320.112813] amdgpu 0000:06:00.0: VRAM: 2048M 0x000000F400000000 - 
0x000000F47FFFFFFF (2048M used)
[  320.112814] amdgpu 0000:06:00.0: GART: 1024M 0x0000000000000000 - 
0x000000003FFFFFFF
[  320.112815] amdgpu 0000:06:00.0: AGP: 267419648M 0x000000F800000000 - 
0x0000FFFFFFFFFFFF
[  320.112818] [drm] Detected VRAM RAM=2048M, BAR=2048M
[  320.112818] [drm] RAM width 128bits DDR4
[  320.112858] [TTM] Zone  kernel: Available graphics memory: 3052426 KiB
[  320.112858] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[  320.112859] [TTM] Initializing pool allocator
[  320.112861] [TTM] Initializing DMA pool allocator
[  320.112913] [drm] amdgpu: 2048M of VRAM memory ready
[  320.112915] [drm] amdgpu: 3072M of GTT memory ready.
[  320.112923] [drm] GART: num cpu pages 262144, num gpu pages 262144
[  320.113067] [drm] PCIE GART of 1024M enabled (table at 
0x000000F400900000).
[  320.119226] [drm] use_doorbell being set to: [true]
[  320.119270] amdgpu: [powerplay] hwmgr_sw_init smu backed is smu10_smu
[  320.121347] [drm] Found VCN firmware Version: 1.86 Family ID: 18
[  320.121354] [drm] PSP loading VCN firmware
[  320.142076] [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
[  320.202902] [drm] failed to load ucode id (18)
[  320.202904] [drm] psp command failed and response status is (0x300F)
[  320.205881] [drm] failed to load ucode id (19)
[  320.205883] [drm] psp command failed and response status is (0xF)
[  320.208882] [drm] failed to load ucode id (20)
[  320.208883] [drm] psp command failed and response status is (0xF)
[  320.229776] [drm] DM_PPLIB: values for F clock
[  320.229778] [drm] DM_PPLIB:     0 in kHz, 3649 in mV
[  320.229779] [drm] DM_PPLIB:     0 in kHz, 0 in mV
[  320.229780] [drm] DM_PPLIB:     0 in kHz, 0 in mV
[  320.229780] [drm] DM_PPLIB:     0 in kHz, 0 in mV
[  320.229797] ------------[ cut here ]------------
[  320.229949] WARNING: CPU: 1 PID: 5908 at 
drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1464 
dcn_bw_update_from_pplib+0x94/0x2c0 [amdgpu]
[  320.229950] Modules linked in: amdgpu(+) gpu_sched ttm ip_set_hash_ip 
xt_state ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6table_raw 
ip6table_mangle xt_multiport ip6table_nat nfnetlink_log xt_limit 
xt_NFLOG ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter 
iptable_mangle xt_nat iptable_nat xt_CT iptable_raw ip_set_bitmap_port 
ip_set_hash_net nf_nat_pptp nf_conntrack_pptp nf_nat xt_sctp 
nf_conntrack_sip nf_conntrack_irc nf_conntrack_ftp nf_conntrack_h323 
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_bridge 
nf_conntrack nf_defrag_ipv6 ip6_tables ip_tables xt_recent xt_set ip_set 
nfnetlink nf_defrag_ipv4 nf_socket_ipv4 uas pinctrl_amd
[  320.229974] CPU: 1 PID: 5908 Comm: modprobe Not tainted 
5.4.7-gentoo-test2 #3
[  320.229975] Hardware name: Gigabyte Technology Co., Ltd. B450M 
DS3H/B450M DS3H-CF, BIOS F50 11/27/2019
[  320.230113] RIP: 0010:dcn_bw_update_from_pplib+0x94/0x2c0 [amdgpu]
[  320.230116] Code: 0c 24 85 c9 74 24 8d 71 ff 48 8d 44 24 04 48 8d 54 
f4 0c eb 0d 48 83 c0 08 48 39 d0 0f 84 13 01 00 00 44 8b 00 45 85 c0 75 
eb <0f> 0b e8 65 3e d4 e0 4c 89 e2 be 04 00 00 00 4c 89 ef e8 a5 9b fe
[  320.230117] RSP: 0018:ffffc9000045b700 EFLAGS: 00010246
[  320.230119] RAX: ffffc9000045b704 RBX: ffff88812c700000 RCX: 
0000000000000004
[  320.230120] RDX: ffffc9000045b724 RSI: 0000000000000003 RDI: 
ffff888218856350
[  320.230121] RBP: ffffc9000045b840 R08: 0000000000000000 R09: 
00000000000003c5
[  320.230122] R10: 0000000000000001 R11: 0000000000000000 R12: 
ffffc9000045b790
[  320.230123] R13: ffff8881341c8980 R14: 0000000000000001 R15: 
000000000000000b
[  320.230125] FS:  00007ff865e7db80(0000) GS:ffff888218840000(0000) 
knlGS:0000000000000000
[  320.230126] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  320.230127] CR2: 00007fd68de9f540 CR3: 000000012c4f8000 CR4: 
00000000003406e0
[  320.230128] Call Trace:
[  320.230135]  ? kmem_cache_alloc+0xe6/0x180
[  320.230271]  dcn10_create_resource_pool+0x7d9/0xb10 [amdgpu]
[  320.230406]  ? firmware_parser_create+0x6fb/0x720 [amdgpu]
[  320.230533]  dc_create_resource_pool+0x21/0x100 [amdgpu]
[  320.230660]  dc_create+0x206/0x680 [amdgpu]
[  320.230663]  ? kmem_cache_alloc+0xe6/0x180
[  320.230795]  amdgpu_dm_init+0x138/0x1c0 [amdgpu]
[  320.230800]  ? common_interrupt+0xa/0xf
[  320.230929]  ? phm_wait_for_register_unequal.part.0+0x44/0x70 [amdgpu]
[  320.231059]  dm_hw_init+0x9/0x20 [amdgpu]
[  320.231191]  amdgpu_device_init.cold+0xf47/0x129e [amdgpu]
[  320.231194]  ? __alloc_pages_nodemask+0x128/0x240
[  320.231300]  amdgpu_driver_load_kms+0x44/0xe0 [amdgpu]
[  320.231305]  drm_dev_register+0x109/0x150
[  320.231410]  amdgpu_pci_probe+0xe9/0x150 [amdgpu]
[  320.231414]  ? __pm_runtime_resume+0x44/0x50
[  320.231417]  local_pci_probe+0x38/0x70
[  320.231419]  ? pci_match_device+0xd2/0x100
[  320.231422]  pci_device_probe+0xe4/0x190
[  320.231425]  really_probe+0xdf/0x290
[  320.231427]  driver_probe_device+0x4b/0xc0
[  320.231430]  device_driver_attach+0x4e/0x60
[  320.231432]  __driver_attach+0x44/0xb0
[  320.231434]  ? device_driver_attach+0x60/0x60
[  320.231436]  bus_for_each_dev+0x5c/0x90
[  320.231438]  bus_add_driver+0x16d/0x1c0
[  320.231440]  driver_register+0x67/0xb0
[  320.231442]  ? 0xffffffffa0597000
[  320.231444]  do_one_initcall+0x44/0x16f
[  320.231447]  ? __vunmap+0x223/0x260
[  320.231449]  ? kmem_cache_alloc+0xe6/0x180
[  320.231452]  do_init_module+0x51/0x200
[  320.231455]  load_module+0x20d6/0x23d0
[  320.231458]  ? vfs_read+0x117/0x140
[  320.231461]  ? __do_sys_finit_module+0x9b/0xb0
[  320.231464]  __do_sys_finit_module+0x9b/0xb0
[  320.231466]  do_syscall_64+0x3d/0x100
[  320.231469]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  320.231471] RIP: 0033:0x7ff865f9e289
[  320.231474] Code: 00 00 00 75 05 48 83 c4 18 c3 e8 c2 5f 01 00 66 90 
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d7 4b 09 00 f7 d8 64 89 01 48
[  320.231475] RSP: 002b:00007ffe140b9878 EFLAGS: 00000246 ORIG_RAX: 
0000000000000139
[  320.231476] RAX: ffffffffffffffda RBX: 000055eeb9140ab0 RCX: 
00007ff865f9e289
[  320.231477] RDX: 0000000000000000 RSI: 000055eeb8c5533c RDI: 
0000000000000005
[  320.231478] RBP: 0000000000040000 R08: 0000000000000000 R09: 
000055eeb9140ca0
[  320.231479] R10: 0000000000000005 R11: 0000000000000246 R12: 
000055eeb8c5533c
[  320.231480] R13: 0000000000000000 R14: 000055eeb9140be0 R15: 
000055eeb9140ab0
[  320.231482] ---[ end trace 4d7f7927484d9651 ]---
[  320.231529] [drm] DM_PPLIB: values for DCF clock
[  320.231530] [drm] DM_PPLIB:     300000 in kHz, 3649 in mV
[  320.231531] [drm] DM_PPLIB:     600000 in kHz, 3974 in mV
[  320.231532] [drm] DM_PPLIB:     626000 in kHz, 4174 in mV
[  320.231532] [drm] DM_PPLIB:     654000 in kHz, 4325 in mV
[  320.237932] [drm] Display Core initialized with v3.2.48!
[  320.238368] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[  320.238369] [drm] Driver supports precise vblank timestamp query.
[  320.249980] [drm] VCN decode and encode initialized 
successfully(under SPG Mode).
[  320.251158] kfd kfd: Allocated 3969056 bytes on gart
[  320.251662] kfd kfd: Failed to resume IOMMU for device 1002:15d8
[  320.251891] kfd kfd: device 1002:15d8 NOT added due to errors
[  320.251962] [drm] Cannot find any crtc or sizes
[  320.252163] amdgpu 0000:06:00.0: ring gfx uses VM inv eng 0 on hub 0
[  320.252167] amdgpu 0000:06:00.0: ring comp_1.0.0 uses VM inv eng 1 on 
hub 0
[  320.252169] amdgpu 0000:06:00.0: ring comp_1.1.0 uses VM inv eng 4 on 
hub 0
[  320.252172] amdgpu 0000:06:00.0: ring comp_1.2.0 uses VM inv eng 5 on 
hub 0
[  320.252174] amdgpu 0000:06:00.0: ring comp_1.3.0 uses VM inv eng 6 on 
hub 0
[  320.252176] amdgpu 0000:06:00.0: ring comp_1.0.1 uses VM inv eng 7 on 
hub 0
[  320.252178] amdgpu 0000:06:00.0: ring comp_1.1.1 uses VM inv eng 8 on 
hub 0
[  320.252181] amdgpu 0000:06:00.0: ring comp_1.2.1 uses VM inv eng 9 on 
hub 0
[  320.252183] amdgpu 0000:06:00.0: ring comp_1.3.1 uses VM inv eng 10 
on hub 0
[  320.252185] amdgpu 0000:06:00.0: ring kiq_2.1.0 uses VM inv eng 11 on 
hub 0
[  320.252186] amdgpu 0000:06:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[  320.252188] amdgpu 0000:06:00.0: ring vcn_dec uses VM inv eng 1 on hub 1
[  320.252190] amdgpu 0000:06:00.0: ring vcn_enc0 uses VM inv eng 4 on hub 1
[  320.252192] amdgpu 0000:06:00.0: ring vcn_enc1 uses VM inv eng 5 on hub 1
[  320.252194] amdgpu 0000:06:00.0: ring vcn_jpeg uses VM inv eng 6 on hub 1
[  320.401903] AMD-Vi: Completion-Wait loop timed out
[  320.542017] AMD-Vi: Completion-Wait loop timed out
[  320.682015] AMD-Vi: Completion-Wait loop timed out
[  320.822091] AMD-Vi: Completion-Wait loop timed out
[  320.962130] AMD-Vi: Completion-Wait loop timed out
[  321.088018] AMD-Vi: Completion-Wait loop timed out
[  321.214038] AMD-Vi: Completion-Wait loop timed out
[  321.263146] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT 
device=06:00.0 address=0x217879410]
[  322.278079] clocksource: timekeeping watchdog on CPU3: Marking 
clocksource 'tsc' as unstable because the skew is too large:
[  322.278081] clocksource:                       'hpet' wd_now: 
1327ac6e wd_last: 129c4f3c mask: ffffffff
[  322.278082] clocksource:                       'tsc' cs_now: 
11ce84c06c0 cs_last: 11c76631aad mask: ffffffffffffffff
[  322.278084] tsc: Marking TSC unstable due to clocksource watchdog
[  322.369132] TSC found unstable after boot, most likely due to broken 
BIOS. Use 'tsc=unstable'.
[  322.369134] sched_clock: Marking unstable (322385637527, 
-16091409)<-(322450384861, -81255237)
[  322.734191] clocksource: Switched to clocksource hpet
[  336.585879] hpet: Lost 4 RTC interrupts

* The full paste is available at (1)
* lspci -vk paste is available at (2)
* kernel .config is available at (3)
* sys-kernel/linux-firmware-20191215 is installed.

The previous kernel, 5.4.6, worked without crashing. It still produced the
same stack trace, but none of the "Completion-Wait loop timed out" or
"iommu ivhd0: AMD-Vi" errors. The same kernel .config was used for both
kernels.
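
For anyone who wants to repeat the comparison, here is a minimal sketch of
how the two dmesg captures could be diffed once the timestamps are stripped.
The function names (normalize, only_in_new) and the short "old" sample are
made up for illustration; the 5.4.6 log itself is not included in this
message. It assumes one message per line, so the wrapped lines in the mail
above would have to be rejoined first.

```python
import re

# Matches the leading "[  320.086318] " timestamp that dmesg prints.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(log_text):
    """Return the set of log messages with timestamps stripped."""
    messages = set()
    for line in log_text.splitlines():
        line = TIMESTAMP.sub("", line).strip()
        if line:
            messages.add(line)
    return messages


def only_in_new(old_log, new_log):
    """Messages present in new_log but absent from old_log, sorted."""
    return sorted(normalize(new_log) - normalize(old_log))


if __name__ == "__main__":
    # Hypothetical stand-in for the 5.4.6 log (timestamps invented).
    old = (
        "[  100.000001] [drm] failed to load ucode id (18)\n"
        "[  100.000002] [drm] Cannot find any crtc or sizes\n"
    )
    # A few lines taken from the 5.4.7 log above.
    new = (
        "[  320.202902] [drm] failed to load ucode id (18)\n"
        "[  320.251962] [drm] Cannot find any crtc or sizes\n"
        "[  320.401903] AMD-Vi: Completion-Wait loop timed out\n"
    )
    for message in only_in_new(old, new):
        print(message)
```

Messages that embed addresses or PIDs will differ between boots and show up
as new even when nothing changed, so the output is only a starting point.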

The system is a small headless machine with the new low-power AMD Athlon
3000G APU with integrated Vega 3 graphics (4).
The motherboard is a Gigabyte B450M DS3H. Two Intel PCIe NICs are present.

1) http://dpaste.com/0X9FWCW
2) http://dpaste.com/10R7J9H
3) http://dpaste.com/064XG5E
4) https://www.amd.com/en/products/apu/amd-athlon-3000g

Regards,
Anders

