Message-ID: <CAHk-=wge0et+3PP47OBnNx66Q=i_XgqfGfrSmDGHSyp=Jn-CgQ@mail.gmail.com>
Date: Wed, 15 May 2024 13:06:15 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Dave Airlie <airlied@...il.com>
Cc: Daniel Vetter <daniel.vetter@...ll.ch>, dri-devel <dri-devel@...ts.freedesktop.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [git pull] drm for 6.10-rc1
On Tue, 14 May 2024 at 23:21, Dave Airlie <airlied@...il.com> wrote:
>
> In drivers the main thing is a new driver for ARM Mali firmware based
> GPUs, otherwise there are a lot of changes to amdgpu/xe/i915/msm and
> scattered changes to everything else.
Hmm. There's something seriously wrong with amdgpu.
I'm getting a ton of __force_merge warnings:
WARNING: CPU: 0 PID: 1069 at drivers/gpu/drm/drm_buddy.c:199
__force_merge+0x14f/0x180 [drm_buddy]
Modules linked in: hid_logitech_hidpp hid_logitech_dj uas
usb_storage amdgpu drm_ttm_helper ttm video drm_exec
drm_suballoc_helper amdxcp drm_buddy gpu_sched drm_display_helper
drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm
ghash_clmulni_intel igb atlantic nvme dca macsec ccp i2c_algo_bit
nvme_core sp5100_tco wmi ip6_tables ip_tables fuse
CPU: 0 PID: 1069 Comm: plymouthd Not tainted 6.9.0-07381-g3860ca371740 #60
Hardware name: Gigabyte Technology Co., Ltd. TRX40 AORUS
MASTER/TRX40 AORUS MASTER, BIOS F7 09/07/2022
RIP: 0010:__force_merge+0x14f/0x180 [drm_buddy]
Code: 74 0d 49 8b 44 24 18 48 d3 e0 49 29 44 24 30 4c 89 e7 ba 01 00
00 00 e8 9f 00 00 00 44 39 e8 73 1f 49 8b 04 24 e9 25 ff ff ff <0f> 0b
4c 39 c3 75 a3 eb 99 b8 f4 ff ff ff c3 b8 f4 ff ff ff eb 02
RSP: 0018:ffffb87a81cb7908 EFLAGS: 00010246
RAX: ffff9b1915de8000 RBX: ffff9b1919478288 RCX: 000000000ffff800
RDX: ffff9b19194782f8 RSI: ffff9b19194782d0 RDI: ffff9b19194782b0
RBP: 0000000000000000 R08: ffff9b1919478288 R09: 0000000000001000
R10: 0000000000000800 R11: 0000000000000000 R12: ffff9b192590fa18
R13: 000000000000000d R14: 0000000010000000 R15: 0000000000000000
FS: 00007fa06bfa9740(0000) GS:ffff9b281e000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555adb857000 CR3: 000000011b516000 CR4: 0000000000350ef0
Call Trace:
? __force_merge+0x14f/0x180 [drm_buddy]
drm_buddy_alloc_blocks+0x249/0x400 [drm_buddy]
? __cond_resched+0x16/0x40
amdgpu_vram_mgr_new+0x204/0x3f0 [amdgpu]
ttm_resource_alloc+0x31/0x120 [ttm]
ttm_bo_alloc_resource+0xbc/0x260 [ttm]
ttm_bo_validate+0x9f/0x210 [ttm]
ttm_bo_init_reserved+0x103/0x130 [ttm]
amdgpu_bo_create+0x246/0x400 [amdgpu]
? amdgpu_bo_destroy+0x70/0x70 [amdgpu]
amdgpu_bo_create_user+0x29/0x40 [amdgpu]
amdgpu_mode_dumb_create+0x108/0x190 [amdgpu]
? amdgpu_bo_destroy+0x70/0x70 [amdgpu]
? drm_mode_create_dumb+0xa0/0xa0 [drm]
drm_ioctl_kernel+0xad/0xd0 [drm]
drm_ioctl+0x330/0x4b0 [drm]
? drm_mode_create_dumb+0xa0/0xa0 [drm]
amdgpu_drm_ioctl+0x41/0x80 [amdgpu]
__x64_sys_ioctl+0xd2a/0xe00
? update_process_times+0x89/0xa0
? tick_nohz_handler+0xe2/0x120
? timerqueue_add+0x94/0xa0
? __hrtimer_run_queues+0x12b/0x250
? ktime_get+0x34/0xb0
? lapic_next_event+0x12/0x20
? clockevents_program_event+0x78/0xd0
? hrtimer_interrupt+0x118/0x390
? sched_clock+0x5/0x10
do_syscall_64+0x68/0x130
? __irq_exit_rcu+0x53/0xb0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
and eventually the whole thing just crashes entirely, with a bad page
state in the VM:
BUG: Bad page state in process kworker/u261:13 pfn:31fb9a
page: refcount:0 mapcount:0 mapping:00000000ff0b239e index:0x37ce8
pfn:0x31fb9a
aops:btree_aops ino:1
flags: 0x2fffc600000020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x3fff)
page_type: 0xffffffff()
which comes from a btrfs worker (btrfs-delayed-meta
btrfs_work_helper), but I would not be surprised if that was caused by
whatever odd thing is going on with the DRM code. IOW, it *looks* like
this code ends up just corrupting memory in horrible ways.
Linus