Message-ID: <20260201190535.18575-2-sunlightlinux@gmail.com>
Date: Sun, 1 Feb 2026 21:05:36 +0200
From: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@...il.com>
To: alexdeucher@...il.com
Cc: alexander.deucher@....com,
amd-gfx@...ts.freedesktop.org,
christian.koenig@....com,
dri-devel@...ts.freedesktop.org,
ionut_n2001@...oo.com,
linux-kernel@...r.kernel.org,
sunlightlinux@...il.com,
superm1@...nel.org
Subject: Re: [PATCH 1/1] drm/amdgpu: Fix TLB flush failures after hibernation resume
Hi Alex,
Thank you for the quick response and for the information about hibernation support.
Here are the stack traces showing the call chains when the TLB flush failures occur. The failure shows up in two places:
1. During resume (hibernation restore):
Call Trace:
dump_stack_lvl+0x5b/0x80
amdgpu_gmc_fw_reg_write_reg_wait+0x1c7/0x1d0 [amdgpu]
gmc_v9_0_hw_init+0x2e2/0x390 [amdgpu]
gmc_v9_0_resume+0x26/0x70 [amdgpu]
amdgpu_ip_block_resume+0x27/0x50 [amdgpu]
amdgpu_device_ip_resume_phase1+0x55/0x90 [amdgpu]
amdgpu_device_resume+0x69/0x380 [amdgpu]
amdgpu_pmops_resume+0x46/0x80 [amdgpu]
dpm_run_callback+0x4a/0x150
device_resume+0x1df/0x2f0
async_resume+0x21/0x30
async_run_entry_fn+0x36/0x160
process_one_work+0x193/0x350
worker_thread+0x2d7/0x410
2. Subsequent failures during command submission:
Call Trace:
dump_stack_lvl+0x5b/0x80
amdgpu_gmc_fw_reg_write_reg_wait+0x1c7/0x1d0 [amdgpu]
amdgpu_gmc_flush_gpu_tlb+0xd0/0x280 [amdgpu]
amdgpu_gart_invalidate_tlb.part.0+0x59/0x90 [amdgpu]
amdgpu_ttm_alloc_gart+0x146/0x180 [amdgpu]
amdgpu_cs_parser_bos.isra.0+0x5d6/0x7d0 [amdgpu]
amdgpu_cs_ioctl+0xbd0/0x1aa0 [amdgpu]
drm_ioctl_kernel+0xa6/0x100
drm_ioctl+0x262/0x520
amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
Error message: "amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706"
Full dmesg log available at: https://gitlab.freedesktop.org/-/project/4522/uploads/6a285ad2e24f4807e5d75c3f4ed5a7a1/dmesg-dump-stack.txt
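For anyone who wants to check how often the failure appears in a saved copy of that log, here is a minimal sketch. It is not part of the patch. The regular expression is modeled only on the single error line quoted above, so other kernel versions may word the message differently, and the names `FAIL_RE` and `count_failures` are made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the one error line quoted in this mail (an assumption:
# other kernels or devices may format the message differently).
FAIL_RE = re.compile(
    r"amdgpu (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): amdgpu: "
    r"failed to write reg (?P<write>[0-9a-f]+) wait reg (?P<wait>[0-9a-f]+)"
)


def count_failures(lines):
    """Count failures per (PCI address, written register, waited register)."""
    counts = Counter()
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            counts[(match["bdf"], match["write"], match["wait"])] += 1
    return counts


if __name__ == "__main__":
    # The only sample line used here is the message quoted above.
    sample = [
        "amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706",
        "some unrelated kernel message",
    ]
    for key, n in count_failures(sample).items():
        print(key, n)
```

To run it against a real log, pass an open file object instead of the sample list, for example `count_failures(open("dmesg-dump-stack.txt"))`, assuming the log was saved under that name.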
Regarding the hibernation support issues you mentioned: I understand the limitations with secure boot and VRAM eviction. In my case, secure boot is disabled and there is sufficient swap space, so the primary issue I'm encountering is this TLB flush failure.
I'm happy to test any patches or help with further debugging if needed.
Thanks,
Ionut