Message-ID: <20260201190535.18575-2-sunlightlinux@gmail.com>
Date: Sun,  1 Feb 2026 21:05:36 +0200
From: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@...il.com>
To: alexdeucher@...il.com
Cc: alexander.deucher@....com,
	amd-gfx@...ts.freedesktop.org,
	christian.koenig@....com,
	dri-devel@...ts.freedesktop.org,
	ionut_n2001@...oo.com,
	linux-kernel@...r.kernel.org,
	sunlightlinux@...il.com,
	superm1@...nel.org
Subject: Re: [PATCH 1/1] drm/amdgpu: Fix TLB flush failures after hibernation resume

Hi Alex,

Thank you for the quick response and for the information about hibernation support.

Here are the stack traces showing the call chains when the TLB flush failures occur. The issue shows up in two places:

1. During resume (hibernation restore):

Call Trace:
 dump_stack_lvl+0x5b/0x80
 amdgpu_gmc_fw_reg_write_reg_wait+0x1c7/0x1d0 [amdgpu]
 gmc_v9_0_hw_init+0x2e2/0x390 [amdgpu]
 gmc_v9_0_resume+0x26/0x70 [amdgpu]
 amdgpu_ip_block_resume+0x27/0x50 [amdgpu]
 amdgpu_device_ip_resume_phase1+0x55/0x90 [amdgpu]
 amdgpu_device_resume+0x69/0x380 [amdgpu]
 amdgpu_pmops_resume+0x46/0x80 [amdgpu]
 dpm_run_callback+0x4a/0x150
 device_resume+0x1df/0x2f0
 async_resume+0x21/0x30
 async_run_entry_fn+0x36/0x160
 process_one_work+0x193/0x350
 worker_thread+0x2d7/0x410

2. Subsequent failures during command submission:

Call Trace:
 dump_stack_lvl+0x5b/0x80
 amdgpu_gmc_fw_reg_write_reg_wait+0x1c7/0x1d0 [amdgpu]
 amdgpu_gmc_flush_gpu_tlb+0xd0/0x280 [amdgpu]
 amdgpu_gart_invalidate_tlb.part.0+0x59/0x90 [amdgpu]
 amdgpu_ttm_alloc_gart+0x146/0x180 [amdgpu]
 amdgpu_cs_parser_bos.isra.0+0x5d6/0x7d0 [amdgpu]
 amdgpu_cs_ioctl+0xbd0/0x1aa0 [amdgpu]
 drm_ioctl_kernel+0xa6/0x100
 drm_ioctl+0x262/0x520
 amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]

Error message: "amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706"

Full dmesg log available at: https://gitlab.freedesktop.org/-/project/4522/uploads/6a285ad2e24f4807e5d75c3f4ed5a7a1/dmesg-dump-stack.txt
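
For anyone going through the full log, here is a minimal sketch (Python, not part of the patch) of how the failing register pairs could be counted in a saved dmesg dump. The regular expression is modelled only on the single error line quoted above, so other variants of the message may not match, and the function name is made up for illustration:

```python
import re
from collections import Counter

# Pattern based on the one error line quoted in this mail. Any dmesg
# prefix (timestamp, log level) is ignored because search() is used.
FAIL_RE = re.compile(
    r"amdgpu (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): amdgpu: "
    r"failed to write reg (?P<write>[0-9a-f]+) wait reg (?P<wait>[0-9a-f]+)"
)


def count_reg_wait_failures(lines):
    """Count failures per (PCI device, write reg, wait reg) tuple.

    Hypothetical helper: takes any iterable of log lines, for example
    an open file object for a saved dmesg dump.
    """
    counts = Counter()
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            counts[(m["bdf"], m["write"], m["wait"])] += 1
    return counts


# The error line from this report, used as sample input.
sample = [
    "amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706",
]
for (bdf, write_reg, wait_reg), n in count_reg_wait_failures(sample).items():
    print(bdf, write_reg, wait_reg, n)
```

To run it against a real log, pass an open file instead of the sample list, e.g. `count_reg_wait_failures(open("dmesg-dump-stack.txt", errors="replace"))`.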

Regarding the hibernation support issues you mentioned: I understand the limitations with secure boot and VRAM eviction. In my case, secure boot is disabled and there is sufficient swap space, so the primary issue I'm encountering is this TLB flush failure.

I'm happy to test any patches or help with further debugging if needed.

Thanks,
Ionut
