Message-ID: <2025081654-CVE-2025-38520-1f4f@gregkh>
Date: Sat, 16 Aug 2025 12:58:02 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: linux-cve-announce@...r.kernel.org
Cc: Greg Kroah-Hartman <gregkh@...nel.org>
Subject: CVE-2025-38520: drm/amdkfd: Don't call mmput from MMU notifier callback

From: Greg Kroah-Hartman <gregkh@...nel.org>

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

drm/amdkfd: Don't call mmput from MMU notifier callback

If the process is exiting, the mmput() inside the MMU notifier callback,
invoked from compaction, fork, or NUMA balancing, could release the last
reference to the mm struct and call exit_mmap() and free_pgtables().
This triggers a deadlock with the backtrace below.

The deadlock leaks the kfd process, because the MMU notifier release is
never called, and leaks VRAM as well.

The fix is to take an mm reference with mmget_not_zero() when adding a
prange to the deferred list, paired with the mmput() in the deferred
list work.
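
The reference handoff described above can be illustrated with a toy
userspace model. This is only a sketch, not the driver code: the Mm
class, the context strings, and the notifier_callback_buggy,
notifier_callback_fixed, and deferred_worker functions are hypothetical
names invented for illustration. The only behavior it models is that the
caller who drops the last reference also performs the teardown.

```python
class Mm:
    """Toy stand-in for the mm struct: tracks only a user count."""

    def __init__(self, users=1):
        self.users = users
        self.torn_down_in = None  # context that dropped the last reference


def mmget_not_zero(mm):
    """Take a reference unless the count has already reached zero."""
    if mm.users == 0:
        return False
    mm.users += 1
    return True


def mmput(mm, context):
    """Drop a reference; the last put tears down in the caller's context."""
    assert mm.users > 0
    mm.users -= 1
    if mm.users == 0:
        mm.torn_down_in = context


def notifier_callback_buggy(mm, process_exit):
    """Get and put inside the callback (the problematic pattern)."""
    if not mmget_not_zero(mm):
        return
    process_exit()          # the exiting process drops its own reference
    mmput(mm, "notifier")   # may now be the last reference


def notifier_callback_fixed(mm, deferred, process_exit):
    """Take the reference when queuing; never put inside the callback."""
    if mmget_not_zero(mm):
        deferred.append(mm)  # the reference travels with the work item
    process_exit()


def deferred_worker(deferred):
    """Deferred work drops the reference taken at queue time."""
    while deferred:
        mmput(deferred.pop(), "worker")


# Problematic pattern: teardown ends up running in the notifier context.
mm = Mm()
notifier_callback_buggy(mm, lambda: mmput(mm, "exit"))
print(mm.torn_down_in)

# Fixed pattern: teardown is postponed until the deferred worker runs.
mm2 = Mm()
work = []
notifier_callback_fixed(mm2, work, lambda: mmput(mm2, "exit"))
deferred_worker(work)
print(mm2.torn_down_in)
```

In the model, the first case ends with the teardown attributed to the
notifier context, which corresponds to the mmput() frame in the hung
task's backtrace. In the second case the teardown happens only once the
deferred worker drops the reference it was handed.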

If a prange is split and added to the pchild list, the pchild
work_item.mm is not used, so remove the mm parameter from
svm_range_unmap_split and svm_range_add_child.

The backtrace of the hung task:

 INFO: task python:348105 blocked for more than 64512 seconds.
 Call Trace:
  __schedule+0x1c3/0x550
  schedule+0x46/0xb0
  rwsem_down_write_slowpath+0x24b/0x4c0
  unlink_anon_vmas+0xb1/0x1c0
  free_pgtables+0xa9/0x130
  exit_mmap+0xbc/0x1a0
  mmput+0x5a/0x140
  svm_range_cpu_invalidate_pagetables+0x2b/0x40 [amdgpu]
  mn_itree_invalidate+0x72/0xc0
  __mmu_notifier_invalidate_range_start+0x48/0x60
  try_to_unmap_one+0x10fa/0x1400
  rmap_walk_anon+0x196/0x460
  try_to_unmap+0xbb/0x210
  migrate_page_unmap+0x54d/0x7e0
  migrate_pages_batch+0x1c3/0xae0
  migrate_pages_sync+0x98/0x240
  migrate_pages+0x25c/0x520
  compact_zone+0x29d/0x590
  compact_zone_order+0xb6/0xf0
  try_to_compact_pages+0xbe/0x220
  __alloc_pages_direct_compact+0x96/0x1a0
  __alloc_pages_slowpath+0x410/0x930
  __alloc_pages_nodemask+0x3a9/0x3e0
  do_huge_pmd_anonymous_page+0xd7/0x3e0
  __handle_mm_fault+0x5e3/0x5f0
  handle_mm_fault+0xf7/0x2e0
  hmm_vma_fault.isra.0+0x4d/0xa0
  walk_pmd_range.isra.0+0xa8/0x310
  walk_pud_range+0x167/0x240
  walk_pgd_range+0x55/0x100
  __walk_page_range+0x87/0x90
  walk_page_range+0xf6/0x160
  hmm_range_fault+0x4f/0x90
  amdgpu_hmm_range_get_pages+0x123/0x230 [amdgpu]
  amdgpu_ttm_tt_get_user_pages+0xb1/0x150 [amdgpu]
  init_user_pages+0xb1/0x2a0 [amdgpu]
  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x543/0x7d0 [amdgpu]
  kfd_ioctl_alloc_memory_of_gpu+0x24c/0x4e0 [amdgpu]
  kfd_ioctl+0x29d/0x500 [amdgpu]

(cherry picked from commit a29e067bd38946f752b0ef855f3dfff87e77bec7)

The Linux kernel CVE team has assigned CVE-2025-38520 to this issue.


Affected and fixed versions
===========================

	Issue introduced in 5.19 with commit fa582c6f3684ac0098a9d02ddf0ed52a02b37127 and fixed in 6.1.148 with commit c1bde9d48e09933c361521720f77a8072083c83a
	Issue introduced in 5.19 with commit fa582c6f3684ac0098a9d02ddf0ed52a02b37127 and fixed in 6.6.101 with commit 145a56bd68f4bff098d59fbc7c263d20dfef4fc4
	Issue introduced in 5.19 with commit fa582c6f3684ac0098a9d02ddf0ed52a02b37127 and fixed in 6.12.39 with commit e90ee15ce28c61f6d83a0511c3e02e2662478350
	Issue introduced in 5.19 with commit fa582c6f3684ac0098a9d02ddf0ed52a02b37127 and fixed in 6.15.7 with commit a7eb0a25010a674c8fdfbece38353ef7be8c5834
	Issue introduced in 5.19 with commit fa582c6f3684ac0098a9d02ddf0ed52a02b37127 and fixed in 6.16 with commit cf234231fcbc7d391e2135b9518613218cc5347f
	Issue introduced in 5.15.49 with commit 09c5cdbc62d99fc6306a21b24b60eb11a3bd0963
	Issue introduced in 5.18.6 with commit 4b29b8d7c20f54eec0ff266b4a3f419bd251ed83
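
The version ranges listed above can be turned into a quick check. The
following is a minimal sketch, assuming an upstream-style version string
such as "6.6.100"; the function names parse and classify are
hypothetical. It ignores suffixes such as "-rc1" or "-generic", and it
cannot see fixes that a distribution has backported without changing
the version number, so the CVE record remains the authoritative source.

```python
import re

# First fixed release per stable series, taken from the list above.
FIXED_IN = {(6, 1): 148, (6, 6): 101, (6, 12): 39, (6, 15): 7}
MAINLINE_FIXED = (6, 16)

# Stable releases that received the offending commit as a backport.
INTRODUCED_IN = {(5, 15): 49, (5, 18): 6}
MAINLINE_INTRODUCED = (5, 19)


def parse(version):
    """Return (major, minor, patch) from a string like '6.12.39-generic'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized kernel version: {version!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)


def classify(version):
    """Classify a version as 'fixed', 'affected', or 'not affected'."""
    major, minor, patch = parse(version)
    series = (major, minor)
    if series >= MAINLINE_FIXED:
        return "fixed"
    if series in FIXED_IN:
        return "fixed" if patch >= FIXED_IN[series] else "affected"
    if series in INTRODUCED_IN:
        return "affected" if patch >= INTRODUCED_IN[series] else "not affected"
    if series >= MAINLINE_INTRODUCED:
        # Series between 5.19 and 6.15 with no fixed release listed above.
        return "affected"
    return "not affected"


for v in ("5.15.48", "5.15.49", "6.1.147", "6.1.148", "6.12.39-generic", "6.16"):
    print(v, classify(v))
```

Series that appear in neither table, such as 6.2, are reported as
affected because the list names no fixed release for them. That is a
conservative assumption of this sketch rather than a statement from the
advisory.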

Please see https://www.kernel.org for a full list of the kernel versions
currently supported by the kernel community.

Unaffected versions might change over time as fixes are backported to
older supported kernel versions.  The official CVE entry at
	https://cve.org/CVERecord/?id=CVE-2025-38520
will be updated if fixes are backported; please check it for the most
up-to-date information about this issue.


Affected files
==============

The file(s) affected by this issue are:
	drivers/gpu/drm/amd/amdkfd/kfd_svm.c


Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this and many other bugfixes.  Individual
changes are never tested alone, but rather are part of a larger kernel
release.  Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all.  If, however, updating
to the latest release is impossible, the individual changes to resolve
this issue can be found at these commits:
	https://git.kernel.org/stable/c/c1bde9d48e09933c361521720f77a8072083c83a
	https://git.kernel.org/stable/c/145a56bd68f4bff098d59fbc7c263d20dfef4fc4
	https://git.kernel.org/stable/c/e90ee15ce28c61f6d83a0511c3e02e2662478350
	https://git.kernel.org/stable/c/a7eb0a25010a674c8fdfbece38353ef7be8c5834
	https://git.kernel.org/stable/c/cf234231fcbc7d391e2135b9518613218cc5347f
