Message-ID: <2025072507-CVE-2025-38389-b1f4@gregkh>
Date: Fri, 25 Jul 2025 14:55:20 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: linux-cve-announce@...r.kernel.org
Cc: Greg Kroah-Hartman <gregkh@...nel.org>
Subject: CVE-2025-38389: drm/i915/gt: Fix timeline left held on VMA alloc error

From: Greg Kroah-Hartman <gregkh@...nel.org>

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

drm/i915/gt: Fix timeline left held on VMA alloc error

The following error has been reported sporadically by CI when a test
unbinds the i915 driver on a ring submission platform:

<4> [239.330153] ------------[ cut here ]------------
<4> [239.330166] i915 0000:00:02.0: [drm] drm_WARN_ON(dev_priv->mm.shrink_count)
<4> [239.330196] WARNING: CPU: 1 PID: 18570 at drivers/gpu/drm/i915/i915_gem.c:1309 i915_gem_cleanup_early+0x13e/0x150 [i915]
...
<4> [239.330640] RIP: 0010:i915_gem_cleanup_early+0x13e/0x150 [i915]
...
<4> [239.330942] Call Trace:
<4> [239.330944]  <TASK>
<4> [239.330949]  i915_driver_late_release+0x2b/0xa0 [i915]
<4> [239.331202]  i915_driver_release+0x86/0xa0 [i915]
<4> [239.331482]  devm_drm_dev_init_release+0x61/0x90
<4> [239.331494]  devm_action_release+0x15/0x30
<4> [239.331504]  release_nodes+0x3d/0x120
<4> [239.331517]  devres_release_all+0x96/0xd0
<4> [239.331533]  device_unbind_cleanup+0x12/0x80
<4> [239.331543]  device_release_driver_internal+0x23a/0x280
<4> [239.331550]  ? bus_find_device+0xa5/0xe0
<4> [239.331563]  device_driver_detach+0x14/0x20
...
<4> [357.719679] ---[ end trace 0000000000000000 ]---

If the test also unloads the i915 module, that is followed by:

<3> [357.787478] =============================================================================
<3> [357.788006] BUG i915_vma (Tainted: G     U  W        N ): Objects remaining on __kmem_cache_shutdown()
<3> [357.788031] -----------------------------------------------------------------------------
<3> [357.788204] Object 0xffff888109e7f480 @offset=29824
<3> [357.788670] Allocated in i915_vma_instance+0xee/0xc10 [i915] age=292729 cpu=4 pid=2244
<4> [357.788994]  i915_vma_instance+0xee/0xc10 [i915]
<4> [357.789290]  init_status_page+0x7b/0x420 [i915]
<4> [357.789532]  intel_engines_init+0x1d8/0x980 [i915]
<4> [357.789772]  intel_gt_init+0x175/0x450 [i915]
<4> [357.790014]  i915_gem_init+0x113/0x340 [i915]
<4> [357.790281]  i915_driver_probe+0x847/0xed0 [i915]
<4> [357.790504]  i915_pci_probe+0xe6/0x220 [i915]
...

Closer analysis of CI results history has revealed that the error
depends on a few IGT tests, namely:
- igt@..._intel_allocator@...k-simple-stress-signal,
- igt@..._intel_allocator@...-level-inception-interruptible,
- igt@..._linear_blits@...erruptible,
- igt@...me_mmap_coherency@...tl-errors,
which trigger the issue silently; it then shows up on the first driver
unbind attempt.

All of the above tests perform actions which are actively interrupted with
signals.  Further debugging narrowed the scope down to
DRM_IOCTL_I915_GEM_EXECBUFFER2 and, in particular, to ring_context_alloc(),
which is specific to ring submission.

If successful, that function, or its execlists or GuC submission
equivalent, is supposed to be called only once per GEM context engine,
after which a flag is raised that prevents the function from being called
again.  The function is expected to unwind its internal errors itself, so
that it may be safely called once more after it returns an error.

In the case of ring submission, the function first gets a reference to the
engine's legacy timeline and then allocates a VMA.  If the VMA allocation
fails, e.g. when i915_vma_instance(), called from inside it, is
interrupted by a signal, then ring_context_alloc() fails and leaves the
timeline reference held.  On the next I915_GEM_EXECBUFFER2 IOCTL, another
reference to the timeline is taken, and only that last one is put on
successful completion.  As a consequence, the legacy timeline, with its
underlying engine status page's VMA object, is still held and not
released on driver unbind.

Get the legacy timeline only after successful allocation of the context
engine's VMA.
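
The ordering problem described above can be illustrated with a small
userspace model.  This is only a sketch, not the i915 code: struct
timeline, struct context, alloc_vma() and the context_alloc_*() helpers
are hypothetical stand-ins, and the refcount is a plain int rather than
a kref.  It shows why taking the reference before a step that can fail,
without unwinding, leaks one reference per interrupted attempt.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for illustration; not the real i915 types. */
struct timeline { int refcount; };
struct context { struct timeline *timeline; int allocated; };

static struct timeline *timeline_get(struct timeline *tl)
{
	tl->refcount++;
	return tl;
}

static void timeline_put(struct timeline *tl)
{
	assert(tl->refcount > 0);
	tl->refcount--;
}

/* Simulated VMA allocation: fails with -EINTR while signals are pending. */
static int alloc_vma(int *pending_signals)
{
	if (*pending_signals > 0) {
		(*pending_signals)--;
		return -EINTR;
	}
	return 0;
}

/* Old ordering: the reference is taken first and not dropped on error. */
static int context_alloc_buggy(struct context *ce, struct timeline *legacy,
			       int *pending_signals)
{
	int err;

	ce->timeline = timeline_get(legacy);
	err = alloc_vma(pending_signals);
	if (err)
		return err;	/* reference taken above is left held */
	ce->allocated = 1;
	return 0;
}

/* New ordering: the reference is taken only after the allocation succeeds. */
static int context_alloc_fixed(struct context *ce, struct timeline *legacy,
			       int *pending_signals)
{
	int err = alloc_vma(pending_signals);

	if (err)
		return err;	/* nothing to unwind yet */
	ce->timeline = timeline_get(legacy);
	ce->allocated = 1;
	return 0;
}

/*
 * One interrupted attempt followed by a successful retry and a single put
 * at teardown.  Returns the number of references left beyond the one the
 * engine itself holds.
 */
int leaked_refs_after_retry(int use_fixed)
{
	struct timeline legacy = { .refcount = 1 };	/* engine's own ref */
	struct context ce = { 0 };
	int pending_signals = 1;
	int (*alloc)(struct context *, struct timeline *, int *) =
		use_fixed ? context_alloc_fixed : context_alloc_buggy;
	int err;

	err = alloc(&ce, &legacy, &pending_signals);	/* interrupted */
	assert(err == -EINTR);
	err = alloc(&ce, &legacy, &pending_signals);	/* retry succeeds */
	assert(err == 0);

	timeline_put(ce.timeline);	/* only the last reference is put */
	return legacy.refcount - 1;
}
```

In this model, leaked_refs_after_retry(0) leaves one extra reference on
the timeline, while leaked_refs_after_retry(1) leaves none.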

v2: Add a note on other submission methods (Krzysztof Karas):
    Both execlists and GuC submission use lrc_alloc() which seems free
    from a similar issue.

(cherry picked from commit cc43422b3cc79eacff4c5a8ba0d224688ca9dd4f)

The Linux kernel CVE team has assigned CVE-2025-38389 to this issue.


Affected and fixed versions
===========================

	Issue introduced in 5.4 with commit 75d0a7f31eec8ec4a53b4485905800e09dc5091f and fixed in 5.4.296 with commit 60b757730884e4a223152a68d9b5f625dac94119
	Issue introduced in 5.4 with commit 75d0a7f31eec8ec4a53b4485905800e09dc5091f and fixed in 5.10.240 with commit e47d7d6edc40a6ace7cc04e5893759fee68569f5
	Issue introduced in 5.4 with commit 75d0a7f31eec8ec4a53b4485905800e09dc5091f and fixed in 5.15.187 with commit f10af34261448610d4048ac6e6af87f80e3881a4
	Issue introduced in 5.4 with commit 75d0a7f31eec8ec4a53b4485905800e09dc5091f and fixed in 6.1.144 with commit 4c778c96e469fb719b11683e0a3be8ea68052fa2
	Issue introduced in 5.4 with commit 75d0a7f31eec8ec4a53b4485905800e09dc5091f and fixed in 6.6.97 with commit 40e09506aea1fde1f3e0e04eca531bbb23404baf
	Issue introduced in 5.4 with commit 75d0a7f31eec8ec4a53b4485905800e09dc5091f and fixed in 6.12.37 with commit 5a7ae7bebdc4c2ecd48a2c061319956f65c09473
	Issue introduced in 5.4 with commit 75d0a7f31eec8ec4a53b4485905800e09dc5091f and fixed in 6.15.6 with commit c542d62883f62ececafcb630a1c5010133826bea
	Issue introduced in 5.4 with commit 75d0a7f31eec8ec4a53b4485905800e09dc5091f and fixed in 6.16-rc5 with commit a5aa7bc1fca78c7fa127d9e33aa94a0c9066c1d6
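
As a rough aid for reading the list above, the version boundaries can be
encoded as a small lookup.  This is a sketch under stated assumptions: the
function name classify_version() and the enum are hypothetical, it only
models upstream stable version numbers (distribution kernels may carry
backports under other version strings), it treats any 6.16 or later
release as fixed without modelling 6.16 release candidates before rc5,
and it reports series that are not listed above as vulnerable because no
fix for them is named here.

```c
#include <stddef.h>

enum cve_status { CVE_NOT_AFFECTED, CVE_VULNERABLE, CVE_FIXED };

/* First fixed patch level per stable series, taken from the list above. */
static const struct { int major, minor, fixed_patch; } fixed_in[] = {
	{ 5,  4, 296 },
	{ 5, 10, 240 },
	{ 5, 15, 187 },
	{ 6,  1, 144 },
	{ 6,  6,  97 },
	{ 6, 12,  37 },
	{ 6, 15,   6 },
};

/* Hypothetical helper: classify an upstream kernel version x.y.z. */
enum cve_status classify_version(int major, int minor, int patch)
{
	size_t i;

	/* The issue was introduced in 5.4; earlier kernels lack the code. */
	if (major < 5 || (major == 5 && minor < 4))
		return CVE_NOT_AFFECTED;

	/* The mainline fix landed in 6.16-rc5 (assumes a final release). */
	if (major > 6 || (major == 6 && minor >= 16))
		return CVE_FIXED;

	for (i = 0; i < sizeof(fixed_in) / sizeof(fixed_in[0]); i++) {
		if (fixed_in[i].major == major && fixed_in[i].minor == minor)
			return patch >= fixed_in[i].fixed_patch ?
				CVE_FIXED : CVE_VULNERABLE;
	}

	/* Series between 5.4 and 6.16 with no fix listed above. */
	return CVE_VULNERABLE;
}
```

For example, classify_version(5, 4, 295) yields CVE_VULNERABLE and
classify_version(5, 4, 296) yields CVE_FIXED.  The official CVE entry
remains the authoritative source.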

Please see https://www.kernel.org for a full list of kernel versions
currently supported by the kernel community.

Unaffected versions might change over time as fixes are backported to
older supported kernel versions.  The official CVE entry at
	https://cve.org/CVERecord/?id=CVE-2025-38389
will be updated if fixes are backported; please check it for the most
up-to-date information about this issue.


Affected files
==============

The file(s) affected by this issue are:
	drivers/gpu/drm/i915/gt/intel_ring_submission.c


Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this, and many other bugfixes.  Individual
changes are never tested alone, but rather are part of a larger kernel
release.  Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all.  If, however, updating to
the latest release is impossible, the individual changes to resolve this
issue can be found at these commits:
	https://git.kernel.org/stable/c/60b757730884e4a223152a68d9b5f625dac94119
	https://git.kernel.org/stable/c/e47d7d6edc40a6ace7cc04e5893759fee68569f5
	https://git.kernel.org/stable/c/f10af34261448610d4048ac6e6af87f80e3881a4
	https://git.kernel.org/stable/c/4c778c96e469fb719b11683e0a3be8ea68052fa2
	https://git.kernel.org/stable/c/40e09506aea1fde1f3e0e04eca531bbb23404baf
	https://git.kernel.org/stable/c/5a7ae7bebdc4c2ecd48a2c061319956f65c09473
	https://git.kernel.org/stable/c/c542d62883f62ececafcb630a1c5010133826bea
	https://git.kernel.org/stable/c/a5aa7bc1fca78c7fa127d9e33aa94a0c9066c1d6
