Message-ID: <2025120910-CVE-2025-40329-1ead@gregkh>
Date: Tue,  9 Dec 2025 13:10:11 +0900
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: linux-cve-announce@...r.kernel.org
Cc: Greg Kroah-Hartman <gregkh@...nel.org>
Subject: CVE-2025-40329: drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb

From: Greg Kroah-Hartman <gregkh@...nel.org>

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb

A Mesa issue report pointed out a possible deadlock:

[ 1231.611031]  Possible interrupt unsafe locking scenario:

[ 1231.611033]        CPU0                    CPU1
[ 1231.611034]        ----                    ----
[ 1231.611035]   lock(&xa->xa_lock#17);
[ 1231.611038]                                local_irq_disable();
[ 1231.611039]                                lock(&fence->lock);
[ 1231.611041]                                lock(&xa->xa_lock#17);
[ 1231.611044]   <Interrupt>
[ 1231.611045]     lock(&fence->lock);
[ 1231.611047]
                *** DEADLOCK ***

In this example, CPU0 is any function that accesses job->dependencies
through xa_* functions that do not disable interrupts (e.g.
drm_sched_job_add_dependency() or drm_sched_entity_kill_jobs_cb()).

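CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence-signalling
callback, and therefore in interrupt context. It spins trying to grab the
xa_lock already held by CPU0, while the interrupt that arrives on CPU0
spins on the fence->lock held by CPU1, so neither CPU can make progress.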
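The circular wait in the trace can be modelled as a lock-order graph in which an edge from A to B means "B was acquired while A was held". CPU0's interrupt contributes the edge xa_lock -> fence->lock, CPU1 contributes fence->lock -> xa_lock, and together they form a cycle. The sketch below is a simplified userspace model, not lockdep's implementation (lockdep additionally tracks the interrupt state of each lock class); the names lock_graph, add_edge() and has_cycle() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes for this sketch, not kernel identifiers. */
enum lock_class { XA_LOCK, FENCE_LOCK, NUM_LOCKS };

/* order[a][b] is true if lock b was acquired while lock a was held. */
struct lock_graph {
	bool order[NUM_LOCKS][NUM_LOCKS];
};

static void lock_graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Record that `acquired` was taken while `held` was held. */
static void add_edge(struct lock_graph *g, enum lock_class held,
		     enum lock_class acquired)
{
	assert(held < NUM_LOCKS && acquired < NUM_LOCKS);
	g->order[held][acquired] = true;
}

/* Depth-first search: can `target` be reached from `from`? */
static bool reachable(const struct lock_graph *g, int from, int target,
		      bool *visited)
{
	if (from == target)
		return true;
	if (visited[from])
		return false;
	visited[from] = true;
	for (int next = 0; next < NUM_LOCKS; next++)
		if (g->order[from][next] && reachable(g, next, target, visited))
			return true;
	return false;
}

/* A cycle exists if some edge a -> b is closed by a path b -> ... -> a. */
static bool has_cycle(const struct lock_graph *g)
{
	for (int a = 0; a < NUM_LOCKS; a++) {
		for (int b = 0; b < NUM_LOCKS; b++) {
			bool visited[NUM_LOCKS];

			if (!g->order[a][b])
				continue;
			memset(visited, 0, sizeof(visited));
			if (reachable(g, b, a, visited))
				return true;
		}
	}
	return false;
}
```

Recording only CPU0's edge leaves the graph acyclic; adding CPU1's edge closes the cycle. In this model, using the xa_*_irq variants on CPU0 keeps the fence-signalling interrupt from arriving while xa_lock is held, which removes the first edge.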

Replacing every xa_* call with its xa_*_irq counterpart would fix this
issue, but Christian pointed out a second problem: dma_fence_signal()
takes fence.lock, and so does dma_fence_add_callback().

  dma_fence_signal() // locks f1.lock
  -> drm_sched_entity_kill_jobs_cb()
  -> foreach dependencies
     -> dma_fence_add_callback() // locks f2.lock

This deadlocks if f1 and f2 share the same spinlock: the spinlock is not
recursive, and it is already held by dma_fence_signal().
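The self-deadlock can be reproduced in userspace with a toy model. In this sketch all names are hypothetical and none of them is the dma_fence API. A POSIX mutex of type PTHREAD_MUTEX_ERRORCHECK stands in for the non-recursive spinlock: when the owning thread tries to lock it again, pthread_mutex_lock() returns EDEADLK instead of hanging, which makes the problem observable. toy_signal() runs the callback with f1's lock held, and the callback re-arms on f2 the way the old callback re-armed on the next dependency.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy fence (hypothetical): `lock` may point to a mutex shared by
 * several fences, mirroring how a fence spinlock can be shared. */
struct toy_fence {
	pthread_mutex_t *lock;
	void (*cb)(struct toy_fence *self, void *arg);
	void *cb_arg;
};

/* Stand-in for dma_fence_add_callback(): takes the fence lock.
 * Returns 0, or the pthread error code (EDEADLK on relock). */
static int toy_add_callback(struct toy_fence *f,
			    void (*cb)(struct toy_fence *, void *), void *arg)
{
	int err;

	assert(f->lock != NULL);
	err = pthread_mutex_lock(f->lock);
	if (err)
		return err;
	f->cb = cb;
	f->cb_arg = arg;
	pthread_mutex_unlock(f->lock);
	return 0;
}

/* Stand-in for dma_fence_signal(): the callback runs under the lock. */
static void toy_signal(struct toy_fence *f)
{
	pthread_mutex_lock(f->lock);
	if (f->cb)
		f->cb(f, f->cb_arg);
	pthread_mutex_unlock(f->lock);
}

struct rearm_ctx {
	struct toy_fence *next_dep;	/* f2 */
	int result;			/* return value of the re-arm */
};

static void noop_cb(struct toy_fence *f, void *arg)
{
	(void)f;
	(void)arg;
}

/* Old-style callback: re-arms on the next dependency while f1's
 * lock is still held by toy_signal(). */
static void rearm_in_callback(struct toy_fence *f, void *arg)
{
	struct rearm_ctx *ctx = arg;

	(void)f;
	ctx->result = toy_add_callback(ctx->next_dep, noop_cb, NULL);
}

/* Signals f1 and returns what re-arming on f2 returned. */
static int run_scenario(bool share_lock)
{
	pthread_mutex_t lock_a, lock_b;
	pthread_mutexattr_t attr;
	struct toy_fence f1 = { NULL }, f2 = { NULL };
	struct rearm_ctx ctx = { NULL, -1 };

	/* Error-checking mutexes: relocking by the owner fails with
	 * EDEADLK instead of hanging like a spinlock would. */
	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&lock_a, &attr);
	pthread_mutex_init(&lock_b, &attr);
	pthread_mutexattr_destroy(&attr);

	f1.lock = &lock_a;
	f2.lock = share_lock ? &lock_a : &lock_b;
	ctx.next_dep = &f2;
	toy_add_callback(&f1, rearm_in_callback, &ctx);
	toy_signal(&f1);

	pthread_mutex_destroy(&lock_a);
	pthread_mutex_destroy(&lock_b);
	return ctx.result;
}
```

run_scenario(false) returns 0 because f2 has its own lock, while run_scenario(true) returns EDEADLK because f2 shares f1's lock, which is the case described above.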

To fix both issues, the code that iterates over the dependencies and
re-arms them is moved out of the callback into
drm_sched_entity_kill_jobs_work().
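The shape of the fix can be sketched with a toy model of the same kind: the fence callback no longer re-arms anything itself; it only marks work as pending, and a separate work function does the re-arming after the signalling path has dropped the lock. The names below (struct deferred_job, kill_jobs_cb(), kill_jobs_work()) are hypothetical stand-ins for drm_sched_entity_kill_jobs_cb() and drm_sched_entity_kill_jobs_work(). In the kernel this kind of deferral normally goes through a work item (INIT_WORK()/schedule_work()); the linked commits show the exact change. An error-checking mutex again models the spinlock shared by f1 and f2.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical job: one mutex models the spinlock shared by f1 and f2. */
struct deferred_job {
	pthread_mutex_t *shared_lock;
	bool work_pending;	/* set by the callback, like queueing work */
	int rearm_result;	/* outcome of re-arming on f2 */
};

/* Stand-in for dma_fence_add_callback() on f2: takes the shared lock. */
static int rearm_on_next_dependency(struct deferred_job *job)
{
	int err = pthread_mutex_lock(job->shared_lock);

	if (err)
		return err;	/* EDEADLK: caller already holds the lock */
	/* A real implementation would register the callback on f2 here. */
	pthread_mutex_unlock(job->shared_lock);
	return 0;
}

/* Fence callback: runs while f1's lock is held. With `defer` it only
 * schedules work; without it, it re-arms inline (the old behaviour). */
static void kill_jobs_cb(struct deferred_job *job, bool defer)
{
	if (defer)
		job->work_pending = true;
	else
		job->rearm_result = rearm_on_next_dependency(job);
}

/* Work function: runs later, with no fence lock held. */
static void kill_jobs_work(struct deferred_job *job)
{
	if (!job->work_pending)
		return;
	job->work_pending = false;
	job->rearm_result = rearm_on_next_dependency(job);
}

/* Models signalling f1, then letting any deferred work run. */
static int run_kill_jobs(bool defer)
{
	pthread_mutex_t lock;
	pthread_mutexattr_t attr;
	struct deferred_job job = { .shared_lock = &lock, .rearm_result = -1 };

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&lock, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&lock);	/* signalling path holds f1's lock */
	kill_jobs_cb(&job, defer);
	pthread_mutex_unlock(&lock);
	assert(job.work_pending == defer);

	kill_jobs_work(&job);		/* no-op unless work was deferred */

	pthread_mutex_destroy(&lock);
	return job.rearm_result;
}
```

With defer set to false the callback re-arms inline and gets EDEADLK; with defer set to true the re-arm happens in the work function once the lock is free, and returns 0. Because work items run in process context, the xarray iteration also stops happening in interrupt context, which is presumably how the same move addresses the first issue.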

[phasta: commit message nits]

The Linux kernel CVE team has assigned CVE-2025-40329 to this issue.


Affected and fixed versions
===========================

	Issue introduced in 6.2 with commit 2fdb8a8f07c2f1353770a324fd19b8114e4329ac and fixed in 6.6.117 with commit 70150b9443dddf02157d821c68abf438f55a2e8e
	Issue introduced in 6.2 with commit 2fdb8a8f07c2f1353770a324fd19b8114e4329ac and fixed in 6.12.58 with commit 0d63031ee4a57be0252cb9a4e09ae921c75cece9
	Issue introduced in 6.2 with commit 2fdb8a8f07c2f1353770a324fd19b8114e4329ac and fixed in 6.17.8 with commit 3e8ada4fd838e3fd2cca94000dac054f3a347c01
	Issue introduced in 6.2 with commit 2fdb8a8f07c2f1353770a324fd19b8114e4329ac and fixed in 6.18 with commit 487df8b698345dd5a91346335f05170ed5f29d4e
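The table can be turned into a quick check for a given kernel version. The helper below is a hypothetical sketch that encodes only this announcement: it treats versions before 6.2 as unaffected, treats 6.6.117+, 6.12.58+ and 6.17.8+ as fixed within their stable branches, assumes every release from 6.18 onward carries the mainline fix, and reports every other branch from 6.2 on (for example 6.2 to 6.5, or 6.7 to 6.11) as affected. It does not model -rc or distribution kernels, and it goes stale as soon as further backports land, so the CVE record remains the authority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel version as major.minor.patch, e.g. 6.12.58 -> {6, 12, 58}. */
struct kver {
	int major, minor, patch;
};

/* First fixed release on each stable branch, from the table above. */
static const struct kver first_fixed[] = {
	{ 6, 6, 117 },
	{ 6, 12, 58 },
	{ 6, 17, 8 },
};

static int kver_cmp(struct kver a, struct kver b)
{
	if (a.major != b.major)
		return a.major < b.major ? -1 : 1;
	if (a.minor != b.minor)
		return a.minor < b.minor ? -1 : 1;
	if (a.patch != b.patch)
		return a.patch < b.patch ? -1 : 1;
	return 0;
}

/* Hypothetical helper: true if `v` is affected according to this
 * announcement alone (no -rc or distribution kernels). */
static bool cve_2025_40329_affected(struct kver v)
{
	const struct kver introduced = { 6, 2, 0 };
	const struct kver mainline_fix = { 6, 18, 0 };

	assert(v.major >= 0 && v.minor >= 0 && v.patch >= 0);
	if (kver_cmp(v, introduced) < 0)
		return false;	/* predates the offending commit */
	if (kver_cmp(v, mainline_fix) >= 0)
		return false;	/* assumed to contain the mainline fix */
	for (size_t i = 0; i < sizeof(first_fixed) / sizeof(first_fixed[0]); i++) {
		if (v.major == first_fixed[i].major &&
		    v.minor == first_fixed[i].minor)
			return kver_cmp(v, first_fixed[i]) < 0;
	}
	return true;	/* branch with no listed fix, e.g. 6.7 to 6.11 */
}
```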

Please see https://www.kernel.org for a full list of currently supported
kernel versions by the kernel community.

Unaffected versions might change over time as fixes are backported to
older supported kernel versions.  The official CVE entry at
	https://cve.org/CVERecord/?id=CVE-2025-40329
will be updated if fixes are backported; please check it for the most
up-to-date information about this issue.


Affected files
==============

The file(s) affected by this issue are:
	drivers/gpu/drm/scheduler/sched_entity.c


Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this and many other bug fixes.  Individual
changes are never tested alone, but rather are part of a larger kernel
release.  Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all.  If, however, updating to
the latest release is impossible, the individual changes that resolve this
issue can be found at these commits:
	https://git.kernel.org/stable/c/70150b9443dddf02157d821c68abf438f55a2e8e
	https://git.kernel.org/stable/c/0d63031ee4a57be0252cb9a4e09ae921c75cece9
	https://git.kernel.org/stable/c/3e8ada4fd838e3fd2cca94000dac054f3a347c01
	https://git.kernel.org/stable/c/487df8b698345dd5a91346335f05170ed5f29d4e