Message-ID: <20251017134716.187723-3-phasta@kernel.org>
Date: Fri, 17 Oct 2025 15:47:02 +0200
From: Philipp Stanner <phasta@...nel.org>
To: Matthew Brost <matthew.brost@...el.com>,
	Danilo Krummrich <dakr@...nel.org>,
	Philipp Stanner <phasta@...nel.org>,
	Christian König <ckoenig.leichtzumerken@...il.com>,
	David Airlie <airlied@...il.com>,
	Simona Vetter <simona@...ll.ch>,
	Sumit Semwal <sumit.semwal@...aro.org>
Cc: dri-devel@...ts.freedesktop.org,
	linux-kernel@...r.kernel.org,
	linux-media@...r.kernel.org
Subject: [PATCH 2/3] drm/sched: Add TODO file with first entry

Add a drm_sched TODO file with open tasks, contact info, difficulty
level and a job description.

Add the still-missing successor to drm_sched_resubmit_jobs() as the first task.

Signed-off-by: Philipp Stanner <phasta@...nel.org>
---
 drivers/gpu/drm/scheduler/TODO | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 drivers/gpu/drm/scheduler/TODO

diff --git a/drivers/gpu/drm/scheduler/TODO b/drivers/gpu/drm/scheduler/TODO
new file mode 100644
index 000000000000..6a06e2858dd6
--- /dev/null
+++ b/drivers/gpu/drm/scheduler/TODO
@@ -0,0 +1,27 @@
+=== drm_sched TODO list ===
+
+* GPU job resubmits
+  - Difficulty: hard
+  - Contact:
+    - Christian König <ckoenig.leichtzumerken@...il.com>
+    - Philipp Stanner <phasta@...nel.org>
+  - Description:
+    drm_sched_resubmit_jobs() is deprecated, mainly because it leads to
+    reinitializing dma_fences; see that function's documentation for details.
+    The better approach for valid resubmissions by amdgpu and Xe is
+    (apparently) to figure out which job (and, by association, which entity)
+    caused the hang. Then that job's buffer data, together with the buffer
+    data of all other jobs currently in the same hardware ring, must be
+    invalidated, for example by overwriting it.
+    amdgpu currently determines which jobs are in the ring and need to be
+    overwritten by keeping copies of the jobs. Xe obtains that information by
+    directly accessing drm_sched's pending_list.
+  - Tasks:
+    1. Implement scheduler functionality through which the driver can
+       determine which *broken* jobs are currently in the hardware
+       ring.
+    2. Such infrastructure would then typically be used in
+       drm_sched_backend_ops.timedout_job(). Document that.
+    3. Port a driver as the first user.
+    4. Document the new alternative in the documentation of the deprecated
+       drm_sched_resubmit_jobs().
-- 
2.49.0

