Message-ID: <20251023143031.149496-5-phasta@kernel.org>
Date: Thu, 23 Oct 2025 16:30:30 +0200
From: Philipp Stanner <phasta@...nel.org>
To: Matthew Brost <matthew.brost@...el.com>,
Danilo Krummrich <dakr@...nel.org>,
Philipp Stanner <phasta@...nel.org>,
Christian König <ckoenig.leichtzumerken@...il.com>,
David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>,
Tvrtko Ursulin <tvrtko.ursulin@...lia.com>
Cc: dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org,
linux-media@...r.kernel.org
Subject: [PATCH v2 3/4] drm/sched: Add TODO entry for resubmitting jobs
drm_sched_resubmit_jobs() is deprecated, but no successor exists yet. Add
an entry describing the missing replacement to the scheduler's TODO file.
Signed-off-by: Philipp Stanner <phasta@...nel.org>
---
drivers/gpu/drm/scheduler/TODO | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
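As a rough illustration of task 1 in the entry below (not part of the patch):
the following is a minimal user-space model of what "obtain the broken jobs
currently in the hardware ring" could look like. All names (model_entity,
model_job, model_collect_broken_jobs) are hypothetical and do not exist in
drm_sched. The sketch also assumes that "broken" means "submitted by the entity
that caused the hang", which is one possible reading of the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these types exist in drm_sched. */
struct model_entity {
	int id;
};

struct model_job {
	struct model_entity *entity;	/* entity that submitted the job */
	bool in_hw_ring;		/* pushed to hardware, not yet completed */
	struct model_job *next;		/* models the order of a pending list */
};

/*
 * Walk the pending list and collect every job that is currently in the
 * hardware ring and belongs to the guilty entity (assumption: those are the
 * "broken" jobs). Returns the number of jobs written to @out, at most @max.
 */
static size_t model_collect_broken_jobs(struct model_job *pending,
					const struct model_entity *guilty,
					struct model_job **out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);

	for (struct model_job *job = pending; job && n < max; job = job->next) {
		if (job->in_hw_ring && job->entity == guilty)
			out[n++] = job;
	}

	return n;
}
```

In a real implementation the driver would presumably call something like this
from its timedout_job() callback and then invalidate the returned jobs' ring
contents; locking and job lifetime are deliberately left out of the model.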
diff --git a/drivers/gpu/drm/scheduler/TODO b/drivers/gpu/drm/scheduler/TODO
index 79044adb7d01..713dd62c58da 100644
--- a/drivers/gpu/drm/scheduler/TODO
+++ b/drivers/gpu/drm/scheduler/TODO
@@ -10,3 +10,29 @@
- Tasks:
1. Read the example entry.
2. Remove the entry once solved (never in this case)
+
+* GPU job resubmits
+ - Difficulty: hard
+ - Contact:
+ - Christian König <ckoenig.leichtzumerken@...il.com>
+ - Philipp Stanner <phasta@...nel.org>
+ - Description:
+ drm_sched_resubmit_jobs() is deprecated. The main reason is that it leads to
+ reinitializing dma_fences. See that function's documentation for details.
+ The better approach for valid resubmissions by amdgpu and Xe is (apparently)
+ to figure out which job (and, through association, which entity) caused the
+ hang. Then the job's buffer data, together with all other jobs' buffer data
+ currently in the same hardware ring, must be invalidated. This can, for
+ example, be done by overwriting it.
+ amdgpu currently determines which jobs are in the ring and need to be
+ overwritten by keeping copies of the job. Xe obtains that information by
+ directly accessing drm_sched's pending_list.
+ - Tasks:
+ 1. Implement scheduler functionality through which
+ the driver can find out which *broken* jobs are currently in
+ the hardware ring.
+ 2. Such infrastructure would then typically be used in
+ drm_sched_backend_ops.timedout_job(). Document that.
+ 3. Port a driver as first user.
+ 4. Document the new alternative in the documentation of the deprecated
+ drm_sched_resubmit_jobs().
--
2.49.0