[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <537bdebf2112a080ae92526ecfa41d63668d90a3.camel@redhat.com>
Date: Mon, 27 Oct 2025 12:13:58 +0100
From: Philipp Stanner <pstanner@...hat.com>
To: Matthew Brost <matthew.brost@...el.com>, intel-xe@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Cc: jiangshanlai@...il.com, tj@...nel.org, simona.vetter@...ll.ch,
christian.koenig@....com, dakr@...nel.org
Subject: Re: [RFC PATCH 3/3] drm/sched: Prevent adding dependencies to an
armed job
I've got a kernel.org addr by now by the way
On Tue, 2025-10-21 at 14:39 -0700, Matthew Brost wrote:
> According to the DMA scheduler documentation, once a job is armed, it
> must be pushed. Drivers should avoid calling the failing code path that
> attempts to add dependencies after a job has been armed.
>
Why is that a "failing code path"?
The issue with adding callbacks is that adding them to an already
signaled fence is a bad idea. I'm not sure if it's illegal, though.
dma_fence_add_cb() merely returns an error then, but the driver could
in priniciple then execute its cb code itself.
And even if we agree that this is a hard rule that must be followed,
then drm_sched_job_arm() *might* not be the right place, because just
because a job is armed doesn't mean that its fence is about to get
signaled. drm_sched_entity_push_job() would be the critical place.
> This change
> enforces that rule.
>
> Cc: Christian König <christian.koenig@....com>
> Cc: Danilo Krummrich <dakr@...nel.org>
> Cc: Matthew Brost <matthew.brost@...el.com>
> Cc: Philipp Stanner <phasta@...nel.org>
> Cc: dri-devel@...ts.freedesktop.org
> Signed-off-by: Matthew Brost <matthew.brost@...el.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 676484dd3ea3..436cb2844161 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -873,7 +873,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> * @job: scheduler job to add the dependencies to
> * @fence: the dma_fence to add to the list of dependencies.
> *
> - * Note that @fence is consumed in both the success and error cases.
> + * Note that @fence is consumed in both the success and error cases. This
> + * function cannot be called if the job is armed.
> *
> * Returns:
> * 0 on success, or an error on failing to expand the array.
> @@ -886,6 +887,10 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> u32 id = 0;
> int ret;
>
> + /* Do not allow additional dependencies when job is armed */
> + if (WARN_ON_ONCE(job->sched))
One would probably want an 'armed' boolean for that. At the very least
one wants to document in the struct's docstring that job->sched has
this semantic meaning. Otherwise it's only obvious for people who have
been hacking on the scheduler for years.
By the way I think that we use WARN_ON*() too much in DRM. It generates
difficult to read, non-descriptive error messages compared to
dev_warn() and similar helpers, and it's often a bit overkill. I would
only use it when there is no other choice, such as in an interrupt-
handler or widely used void func() where you cannot simply add a return
code.
P.
> + return -EINVAL;
> +
> if (!fence)
> return 0;
>
Powered by blists - more mailing lists