linux-kernel - Re: [RFC PATCH 3/3] drm/sched: Prevent adding dependencies to an armed job

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <537bdebf2112a080ae92526ecfa41d63668d90a3.camel@redhat.com>
Date: Mon, 27 Oct 2025 12:13:58 +0100
From: Philipp Stanner <pstanner@...hat.com>
To: Matthew Brost <matthew.brost@...el.com>, intel-xe@...ts.freedesktop.org,
  dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Cc: jiangshanlai@...il.com, tj@...nel.org, simona.vetter@...ll.ch, 
	christian.koenig@....com, dakr@...nel.org
Subject: Re: [RFC PATCH 3/3] drm/sched: Prevent adding dependencies to an
 armed job

I've got a kernel.org addr by now by the way

On Tue, 2025-10-21 at 14:39 -0700, Matthew Brost wrote:
> According to the DMA scheduler documentation, once a job is armed, it
> must be pushed. Drivers should avoid calling the failing code path that
> attempts to add dependencies after a job has been armed.
> 

Why is that a "failing code path"?

The issue with adding callbacks is that adding them to an already
signaled fence is a bad idea. I'm not sure if it's illegal, though.
dma_fence_add_cb() merely returns an error then, but the driver could
in priniciple then execute its cb code itself.

And even if we agree that this is a hard rule that must be followed,
then drm_sched_job_arm() *might* not be the right place, because just
because a job is armed doesn't mean that its fence is about to get
signaled. drm_sched_entity_push_job() would be the critical place.

>  This change
> enforces that rule.
> 
> Cc: Christian König <christian.koenig@....com>
> Cc: Danilo Krummrich <dakr@...nel.org>
> Cc: Matthew Brost <matthew.brost@...el.com>
> Cc: Philipp Stanner <phasta@...nel.org>
> Cc: dri-devel@...ts.freedesktop.org
> Signed-off-by: Matthew Brost <matthew.brost@...el.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 676484dd3ea3..436cb2844161 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -873,7 +873,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
>   * @job: scheduler job to add the dependencies to
>   * @fence: the dma_fence to add to the list of dependencies.
>   *
> - * Note that @fence is consumed in both the success and error cases.
> + * Note that @fence is consumed in both the success and error cases. This
> + * function cannot be called if the job is armed.
>   *
>   * Returns:
>   * 0 on success, or an error on failing to expand the array.
> @@ -886,6 +887,10 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
>  	u32 id = 0;
>  	int ret;
>  
> +	/* Do not allow additional dependencies when job is armed */
> +	if (WARN_ON_ONCE(job->sched))

One would probably want an 'armed' boolean for that. At the very least
one wants to document in the struct's docstring that job->sched has
this semantic meaning. Otherwise it's only obvious for people who have
been hacking on the scheduler for years.

By the way I think that we use WARN_ON*() too much in DRM. It generates
difficult to read, non-descriptive error messages compared to
dev_warn() and similar helpers, and it's often a bit overkill. I would
only use it when there is no other choice, such as in an interrupt-
handler or widely used void func() where you cannot simply add a return
code.

P.

> +		return -EINVAL;

> +
>  	if (!fence)
>  		return 0;
>