linux-kernel - Re: [RFC PATCH 3/3] drm/sched: Prevent adding dependencies to an armed job

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aP+kJeOEghD+sEsg@lstrano-desk.jf.intel.com>
Date: Mon, 27 Oct 2025 09:56:05 -0700
From: Matthew Brost <matthew.brost@...el.com>
To: Philipp Stanner <pstanner@...hat.com>
CC: <intel-xe@...ts.freedesktop.org>, <dri-devel@...ts.freedesktop.org>,
	<linux-kernel@...r.kernel.org>, <jiangshanlai@...il.com>, <tj@...nel.org>,
	<simona.vetter@...ll.ch>, <christian.koenig@....com>, <dakr@...nel.org>
Subject: Re: [RFC PATCH 3/3] drm/sched: Prevent adding dependencies to an
 armed job

On Mon, Oct 27, 2025 at 12:13:58PM +0100, Philipp Stanner wrote:
> I've got a kernel.org addr by now by the way
> 
> On Tue, 2025-10-21 at 14:39 -0700, Matthew Brost wrote:
> > According to the DMA scheduler documentation, once a job is armed, it
> > must be pushed. Drivers should avoid calling the failing code path that
> > attempts to add dependencies after a job has been armed.
> > 
> 
> Why is that a "failing code path"?
> 

I noticed this after I sent - it should something like:

'avoid calling a possible failing code path, which allocates memory.'

I can make this a bit more clear.

> The issue with adding callbacks is that adding them to an already
> signaled fence is a bad idea. I'm not sure if it's illegal, though.
> dma_fence_add_cb() merely returns an error then, but the driver could
> in priniciple then execute its cb code itself.
> 
> And even if we agree that this is a hard rule that must be followed,
> then drm_sched_job_arm() *might* not be the right place, because just
> because a job is armed doesn't mean that its fence is about to get
> signaled. drm_sched_entity_push_job() would be the critical place.
>

I think this break our rule once arm is called, push must be called as
adding dependencies can possibly fail. This rule is called out in your
documentation patch too. I've seen 2 driver posted in the past year add
dependencies after arming, so I figured lets catch this misuse in the
scheduler.

Matt

> 
> >  This change
> > enforces that rule.
> > 
> > Cc: Christian König <christian.koenig@....com>
> > Cc: Danilo Krummrich <dakr@...nel.org>
> > Cc: Matthew Brost <matthew.brost@...el.com>
> > Cc: Philipp Stanner <phasta@...nel.org>
> > Cc: dri-devel@...ts.freedesktop.org
> > Signed-off-by: Matthew Brost <matthew.brost@...el.com>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 676484dd3ea3..436cb2844161 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -873,7 +873,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> >   * @job: scheduler job to add the dependencies to
> >   * @fence: the dma_fence to add to the list of dependencies.
> >   *
> > - * Note that @fence is consumed in both the success and error cases.
> > + * Note that @fence is consumed in both the success and error cases. This
> > + * function cannot be called if the job is armed.
> >   *
> >   * Returns:
> >   * 0 on success, or an error on failing to expand the array.
> > @@ -886,6 +887,10 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >  	u32 id = 0;
> >  	int ret;
> >  
> > +	/* Do not allow additional dependencies when job is armed */
> > +	if (WARN_ON_ONCE(job->sched))
> 
> One would probably want an 'armed' boolean for that. At the very least
> one wants to document in the struct's docstring that job->sched has
> this semantic meaning. Otherwise it's only obvious for people who have
> been hacking on the scheduler for years.
> 
> 
> By the way I think that we use WARN_ON*() too much in DRM. It generates
> difficult to read, non-descriptive error messages compared to
> dev_warn() and similar helpers, and it's often a bit overkill. I would
> only use it when there is no other choice, such as in an interrupt-
> handler or widely used void func() where you cannot simply add a return
> code.
> 
> 
> P.
> 
> > +		return -EINVAL;
> 
> 
> > +
> >  	if (!fence)
> >  		return 0;
> >  
>