lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aP+kJeOEghD+sEsg@lstrano-desk.jf.intel.com>
Date: Mon, 27 Oct 2025 09:56:05 -0700
From: Matthew Brost <matthew.brost@...el.com>
To: Philipp Stanner <pstanner@...hat.com>
CC: <intel-xe@...ts.freedesktop.org>, <dri-devel@...ts.freedesktop.org>,
	<linux-kernel@...r.kernel.org>, <jiangshanlai@...il.com>, <tj@...nel.org>,
	<simona.vetter@...ll.ch>, <christian.koenig@....com>, <dakr@...nel.org>
Subject: Re: [RFC PATCH 3/3] drm/sched: Prevent adding dependencies to an
 armed job

On Mon, Oct 27, 2025 at 12:13:58PM +0100, Philipp Stanner wrote:
> I've got a kernel.org addr by now by the way
> 
> On Tue, 2025-10-21 at 14:39 -0700, Matthew Brost wrote:
> > According to the DMA scheduler documentation, once a job is armed, it
> > must be pushed. Drivers should avoid calling the failing code path that
> > attempts to add dependencies after a job has been armed.
> > 
> 
> Why is that a "failing code path"?
> 

I noticed this after I sent - it should something like:

'avoid calling a possible failing code path, which allocates memory.'

I can make this a bit more clear.

> The issue with adding callbacks is that adding them to an already
> signaled fence is a bad idea. I'm not sure if it's illegal, though.
> dma_fence_add_cb() merely returns an error then, but the driver could
> in priniciple then execute its cb code itself.
> 
> And even if we agree that this is a hard rule that must be followed,
> then drm_sched_job_arm() *might* not be the right place, because just
> because a job is armed doesn't mean that its fence is about to get
> signaled. drm_sched_entity_push_job() would be the critical place.
>

I think this break our rule once arm is called, push must be called as
adding dependencies can possibly fail. This rule is called out in your
documentation patch too. I've seen 2 driver posted in the past year add
dependencies after arming, so I figured lets catch this misuse in the
scheduler.

Matt

> 
> >  This change
> > enforces that rule.
> > 
> > Cc: Christian König <christian.koenig@....com>
> > Cc: Danilo Krummrich <dakr@...nel.org>
> > Cc: Matthew Brost <matthew.brost@...el.com>
> > Cc: Philipp Stanner <phasta@...nel.org>
> > Cc: dri-devel@...ts.freedesktop.org
> > Signed-off-by: Matthew Brost <matthew.brost@...el.com>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 676484dd3ea3..436cb2844161 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -873,7 +873,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> >   * @job: scheduler job to add the dependencies to
> >   * @fence: the dma_fence to add to the list of dependencies.
> >   *
> > - * Note that @fence is consumed in both the success and error cases.
> > + * Note that @fence is consumed in both the success and error cases. This
> > + * function cannot be called if the job is armed.
> >   *
> >   * Returns:
> >   * 0 on success, or an error on failing to expand the array.
> > @@ -886,6 +887,10 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >  	u32 id = 0;
> >  	int ret;
> >  
> > +	/* Do not allow additional dependencies when job is armed */
> > +	if (WARN_ON_ONCE(job->sched))
> 
> One would probably want an 'armed' boolean for that. At the very least
> one wants to document in the struct's docstring that job->sched has
> this semantic meaning. Otherwise it's only obvious for people who have
> been hacking on the scheduler for years.
> 
> 
> By the way I think that we use WARN_ON*() too much in DRM. It generates
> difficult to read, non-descriptive error messages compared to
> dev_warn() and similar helpers, and it's often a bit overkill. I would
> only use it when there is no other choice, such as in an interrupt-
> handler or widely used void func() where you cannot simply add a return
> code.
> 
> 
> P.
> 
> > +		return -EINVAL;
> 
> 
> > +
> >  	if (!fence)
> >  		return 0;
> >  
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ