Message-ID: <727d723857f68d256f1050088673cad66626f47f.camel@mailbox.org>
Date: Thu, 09 Oct 2025 15:14:35 +0200
From: Philipp Stanner <phasta@...lbox.org>
To: Matthew Brost <matthew.brost@...el.com>, phasta@...nel.org
Cc: Lyude Paul <lyude@...hat.com>, Danilo Krummrich <dakr@...nel.org>, David
 Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>, Sumit Semwal
	 <sumit.semwal@...aro.org>, Christian König
	 <christian.koenig@....com>, dri-devel@...ts.freedesktop.org, 
	nouveau@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] Revert "drm/nouveau: Remove waitque for sched
 teardown"

On Wed, 2025-10-08 at 09:35 -0700, Matthew Brost wrote:
> On Wed, Oct 08, 2025 at 09:34:22AM +0200, Philipp Stanner wrote:
> > On Tue, 2025-10-07 at 10:15 -0700, Matthew Brost wrote:
> > > On Mon, Sep 01, 2025 at 10:31:08AM +0200, Philipp Stanner wrote:
> > > > This reverts:
> > > > 
> > > > commit bead88002227 ("drm/nouveau: Remove waitque for sched teardown")
> > > > commit 5f46f5c7af8c ("drm/nouveau: Add new callback for scheduler teardown")
> > > 
> > > I've been scanning some recent DRM scheduler changes.
> > > 
> > > I think we should likely revert:
> > > 
> > > bf8bbaefaa6a drm/sched: Avoid memory leaks with cancel_job() callback
> > > 
> > > 5f46f5c7af8c was the only user of cancel_job. I'm not sure why we'd
> > > carry dead code in DRM scheduler unless you have plans to make use of
> > > this function soon.
> > 
> > That will be added back to Nouveau soon. The reason it was removed from
> > Nouveau was not that cancel_job() is broken, but that removing the
> > waitqueue is not possible for other reasons.
> > 
> 
> Okay. In general, carrying dead code is less than ideal, but if this is
> going to be merged soon...

There is still a unit test exercising the code, so it is not completely
dead.

In any case, I'll see to it.

And besides, I tend to think that this callback or an equivalent
mechanism should have been there from the beginning. IIRC Danilo back
then even asked on-list who would free pending jobs on sched teardown,
and the question was basically ignored and the code was merged anyway.

So if you want to help, you could implement cancel_job() for Xe :)

> 
> > Implementing cancel_job() as the canonical way of handling the
> > difficult lifetime issues and memory leaks associated with drm_sched
> > has been discussed literally for about 8-9 months on the lists.
> > 
> 
> Also, drm_sched_cancel_remaining_jobs appears to do something slightly
> concerning.
> 
> It signals DMA fences out of order by walking the pending list in
> reverse, which is generally not advisable. This behavior should probably
> be reviewed.

I'm perfectly happy with reversing the iterator.
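
To make this concrete, below is roughly what I have in mind. To be
clear, this is an illustrative sketch, not the in-tree code; it just
assumes the usual pending_list / cancel_job() / free_job() plumbing
and walks the list front-to-back so the fences get signaled in
submission order:

static void drm_sched_cancel_remaining_jobs(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job, *tmp;

	/*
	 * Front-to-back instead of list_for_each_entry_safe_reverse(),
	 * so cancel_job() signals the jobs' fences in submission order.
	 * (Fence reference handling omitted for brevity.)
	 */
	list_for_each_entry_safe(job, tmp, &sched->pending_list, list) {
		list_del_init(&job->list);
		sched->ops->cancel_job(job);
		sched->ops->free_job(job);
	}
}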

> 
> Additionally, for jobs in the SPSC queue that are killed via
> drm_sched_entity_kill_jobs_work, we don’t call cancel_job.

All work items are stopped when cancel_job() gets invoked.

> 
> That might be intentional, but based on the cancel_job documentation,
> the job’s fence may not get signaled. Depending on the driver’s fence
> refcounting scheme (e.g., if it takes a reference to the job’s fence at
> arm), the scheduler-side reference may or may not be released too. We
> might want to investigate whether cancel_job should be invoked in that
> path as well.

Well, let's ask differently: when entity_kill_jobs_work() runs, who
currently guarantees that the job's fence gets signaled? Because
that's what cancel_job() is fundamentally about: signaling all fences
before freeing the associated jobs.
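
For illustration, a cancel_job() implementation boils down to the
minimal, driver-agnostic sketch below (names like my_driver_* are made
up; the reverted nouveau_fence_cancel() in the diff further down is the
real nouveau instance of this pattern):

static void my_driver_cancel_job(struct drm_sched_job *sched_job)
{
	struct my_driver_job *job = to_my_driver_job(sched_job);
	struct dma_fence *fence = job->hw_fence; /* fence returned by run_job() */

	/* A real driver must hold its fence lock here, as nouveau does. */
	if (!dma_fence_is_signaled(fence)) {
		dma_fence_set_error(fence, -ECANCELED);
		dma_fence_signal(fence);
	}
}

That way nothing can end up waiting forever on the fence of a job that
will never run.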


P.

> 
> Also, if the entity is killed after drm_sched_fini(), the same problem
> with out-of-order fence signaling mentioned above could occur too.
> 
> > If we can't get to a solution for a problem after 9 months of on-list
> > discussions, then we are lost.
> > 
> 
> Par for the course upstream. Apologies for not paying more attention
> here.
> 
> Matt
>  
> > P.
> > 
> > > 
> > > Matt
> > > 
> > > > 
> > > > from the drm/sched teardown leak fix series:
> > > > 
> > > > https://lore.kernel.org/dri-devel/20250710125412.128476-2-phasta@kernel.org/
> > > > 
> > > > The aforementioned series removed a blocking waitqueue from
> > > > nouveau_sched_fini(). It was mistakenly assumed that this waitqueue only
> > > > prevents jobs from leaking, which the series fixed.
> > > > 
> > > > The waitqueue, however, also guarantees that all VM_BIND related jobs
> > > > are finished in order, cleaning up mappings in the GPU's MMU. These jobs
> > > > must be executed sequentially. Without the waitqueue, this is no longer
> > > > guaranteed, because entity and scheduler teardown can race with each
> > > > other.
> > > > 
> > > > Revert all patches related to the waitqueue removal.
> > > > 
> > > > Fixes: bead88002227 ("drm/nouveau: Remove waitque for sched teardown")
> > > > Suggested-by: Danilo Krummrich <dakr@...nel.org>
> > > > Signed-off-by: Philipp Stanner <phasta@...nel.org>
> > > > ---
> > > > Changes in v2:
> > > >   - Don't revert commit 89b2675198ab ("drm/nouveau: Make fence container helper usable driver-wide")
> > > >   - Add Fixes-tag
> > > > ---
> > > >  drivers/gpu/drm/nouveau/nouveau_fence.c | 15 -----------
> > > >  drivers/gpu/drm/nouveau/nouveau_fence.h |  1 -
> > > >  drivers/gpu/drm/nouveau/nouveau_sched.c | 35 ++++++++++---------------
> > > >  drivers/gpu/drm/nouveau/nouveau_sched.h |  9 ++++---
> > > >  drivers/gpu/drm/nouveau/nouveau_uvmm.c  |  8 +++---
> > > >  5 files changed, 24 insertions(+), 44 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > > index 9f345a008717..869d4335c0f4 100644
> > > > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > > @@ -240,21 +240,6 @@ nouveau_fence_emit(struct nouveau_fence *fence)
> > > >  	return ret;
> > > >  }
> > > >  
> > > > -void
> > > > -nouveau_fence_cancel(struct nouveau_fence *fence)
> > > > -{
> > > > -	struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
> > > > -	unsigned long flags;
> > > > -
> > > > -	spin_lock_irqsave(&fctx->lock, flags);
> > > > -	if (!dma_fence_is_signaled_locked(&fence->base)) {
> > > > -		dma_fence_set_error(&fence->base, -ECANCELED);
> > > > -		if (nouveau_fence_signal(fence))
> > > > -			nvif_event_block(&fctx->event);
> > > > -	}
> > > > -	spin_unlock_irqrestore(&fctx->lock, flags);
> > > > -}
> > > > -
> > > >  bool
> > > >  nouveau_fence_done(struct nouveau_fence *fence)
> > > >  {
> > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
> > > > index 9957a919bd38..183dd43ecfff 100644
> > > > --- a/drivers/gpu/drm/nouveau/nouveau_fence.h
> > > > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
> > > > @@ -29,7 +29,6 @@ void nouveau_fence_unref(struct nouveau_fence **);
> > > >  
> > > >  int  nouveau_fence_emit(struct nouveau_fence *);
> > > >  bool nouveau_fence_done(struct nouveau_fence *);
> > > > -void nouveau_fence_cancel(struct nouveau_fence *fence);
> > > >  int  nouveau_fence_wait(struct nouveau_fence *, bool lazy, bool intr);
> > > >  int  nouveau_fence_sync(struct nouveau_bo *, struct nouveau_channel *, bool exclusive, bool intr);
> > > >  
> > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
> > > > index 0cc0bc9f9952..e60f7892f5ce 100644
> > > > --- a/drivers/gpu/drm/nouveau/nouveau_sched.c
> > > > +++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
> > > > @@ -11,7 +11,6 @@
> > > >  #include "nouveau_exec.h"
> > > >  #include "nouveau_abi16.h"
> > > >  #include "nouveau_sched.h"
> > > > -#include "nouveau_chan.h"
> > > >  
> > > >  #define NOUVEAU_SCHED_JOB_TIMEOUT_MS		10000
> > > >  
> > > > @@ -122,9 +121,11 @@ nouveau_job_done(struct nouveau_job *job)
> > > >  {
> > > >  	struct nouveau_sched *sched = job->sched;
> > > >  
> > > > -	spin_lock(&sched->job_list.lock);
> > > > +	spin_lock(&sched->job.list.lock);
> > > >  	list_del(&job->entry);
> > > > -	spin_unlock(&sched->job_list.lock);
> > > > +	spin_unlock(&sched->job.list.lock);
> > > > +
> > > > +	wake_up(&sched->job.wq);
> > > >  }
> > > >  
> > > >  void
> > > > @@ -305,9 +306,9 @@ nouveau_job_submit(struct nouveau_job *job)
> > > >  	}
> > > >  
> > > >  	/* Submit was successful; add the job to the schedulers job list. */
> > > > -	spin_lock(&sched->job_list.lock);
> > > > -	list_add(&job->entry, &sched->job_list.head);
> > > > -	spin_unlock(&sched->job_list.lock);
> > > > +	spin_lock(&sched->job.list.lock);
> > > > +	list_add(&job->entry, &sched->job.list.head);
> > > > +	spin_unlock(&sched->job.list.lock);
> > > >  
> > > >  	drm_sched_job_arm(&job->base);
> > > >  	job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> > > > @@ -392,23 +393,10 @@ nouveau_sched_free_job(struct drm_sched_job *sched_job)
> > > >  	nouveau_job_fini(job);
> > > >  }
> > > >  
> > > > -static void
> > > > -nouveau_sched_cancel_job(struct drm_sched_job *sched_job)
> > > > -{
> > > > -	struct nouveau_fence *fence;
> > > > -	struct nouveau_job *job;
> > > > -
> > > > -	job = to_nouveau_job(sched_job);
> > > > -	fence = to_nouveau_fence(job->done_fence);
> > > > -
> > > > -	nouveau_fence_cancel(fence);
> > > > -}
> > > > -
> > > >  static const struct drm_sched_backend_ops nouveau_sched_ops = {
> > > >  	.run_job = nouveau_sched_run_job,
> > > >  	.timedout_job = nouveau_sched_timedout_job,
> > > >  	.free_job = nouveau_sched_free_job,
> > > > -	.cancel_job = nouveau_sched_cancel_job,
> > > >  };
> > > >  
> > > >  static int
> > > > @@ -458,8 +446,9 @@ nouveau_sched_init(struct nouveau_sched *sched, struct nouveau_drm *drm,
> > > >  		goto fail_sched;
> > > >  
> > > >  	mutex_init(&sched->mutex);
> > > > -	spin_lock_init(&sched->job_list.lock);
> > > > -	INIT_LIST_HEAD(&sched->job_list.head);
> > > > +	spin_lock_init(&sched->job.list.lock);
> > > > +	INIT_LIST_HEAD(&sched->job.list.head);
> > > > +	init_waitqueue_head(&sched->job.wq);
> > > >  
> > > >  	return 0;
> > > >  
> > > > @@ -493,12 +482,16 @@ nouveau_sched_create(struct nouveau_sched **psched, struct nouveau_drm *drm,
> > > >  	return 0;
> > > >  }
> > > >  
> > > > +
> > > >  static void
> > > >  nouveau_sched_fini(struct nouveau_sched *sched)
> > > >  {
> > > >  	struct drm_gpu_scheduler *drm_sched = &sched->base;
> > > >  	struct drm_sched_entity *entity = &sched->entity;
> > > >  
> > > > +	rmb(); /* for list_empty to work without lock */
> > > > +	wait_event(sched->job.wq, list_empty(&sched->job.list.head));
> > > > +
> > > >  	drm_sched_entity_fini(entity);
> > > >  	drm_sched_fini(drm_sched);
> > > >  
> > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h b/drivers/gpu/drm/nouveau/nouveau_sched.h
> > > > index b98c3f0bef30..20cd1da8db73 100644
> > > > --- a/drivers/gpu/drm/nouveau/nouveau_sched.h
> > > > +++ b/drivers/gpu/drm/nouveau/nouveau_sched.h
> > > > @@ -103,9 +103,12 @@ struct nouveau_sched {
> > > >  	struct mutex mutex;
> > > >  
> > > >  	struct {
> > > > -		struct list_head head;
> > > > -		spinlock_t lock;
> > > > -	} job_list;
> > > > +		struct {
> > > > +			struct list_head head;
> > > > +			spinlock_t lock;
> > > > +		} list;
> > > > +		struct wait_queue_head wq;
> > > > +	} job;
> > > >  };
> > > >  
> > > >  int nouveau_sched_create(struct nouveau_sched **psched, struct nouveau_drm *drm,
> > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > > > index d94a85509176..79eefdfd08a2 100644
> > > > --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > > > +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > > > @@ -1019,8 +1019,8 @@ bind_validate_map_sparse(struct nouveau_job *job, u64 addr, u64 range)
> > > >  	u64 end = addr + range;
> > > >  
> > > >  again:
> > > > -	spin_lock(&sched->job_list.lock);
> > > > -	list_for_each_entry(__job, &sched->job_list.head, entry) {
> > > > +	spin_lock(&sched->job.list.lock);
> > > > +	list_for_each_entry(__job, &sched->job.list.head, entry) {
> > > >  		struct nouveau_uvmm_bind_job *bind_job = to_uvmm_bind_job(__job);
> > > >  
> > > >  		list_for_each_op(op, &bind_job->ops) {
> > > > @@ -1030,7 +1030,7 @@ bind_validate_map_sparse(struct nouveau_job *job, u64 addr, u64 range)
> > > >  
> > > >  				if (!(end <= op_addr || addr >= op_end)) {
> > > >  					nouveau_uvmm_bind_job_get(bind_job);
> > > > -					spin_unlock(&sched->job_list.lock);
> > > > +					spin_unlock(&sched->job.list.lock);
> > > >  					wait_for_completion(&bind_job->complete);
> > > >  					nouveau_uvmm_bind_job_put(bind_job);
> > > >  					goto again;
> > > > @@ -1038,7 +1038,7 @@ bind_validate_map_sparse(struct nouveau_job *job, u64 addr, u64 range)
> > > >  			}
> > > >  		}
> > > >  	}
> > > > -	spin_unlock(&sched->job_list.lock);
> > > > +	spin_unlock(&sched->job.list.lock);
> > > >  }
> > > >  
> > > >  static int
> > > > -- 
> > > > 2.49.0
> > > > 
> > 

