lists.openwall.net - Open Source and information security mailing list archives
Message-ID: <e627335ea7d0cbb1f8b92ad5fd936466b19c3ec7.camel@mailbox.org>
Date: Fri, 16 May 2025 11:54:15 +0200
From: Philipp Stanner <phasta@...lbox.org>
To: Tvrtko Ursulin <tvrtko.ursulin@...lia.com>, Philipp Stanner
 <phasta@...nel.org>, Lyude Paul <lyude@...hat.com>, Danilo Krummrich
 <dakr@...nel.org>, David Airlie <airlied@...il.com>, Simona Vetter
 <simona@...ll.ch>, Matthew Brost <matthew.brost@...el.com>, Christian
 König <ckoenig.leichtzumerken@...il.com>, Maarten
 Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard
 <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>
Cc: dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org, 
	linux-kernel@...r.kernel.org, Danilo Krummrich <dakr@...hat.com>
Subject: Re: [PATCH v2 2/6] drm/sched: Prevent teardown waitque from
 blocking too long

On Fri, 2025-05-16 at 10:33 +0100, Tvrtko Ursulin wrote:
> 
> On 24/04/2025 10:55, Philipp Stanner wrote:
> > The waitqueue that ensures that drm_sched_fini() blocks until the
> > pending_list has become empty could theoretically cause that function to
> > block for a very long time. That, ultimately, could block userspace
> > processes and prevent them from being killable through SIGKILL.
> > 
> > When a driver calls drm_sched_fini(), it is safe to assume that all
> > still pending jobs are not needed anymore anyways. Thus, they can be
> > cancelled and thereby it can be ensured that drm_sched_fini() will
> > return relatively quickly.
> > 
> > Implement a new helper to stop all work items / submission except for
> > the drm_sched_backend_ops.run_job().
> > 
> > Implement a driver callback, kill_fence_context(), that instructs the
> > driver to kill the fence context associated with this scheduler, thereby
> > causing all pending hardware fences to be signalled.
> > 
> > Call those new routines in drm_sched_fini() and ensure backwards
> > compatibility if the new callback is not implemented.
> > 
> > Suggested-by: Danilo Krummrich <dakr@...hat.com>
> > Signed-off-by: Philipp Stanner <phasta@...nel.org>
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 47 +++++++++++++++++---------
> >   include/drm/gpu_scheduler.h            | 11 ++++++
> >   2 files changed, 42 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index ea82e69a72a8..c2ad6c70bfb6 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1390,31 +1390,46 @@ drm_sched_no_jobs_pending(struct drm_gpu_scheduler *sched)
> >   	return empty;
> >   }
> >   
> > +/**
> > + * drm_sched_cancel_jobs_and_wait - trigger freeing of all pending jobs
> > + * @sched: scheduler instance
> > + *
> > + * Must only be called if &struct drm_sched_backend_ops.kill_fence_context
> > + * is implemented.
> > + *
> > + * Instructs the driver to kill the fence context associated with this
> > + * scheduler, thereby signalling all pending fences. This, in turn, will
> > + * trigger &struct drm_sched_backend_ops.free_job to be called for all
> > + * pending jobs. The function then blocks until all pending jobs have been
> > + * freed.
> > + */
> > +static inline void
> > +drm_sched_cancel_jobs_and_wait(struct drm_gpu_scheduler *sched)
> > +{
> > +	sched->ops->kill_fence_context(sched);
> > +	wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
> > +}
> > +
> >   /**
> >    * drm_sched_fini - Destroy a gpu scheduler
> >    *
> >    * @sched: scheduler instance
> >    *
> > - * Tears down and cleans up the scheduler.
> > - *
> > - * Note that this function blocks until all the fences returned by
> > - * &struct drm_sched_backend_ops.run_job have been signalled.
> > + * Tears down and cleans up the scheduler. Might leak memory if
> > + * &struct drm_sched_backend_ops.kill_fence_context is not implemented.
> >    */
> >   void drm_sched_fini(struct drm_gpu_scheduler *sched)
> >   {
> >   	struct drm_sched_entity *s_entity;
> >   	int i;
> >   
> > -	/*
> > -	 * Jobs that have neither been scheduled or which have timed out are
> > -	 * gone by now, but jobs that have been submitted through
> > -	 * backend_ops.run_job() and have not yet terminated are still pending.
> > -	 *
> > -	 * Wait for the pending_list to become empty to avoid leaking those jobs.
> > -	 */
> > -	drm_sched_submission_and_timeout_stop(sched);
> > -	wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
> > -	drm_sched_free_stop(sched);
> > +	if (sched->ops->kill_fence_context) {
> > +		drm_sched_submission_and_timeout_stop(sched);
> > +		drm_sched_cancel_jobs_and_wait(sched);
> > +		drm_sched_free_stop(sched);
> > +	} else {
> > +		/* We're in "legacy free-mode" and ignore potential mem leaks */
> > +		drm_sched_wqueue_stop(sched);
> > +	}
> >   
> >   	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
> >   		struct drm_sched_rq *rq = sched->sched_rq[i];
> > @@ -1502,7 +1517,7 @@ bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
> >   EXPORT_SYMBOL(drm_sched_wqueue_ready);
> >   
> >   /**
> > - * drm_sched_wqueue_stop - stop scheduler submission
> > + * drm_sched_wqueue_stop - stop scheduler submission and freeing
> 
> Looks like the kerneldoc corrections (below too) belong to the previous
> patch. Irrelevant if you decide to squash them though.
> 
> >    * @sched: scheduler instance
> >    *
> >    * Stops the scheduler from pulling new jobs from entities. It also stops
> > @@ -1518,7 +1533,7 @@ void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
> >   EXPORT_SYMBOL(drm_sched_wqueue_stop);
> >   
> >   /**
> > - * drm_sched_wqueue_start - start scheduler submission
> > + * drm_sched_wqueue_start - start scheduler submission and freeing
> >    * @sched: scheduler instance
> >    *
> >    * Restarts the scheduler after drm_sched_wqueue_stop() has stopped it.
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index d0b1f416b4d9..8630b4a26f10 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -509,6 +509,17 @@ struct drm_sched_backend_ops {
> >            * and it's time to clean it up.
> >   	 */
> >   	void (*free_job)(struct drm_sched_job *sched_job);
> > +
> > +	/**
> > +	 * @kill_fence_context: kill the fence context belonging to this scheduler
> 
> Which fence context would that be? ;)
> 
> Also, "fence context" would be a new terminology in gpu_scheduler.h API
> level. You could call it ->sched_fini() or similar to signify at which
> point in the API it gets called, and then the fact it takes sched as a
> parameter would be natural.
> 
> We also probably want some commentary on the topic of indefinite (or at
> least very long) blocking at thread exit / SIGINT/TERM/KILL time. The
> cover letter touches upon that problem but I don't see you address it.
> Is the idea to let drivers shoot themselves in the foot or what?

You didn't seriously just write that, did you?

Yes, my intention clearly has been since September to make things
worse, more ambiguous and destroy drivers. That's why I'm probably the
only individual after Sima (who added some of the FIXMEs) who ever
bothered documenting all this mess here, and warning people about the
five hundred pitfalls, like three different start and stop functions,
implicit refcount rules and God knows which other insane hacks that
simply move driver problems into common infrastructure.

Reconsider your attitude.

P.

> 
> Regards,
> 
> Tvrtko
> 
> > +	 *
> > +	 * Needed to cleanly tear the scheduler down in drm_sched_fini(). This
> > +	 * callback will cause all hardware fences to be signalled by the driver,
> > +	 * which, ultimately, ensures that all jobs get freed before teardown.
> > +	 *
> > +	 * This callback is optional, but it is highly recommended to implement it.
> > +	 */
> > +	void (*kill_fence_context)(struct drm_gpu_scheduler *sched);
> >   };
> >   
> >   /**
> 

