[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bbd7c5ee-c2f0-3e19-757d-a9aff1a26d3d@linux.intel.com>
Date: Wed, 8 Mar 2023 10:57:47 +0100
From: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>
To: Asahi Lina <lina@...hilina.net>,
Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>,
David Airlie <airlied@...il.com>,
Daniel Vetter <daniel@...ll.ch>,
Miguel Ojeda <ojeda@...nel.org>,
Alex Gaynor <alex.gaynor@...il.com>,
Wedson Almeida Filho <wedsonaf@...il.com>,
Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
Luben Tuikov <luben.tuikov@....com>,
Jarkko Sakkinen <jarkko@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>
Cc: Alyssa Rosenzweig <alyssa@...enzweig.io>,
Karol Herbst <kherbst@...hat.com>,
Ella Stanforth <ella@...unix.org>,
Faith Ekstrand <faith.ekstrand@...labora.com>,
Mary <mary@...y.zone>, linux-kernel@...r.kernel.org,
dri-devel@...ts.freedesktop.org, rust-for-linux@...r.kernel.org,
linux-media@...r.kernel.org, linaro-mm-sig@...ts.linaro.org,
linux-sgx@...r.kernel.org, asahi@...ts.linux.dev
Subject: Re: [PATCH RFC 11/18] drm/scheduler: Clean up jobs when the scheduler
is torn down
On 2023-03-07 15:25, Asahi Lina wrote:
> drm_sched_fini() currently leaves any pending jobs dangling, which
> causes segfaults and other badness when job completion fences are
> signaled after the scheduler is torn down.
>
> Explicitly detach all jobs from their completion callbacks and free
> them. This makes it possible to write a sensible safe abstraction for
> drm_sched, without having to externally duplicate the tracking of
> in-flight jobs.
>
> This shouldn't regress any existing drivers, since calling
> drm_sched_fini() with any pending jobs is broken and this change should
> be a no-op if there are no pending jobs.
>
> Signed-off-by: Asahi Lina <lina@...hilina.net>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++++++++++++++--
> 1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 5c0add2c7546..0aab1e0aebdd 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1119,10 +1119,33 @@ EXPORT_SYMBOL(drm_sched_init);
> void drm_sched_fini(struct drm_gpu_scheduler *sched)
> {
> struct drm_sched_entity *s_entity;
> + struct drm_sched_job *s_job, *tmp;
> int i;
>
> - if (sched->thread)
> - kthread_stop(sched->thread);
> + if (!sched->thread)
> + return;
> +
> + /*
> + * Stop the scheduler, detaching all jobs from their hardware callbacks
> + * and cleaning up complete jobs.
> + */
> + drm_sched_stop(sched, NULL);
> +
> + /*
> + * Iterate through the pending job list and free all jobs.
> + * This assumes the driver has either guaranteed jobs are already stopped, or that
> + * otherwise it is responsible for keeping any necessary data structures for
> + * in-progress jobs alive even when the free_job() callback is called early (e.g. by
> + * putting them in its own queue or doing its own refcounting).
> + */
> + list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
> + spin_lock(&sched->job_list_lock);
> + list_del_init(&s_job->list);
> + spin_unlock(&sched->job_list_lock);
> + sched->ops->free_job(s_job);
> + }
I would stop the kthread first, then delete all jobs without spinlock
since nothing else can race against sched_fini?
If you do need the spinlock, It would need to guard list_for_each_entry too.
> +
> + kthread_stop(sched->thread);
>
> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> struct drm_sched_rq *rq = &sched->sched_rq[i];
>
Powered by blists - more mailing lists