Message-ID: <Z73LVBw7HXANVbHQ@cassiopeiae>
Date: Tue, 25 Feb 2025 14:53:24 +0100
From: Danilo Krummrich <dakr@...nel.org>
To: Philipp Stanner <phasta@...nel.org>
Cc: Matthew Brost <matthew.brost@...el.com>,
Christian König <ckoenig.leichtzumerken@...il.com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Tvrtko Ursulin <tvrtko.ursulin@...lia.com>,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] drm/sched: Fix outdated comments referencing thread
On Tue, Feb 25, 2025 at 02:13:32PM +0100, Philipp Stanner wrote:
> The GPU scheduler's comments refer to a "thread" at various places.
> Those are leftovers stemming from a rework in which the scheduler was
Maybe "leftovers from commit a6149f039369 ("drm/sched: Convert drm scheduler to
use a work queue rather than kthread") [...]".
> ported from using a kthread to workqueues.
>
> Replace all references to kthreads.
>
> Fixes: a6149f039369 ("drm/sched: Convert drm scheduler to use a work queue rather than kthread")
I suggest dropping the 'Fixes' tag; it's not a fix in the sense of this tag.
> Signed-off-by: Philipp Stanner <phasta@...nel.org>
> ---
> drivers/gpu/drm/scheduler/sched_entity.c | 8 ++++----
> drivers/gpu/drm/scheduler/sched_main.c | 21 +++++++++++----------
> include/drm/gpu_scheduler.h | 12 ++++++------
> 3 files changed, 21 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 87f88259ddf6..f9811420c787 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -538,10 +538,10 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> return;
>
> /*
> - * Only when the queue is empty are we guaranteed that the scheduler
> - * thread cannot change ->last_scheduled. To enforce ordering we need
> - * a read barrier here. See drm_sched_entity_pop_job() for the other
> - * side.
> + * Only when the queue is empty are we guaranteed that
> + * drm_sched_run_job_work() cannot change entity->last_scheduled. To
> + * enforce ordering we need a read barrier here. See
> + * drm_sched_entity_pop_job() for the other side.
> */
> smp_rmb();
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index c634993f1346..015ee327fe52 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -389,7 +389,7 @@ static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
> * drm_sched_job_done - complete a job
> * @s_job: pointer to the job which is done
> *
> - * Finish the job's fence and wake up the worker thread.
> + * Finish the job's fence and wake up the work item.
> */
> static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
> {
> @@ -550,8 +550,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
> if (job) {
> /*
> * Remove the bad job so it cannot be freed by concurrent
> - * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
Not related to your patch, but I assume this means "cannot be freed by
concurrent calls to drm_sched_job_cleanup()", which would still be incorrect,
since drm_sched_job_cleanup() doesn't free the job itself. Maybe you want to fix
this as well?
> - * is parked at which point it's safe.
> + * drm_sched_cleanup_jobs. It will be reinserted back after the
> + * scheduler's workqueues are stopped at which point it's safe.
You don't know whether the workqueues are "stopped". I think you want to say
that run_job / free_job work isn't scheduled or running.
Same for a couple more instances below.
> */
> list_del_init(&job->list);
> spin_unlock(&sched->job_list_lock);
> @@ -597,10 +597,10 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>
> /*
> * Reinsert back the bad job here - now it's safe as
> - * drm_sched_get_finished_job cannot race against us and release the
> + * drm_sched_get_finished_job() cannot race against us and release the
Not related to the patch, but fine with me, since you do it for consistency with
the change below.
> * bad job at this point - we parked (waited for) any in progress
> - * (earlier) cleanups and drm_sched_get_finished_job will not be called
> - * now until the scheduler thread is unparked.
> + * (earlier) cleanups and drm_sched_get_finished_job() will not be
> + * called now until the scheduler's workqueues are unparked.
workqueues are unparked?
> */
> if (bad && bad->sched == sched)
> /*
> @@ -613,7 +613,8 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> * Iterate the job list from later to earlier one and either deactive
> * their HW callbacks or remove them from pending list if they already
> * signaled.
> - * This iteration is thread safe as sched thread is stopped.
> + * This iteration is thread safe as the scheduler's workqueues are
> + * stopped.
> */
> list_for_each_entry_safe_reverse(s_job, tmp, &sched->pending_list,
> list) {
> @@ -678,9 +679,9 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, int errno)
> struct drm_sched_job *s_job, *tmp;
>
> /*
> - * Locking the list is not required here as the sched thread is parked
> - * so no new jobs are being inserted or removed. Also concurrent
> - * GPU recovers can't run in parallel.
> + * Locking the list is not required here as the scheduler's workqueues
> + * are paused, so no new jobs are being inserted or removed. Also
> + * concurrent GPU recovers can't run in parallel.
> */
> list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
> struct dma_fence *fence = s_job->s_fence->parent;
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 50928a7ae98e..7da7b0b52a7e 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -192,8 +192,8 @@ struct drm_sched_entity {
> * @last_scheduled:
> *
> * Points to the finished fence of the last scheduled job. Only written
> - * by the scheduler thread, can be accessed locklessly from
> - * drm_sched_job_arm() if the queue is empty.
> + * by &struct drm_gpu_scheduler.submit_wq. Can be accessed locklessly
> + * from drm_sched_job_arm() if the queue is empty.
> */
> struct dma_fence __rcu *last_scheduled;
>
> @@ -426,14 +426,14 @@ struct drm_sched_backend_ops {
> * Drivers typically issue a reset to recover from GPU hangs, and this
> * procedure usually follows the following workflow:
> *
> - * 1. Stop the scheduler using drm_sched_stop(). This will park the
> - * scheduler thread and cancel the timeout work, guaranteeing that
> - * nothing is queued while we reset the hardware queue
> + * 1. Stop the scheduler using drm_sched_stop(). This will stop the
> + * scheduler's workqueues and cancel the timeout work, guaranteeing
> + * that nothing is queued while we reset the hardware queue
> * 2. Try to gracefully stop non-faulty jobs (optional)
> * 3. Issue a GPU reset (driver-specific)
> * 4. Re-submit jobs using drm_sched_resubmit_jobs()
> * 5. Restart the scheduler using drm_sched_start(). At that point, new
> - * jobs can be queued, and the scheduler thread is unblocked
> + * jobs can be queued, and the scheduler's workqueues be started again.
> *
> * Note that some GPUs have distinct hardware queues but need to reset
> * the GPU globally, which requires extra synchronization between the
> --
> 2.48.1
>