Message-ID: <abb776bc-5b13-4db7-9421-59259119b859@ursulin.net>
Date: Mon, 1 Sep 2025 10:20:48 +0100
From: Tvrtko Ursulin <tursulin@...ulin.net>
To: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>,
Matthew Brost <matthew.brost@...el.com>, Danilo Krummrich <dakr@...nel.org>,
Philipp Stanner <phasta@...nel.org>,
Christian König <ckoenig.leichtzumerken@...il.com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Tomeu Vizoso <tomeu@...euvizoso.net>, Oded Gabbay <ogabbay@...nel.org>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 2/2] drm/sched: limit sched score update to jobs change
+ Tomeu and Oded
On 22/08/2025 14:43, Pierre-Eric Pelloux-Prayer wrote:
> Currently, the scheduler score is incremented when a job is pushed to an
> entity and when an entity is attached to the scheduler.
>
> This leads to bad scheduling decisions where the score value is
> largely made up of idle entities.
>
> For instance, in a scenario with 2 schedulers where 10 entities each
> submit a single job and then do nothing, each scheduler will probably
> end up with a score of 5.
> Now, 5 userspace apps exit, so their entities will be dropped. In
> the worst case, these apps' entities were all attached to the same
> scheduler, and we end up with score=5 (the 5 remaining entities) and
> score=0, despite both schedulers being idle.
> When new entities show up, they will all select the second scheduler
> based on its low score value, instead of alternating between the two.
>
> Some amdgpu rings depended on this behaviour, but the previous commit
> implemented the same thing in amdgpu directly, so it can be safely
> removed from drm/sched.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 5a550fd76bf0..e6d232a8ec58 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -206,7 +206,6 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
>  	if (!list_empty(&entity->list))
>  		return;
> 
> -	atomic_inc(rq->sched->score);
>  	list_add_tail(&entity->list, &rq->entities);
>  }
>
> @@ -228,7 +227,6 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>
>  	spin_lock(&rq->lock);
> 
> -	atomic_dec(rq->sched->score);
>  	list_del_init(&entity->list);
> 
>  	if (rq->current_entity == entity)
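
For context, after this patch the score accounting is on the job path
only, roughly like the below (paraphrasing from sched_entity.c and
sched_main.c from memory, so not verbatim):

	/* drm_sched_entity_push_job(): a job enters the entity's queue */
	atomic_inc(entity->rq->sched->score);

	/* drm_sched_job_done(): the job finished on the hardware */
	atomic_dec(sched->score);

So the score now tracks in-flight jobs only, rather than in-flight
jobs plus attached entities, which is what removes the idle-entity
skew described in the commit message.
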
LGTM.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@...lia.com>
One detail: I revisited the scheduler users, and it looks like the new
rocket driver is the only one other than amdgpu which passes a list of
more than one scheduler to drm_sched_entity_init(). I don't *think* it
would be affected, though. It would still pick the least loaded
scheduler (based on active jobs) at job submit time, unless there is
some hidden behaviour in that driver where the number of entities
matters too. Anyway, it would be good for the rocket driver folks to
double-check and ack.
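
To illustrate, at entity init time the scheduler is picked from the
list with something like drm_sched_pick_best() below (a simplified
sketch from memory, not the verbatim kernel code):

	/*
	 * Return the scheduler with the lowest score from the list the
	 * entity was initialised with. With this patch the score counts
	 * queued/in-flight jobs only, not attached entities.
	 */
	static struct drm_gpu_scheduler *
	drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
			    unsigned int num_sched_list)
	{
		struct drm_gpu_scheduler *sched, *picked_sched = NULL;
		unsigned int i, min_score = UINT_MAX, num_score;

		for (i = 0; i < num_sched_list; i++) {
			sched = sched_list[i];

			/* Skip schedulers that are not ready */
			if (!sched->ready)
				continue;

			num_score = atomic_read(sched->score);
			if (num_score < min_score) {
				min_score = num_score;
				picked_sched = sched;
			}
		}

		return picked_sched;
	}

As long as rocket only cares about balancing by outstanding jobs, this
change should therefore be transparent to it.
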
Regards,
Tvrtko