Message-ID: <f5a7ceb4-2313-48b9-bf75-628e9ad91757@ursulin.net>
Date: Fri, 26 Sep 2025 09:24:08 +0100
From: Tvrtko Ursulin <tursulin@...ulin.net>
To: Pierre-Eric Pelloux-Prayer <pierre-eric@...sy.net>,
Tomeu Vizoso <tomeu@...euvizoso.net>, Oded Gabbay <ogabbay@...nel.org>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
Simona Vetter <simona@...ll.ch>,
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>,
Danilo Krummrich <dakr@...nel.org>,
Christian König <ckoenig.leichtzumerken@...il.com>,
Maxime Ripard <mripard@...nel.org>, David Airlie <airlied@...il.com>,
Matthew Brost <matthew.brost@...el.com>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Thomas Zimmermann <tzimmermann@...e.de>, Philipp Stanner <phasta@...nel.org>
Subject: Re: [PATCH v1 2/2] drm/sched: limit sched score update to jobs change
On 26/09/2025 09:20, Pierre-Eric Pelloux-Prayer wrote:
> Hi,
>
> On 01/09/2025 at 11:20, Tvrtko Ursulin wrote:
>>
>>
>> + Tomeu and Oded
>>
>> On 22/08/2025 14:43, Pierre-Eric Pelloux-Prayer wrote:
>>> Currently, the scheduler score is incremented when a job is pushed to an
>>> entity and when an entity is attached to the scheduler.
>>>
>>> This leads to bad scheduling decisions, because the score value can
>>> be largely made up of idle entities.
>>>
>>> For instance, in a scenario with 2 schedulers where 10 entities each
>>> submit a single job and then do nothing, each scheduler will probably
>>> end up with a score of 5.
>>> Now, 5 userspace apps exit, so their entities are dropped. In the
>>> worst case, these apps' entities were all attached to the same
>>> scheduler, and we end up with score=5 (the 5 remaining entities) and
>>> score=0, despite both schedulers being idle.
>>> When new entities show up, they will all select the second scheduler
>>> based on its low score value, instead of alternating between the 2.
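>>>
>>> (For illustration, scheduler selection boils down to picking the
>>> lowest score. Here is a simplified sketch of the logic in
>>> drm_sched_pick_best() from sched_entity.c, not the exact kernel code:
>>>
>>>     /* Sketch: the entity is assigned to the scheduler in its list
>>>      * with the lowest score, so a score inflated by idle entities
>>>      * skews placement. */
>>>     static struct drm_gpu_scheduler *
>>>     pick_best(struct drm_gpu_scheduler **sched_list, unsigned int n)
>>>     {
>>>             struct drm_gpu_scheduler *picked = NULL;
>>>             unsigned int i, min_score = UINT_MAX;
>>>
>>>             for (i = 0; i < n; i++) {
>>>                     unsigned int score = atomic_read(sched_list[i]->score);
>>>
>>>                     if (score < min_score) {
>>>                             min_score = score;
>>>                             picked = sched_list[i];
>>>                     }
>>>             }
>>>             return picked;
>>>     }
>>>
>>> so with score=5 vs score=0 in the example above, every new entity
>>> lands on the second scheduler.)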
>>>
>>> Some amdgpu rings depended on this feature, but the previous commit
>>> implemented the same thing in amdgpu directly, so the behavior can be
>>> safely removed from drm/sched.
>>>
>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_main.c | 2 --
>>> 1 file changed, 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 5a550fd76bf0..e6d232a8ec58 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -206,7 +206,6 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
>>>  
>>>  	if (!list_empty(&entity->list))
>>>  		return;
>>>  
>>> -	atomic_inc(rq->sched->score);
>>>  	list_add_tail(&entity->list, &rq->entities);
>>>  }
>>> @@ -228,7 +227,6 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>>>  
>>>  	spin_lock(&rq->lock);
>>>  
>>> -	atomic_dec(rq->sched->score);
>>>  	list_del_init(&entity->list);
>>>  
>>>  	if (rq->current_entity == entity)
>>
>> LGTM.
>>
>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@...lia.com>
>>
>
> Thanks.
>
>> The only detail is, I revisited the scheduler users and it looks like
>> the new rocket driver is the only one other than amdgpu which passes a
>> list of more than one scheduler to drm_sched_entity_init. I don't
>> *think* it would be affected though: it would still pick the least
>> loaded (based on active jobs) scheduler at job submit time, unless
>> there is some hidden behaviour in that driver where it is important to
>> consider the number of entities too. Anyway, it would be good for the
>> rocket driver folks to double-check and ack.
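>>
>> For reference, here is a hypothetical snippet of what such a driver
>> does at entity creation time (the names are made up; only the
>> drm_sched_entity_init() call matches the real API):
>>
>>     /* One entity backed by two hardware schedulers; drm/sched
>>      * load-balances each job onto the least loaded of the two. */
>>     struct drm_gpu_scheduler *scheds[] = {
>>             &core0->sched, &core1->sched,
>>     };
>>     int ret;
>>
>>     ret = drm_sched_entity_init(&ctx->entity,
>>                                 DRM_SCHED_PRIORITY_NORMAL,
>>                                 scheds, ARRAY_SIZE(scheds), NULL);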
>>
>
> Tomeu, Oded: any concerns about this change?
Tomeu acked it, in case you missed it:
https://lore.kernel.org/dri-devel/20250822134348.6819-1-pierre-eric.pelloux-prayer@amd.com/T/#m4efa0d69211b2bb378cfb302e459faa30e57f58c
Regards,
Tvrtko