[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6a7159d5-fb5c-44b5-ba90-2dcd02b59097@ursulin.net>
Date: Thu, 6 Nov 2025 10:23:02 +0000
From: Tvrtko Ursulin <tursulin@...ulin.net>
To: Pierre-Eric Pelloux-Prayer <pierre-eric@...sy.net>,
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>,
Alex Deucher <alexander.deucher@....com>,
Christian König <christian.koenig@....com>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>
Cc: amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 3/4] drm/amdgpu: increment sched score on entity
selection
On 06/11/2025 10:10, Pierre-Eric Pelloux-Prayer wrote:
>
>
> Le 06/11/2025 à 11:00, Tvrtko Ursulin a écrit :
>>
>> On 06/11/2025 09:39, Pierre-Eric Pelloux-Prayer wrote:
>>> For hw engines that can't load balance jobs, entities are
>>> "statically" load balanced: on their first submit, they select
>>> the best scheduler based on its score.
>>> The score is made up of 2 parts:
>>> * the job queue depth (how much jobs are executing/waiting)
>>> * the number of entities assigned
>>>
>>> The second part is only relevant for the static load balance:
>>> it's a way to consider how many entities are attached to this
>>> scheduler, knowing that if they ever submit jobs they will go
>>> to this one.
>>>
>>> For rings that can load balance jobs freely, idle entities
>>> aren't a concern and shouldn't impact the scheduler's decisions.
>>>
>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-
>>> prayer@....com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 21 ++++++++++++++++-----
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 1 +
>>> 2 files changed, 17 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/
>>> drm/amd/ amdgpu/amdgpu_ctx.c
>>> index afedea02188d..953c81c928c1 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> @@ -209,6 +209,7 @@ static int amdgpu_ctx_init_entity(struct
>>> amdgpu_ctx *ctx, u32 hw_ip,
>>> struct amdgpu_ctx_entity *entity;
>>> enum drm_sched_priority drm_prio;
>>> unsigned int hw_prio, num_scheds;
>>> + struct amdgpu_ring *aring;
>>> int32_t ctx_prio;
>>> int r;
>>> @@ -239,11 +240,13 @@ static int amdgpu_ctx_init_entity(struct
>>> amdgpu_ctx *ctx, u32 hw_ip,
>>> goto error_free_entity;
>>> }
>>> - /* disable load balance if the hw engine retains context among
>>> dependent jobs */
>>> - if (hw_ip == AMDGPU_HW_IP_VCN_ENC ||
>>> - hw_ip == AMDGPU_HW_IP_VCN_DEC ||
>>> - hw_ip == AMDGPU_HW_IP_UVD_ENC ||
>>> - hw_ip == AMDGPU_HW_IP_UVD) {
>>> + sched = scheds[0];
>>> + aring = container_of(sched, struct amdgpu_ring, sched);
>>> +
>>> + if (aring->funcs->engine_retains_context) {
>>> + /* Disable load balancing between multiple schedulers if the hw
>>> + * engine retains context among dependent jobs.
>>> + */
>>> sched = drm_sched_pick_best(scheds, num_scheds);
>>> scheds = &sched;
>>> num_scheds = 1;
>>> @@ -258,6 +261,11 @@ static int amdgpu_ctx_init_entity(struct
>>> amdgpu_ctx *ctx, u32 hw_ip,
>>> if (cmpxchg(&ctx->entities[hw_ip][ring], NULL, entity))
>>> goto cleanup_entity;
>>> + if (aring->funcs->engine_retains_context) {
>>> + entity->sched_score = sched->score;
>>> + atomic_inc(entity->sched_score);
>>
>> Maybe you missed it, in the last round I asked this:
>
> I missed it, sorry.
>
>>
>> """
>> Here is would always be sched->score == aring->sched_score, right?
>
> Yes because drm_sched_init is called with args.score = ring->sched_score
>
>>
>> If so it would probably be good to either add that assert, or even to
>> just fetch it from there. Otherwise it can look potentially concerning
>> to be fishing out the pointer from scheduler internals.
>>
>> The rest looks good to me.
>> """
>>
>> Because grabbing a pointer from drm_sched->score and storing it in AMD
>> entity can look scary, since sched->score can be scheduler owned.
>>
>> Hence I was suggesting to either fish it out from aring->sched_score.
>> If it is true that they are always the same atomic_t at this point.
>
> I used sched->score, because aring->sched_score is not the one we want
> (the existing aring points to scheds[0], not the selected sched). But I
> can change the code to:
>
> if (aring->funcs->engine_retains_context) {
> aring = container_of(sched, struct amdgpu_ring, sched)
> entity->sched_score = aring->sched_score;
> atomic_inc(entity->sched_score);
> }
>
> If it's preferred.
For me it is, yes. Because it removes any doubt on who owns the atomic_t
pointed to. And it also isolates the driver from any changes in DRM
scheduler structures.
With that:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@...lia.com>
Regards,
Tvrtko
>>> + }
>>> +
>>> return 0;
>>> cleanup_entity:
>>> @@ -514,6 +522,9 @@ static void amdgpu_ctx_do_release(struct kref *ref)
>>> if (!ctx->entities[i][j])
>>> continue;
>>> + if (ctx->entities[i][j]->sched_score)
>>> + atomic_dec(ctx->entities[i][j]->sched_score);
>>> +
>>> drm_sched_entity_destroy(&ctx->entities[i][j]->entity);
>>> }
>>> }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/
>>> drm/amd/ amdgpu/amdgpu_ctx.h
>>> index 090dfe86f75b..f7b44f96f374 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>>> @@ -39,6 +39,7 @@ struct amdgpu_ctx_entity {
>>> uint32_t hw_ip;
>>> uint64_t sequence;
>>> struct drm_sched_entity entity;
>>> + atomic_t *sched_score;
>>> struct dma_fence *fences[];
>>> };
Powered by blists - more mailing lists