Message-ID: <6a7159d5-fb5c-44b5-ba90-2dcd02b59097@ursulin.net>
Date: Thu, 6 Nov 2025 10:23:02 +0000
From: Tvrtko Ursulin <tursulin@...ulin.net>
To: Pierre-Eric Pelloux-Prayer <pierre-eric@...sy.net>,
 Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>,
 Alex Deucher <alexander.deucher@....com>,
 Christian König <christian.koenig@....com>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>
Cc: amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 3/4] drm/amdgpu: increment sched score on entity
 selection


On 06/11/2025 10:10, Pierre-Eric Pelloux-Prayer wrote:
> 
> 
> On 06/11/2025 11:00, Tvrtko Ursulin wrote:
>>
>> On 06/11/2025 09:39, Pierre-Eric Pelloux-Prayer wrote:
>>> For hw engines that can't load balance jobs, entities are
>>> "statically" load balanced: on their first submit, they select
>>> the best scheduler based on its score.
>>> The score is made up of 2 parts:
>>> * the job queue depth (how many jobs are executing/waiting)
>>> * the number of entities assigned
>>>
>>> The second part is only relevant for static load balancing:
>>> it's a way to consider how many entities are attached to this
>>> scheduler, knowing that if they ever submit jobs they will go
>>> to this one.
>>>
>>> For rings that can load balance jobs freely, idle entities
>>> aren't a concern and shouldn't impact the scheduler's decisions.
>>>
>>> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 21 ++++++++++++++++-----
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 +
>>>   2 files changed, 17 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> index afedea02188d..953c81c928c1 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> @@ -209,6 +209,7 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, u32 hw_ip,
>>>       struct amdgpu_ctx_entity *entity;
>>>       enum drm_sched_priority drm_prio;
>>>       unsigned int hw_prio, num_scheds;
>>> +    struct amdgpu_ring *aring;
>>>       int32_t ctx_prio;
>>>       int r;
>>> @@ -239,11 +240,13 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, u32 hw_ip,
>>>               goto error_free_entity;
>>>       }
>>> -    /* disable load balance if the hw engine retains context among dependent jobs */
>>> -    if (hw_ip == AMDGPU_HW_IP_VCN_ENC ||
>>> -        hw_ip == AMDGPU_HW_IP_VCN_DEC ||
>>> -        hw_ip == AMDGPU_HW_IP_UVD_ENC ||
>>> -        hw_ip == AMDGPU_HW_IP_UVD) {
>>> +    sched = scheds[0];
>>> +    aring = container_of(sched, struct amdgpu_ring, sched);
>>> +
>>> +    if (aring->funcs->engine_retains_context) {
>>> +        /* Disable load balancing between multiple schedulers if the hw
>>> +         * engine retains context among dependent jobs.
>>> +         */
>>>           sched = drm_sched_pick_best(scheds, num_scheds);
>>>           scheds = &sched;
>>>           num_scheds = 1;
>>> @@ -258,6 +261,11 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, u32 hw_ip,
>>>       if (cmpxchg(&ctx->entities[hw_ip][ring], NULL, entity))
>>>           goto cleanup_entity;
>>> +    if (aring->funcs->engine_retains_context) {
>>> +        entity->sched_score = sched->score;
>>> +        atomic_inc(entity->sched_score);
>>
>> Maybe you missed it, in the last round I asked this:
> 
> I missed it, sorry.
> 
>>
>> """
>> Here it would always be sched->score == aring->sched_score, right?
> 
> Yes, because drm_sched_init is called with args.score = ring->sched_score
> 
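
Right. For reference, if I remember the ring init path correctly
(amdgpu_device_init_schedulers(), IIRC), the wiring is roughly the
following; untested sketch, field names from memory:

	struct drm_sched_init_args args = {
		.ops   = &amdgpu_sched_ops,
		.score = ring->sched_score, /* scheduler borrows the ring's counter */
		/* ... timeout, credit limit, name, etc. ... */
	};
	r = drm_sched_init(&ring->sched, &args);

And IIRC drm_sched_init() only falls back to its internal counter when
no score is passed in, so sched->score and ring->sched_score end up
being the same atomic_t.
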
>>
>> If so, it would probably be good to either add that assert or just 
>> fetch it from there. Otherwise it can look concerning to be fishing 
>> the pointer out of scheduler internals.
>>
>> The rest looks good to me.
>> """
>>
>> Because grabbing a pointer from drm_sched->score and storing it in the AMD 
>> entity can look scary, since sched->score may be owned by the scheduler.
>>
>> Hence I was suggesting to fish it out from aring->sched_score instead, 
>> if it is true that they are always the same atomic_t at this point.
> 
> I used sched->score, because aring->sched_score is not the one we want 
> (the existing aring points to scheds[0], not the selected sched). But I 
> can change the code to:
> 
> if (aring->funcs->engine_retains_context) {
>     aring = container_of(sched, struct amdgpu_ring, sched);
>     entity->sched_score = aring->sched_score;
>     atomic_inc(entity->sched_score);
> }
> 
> If it's preferred.

For me it is, yes, because it removes any doubt about who owns the atomic_t 
being pointed to, and it also isolates the driver from any changes to the 
DRM scheduler structures.
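
The assert flavour I mentioned would have looked roughly like this
(untested, just to illustrate what I meant):

	aring = container_of(sched, struct amdgpu_ring, sched);
	/* Sanity check: the selected scheduler uses the ring's counter. */
	WARN_ON_ONCE(sched->score != aring->sched_score);
	entity->sched_score = aring->sched_score;
	atomic_inc(entity->sched_score);

But simply taking the pointer from the ring, as in your snippet above,
avoids peeking into drm_gpu_scheduler internals at all, so I prefer that.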

With that:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@...lia.com>

Regards,

Tvrtko

>>> +    }
>>> +
>>>       return 0;
>>>   cleanup_entity:
>>> @@ -514,6 +522,9 @@ static void amdgpu_ctx_do_release(struct kref *ref)
>>>               if (!ctx->entities[i][j])
>>>                   continue;
>>> +            if (ctx->entities[i][j]->sched_score)
>>> +                atomic_dec(ctx->entities[i][j]->sched_score);
>>> +
>>>               drm_sched_entity_destroy(&ctx->entities[i][j]->entity);
>>>           }
>>>       }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>>> index 090dfe86f75b..f7b44f96f374 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>>> @@ -39,6 +39,7 @@ struct amdgpu_ctx_entity {
>>>       uint32_t        hw_ip;
>>>       uint64_t        sequence;
>>>       struct drm_sched_entity    entity;
>>> +    atomic_t        *sched_score;
>>>       struct dma_fence    *fences[];
>>>   };

