[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <30fadc96-d531-4cde-a717-c5983908ea04@gmail.com>
Date: Fri, 13 Dec 2024 18:10:27 +0100
From: Antonino Maniscalco <antomani103@...il.com>
To: Akhil P Oommen <quic_akhilpo@...cinc.com>, Rob Clark
<robdclark@...il.com>, dri-devel@...ts.freedesktop.org
Cc: linux-arm-msm@...r.kernel.org, freedreno@...ts.freedesktop.org,
Rob Clark <robdclark@...omium.org>, Sean Paul <sean@...rly.run>,
Konrad Dybcio <konradybcio@...nel.org>,
Abhinav Kumar <quic_abhinavk@...cinc.com>,
Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
Marijn Suijten <marijn.suijten@...ainline.org>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] drm/msm: Add UABI to request perfcntr usage
On 12/13/24 5:50 PM, Akhil P Oommen wrote:
> On 12/12/2024 9:44 PM, Antonino Maniscalco wrote:
>> On 12/12/24 4:58 PM, Akhil P Oommen wrote:
>>> On 12/5/2024 10:24 PM, Rob Clark wrote:
>>>> From: Rob Clark <robdclark@...omium.org>
>>>>
>>>> Performance counter usage falls into two categories:
>>>>
>>>> 1. Local usage, where the counter configuration, start, and end read
>>>> happen within (locally to) a single SUBMIT. In this case, there is
>>>> no dependency on counter configuration or values between submits,
>>>> and
>>>> in fact counters are normally cleared on context switches, making it
>>>> impossible to rely on cross-submit state.
>>>>
>>>> 2. Global usage, where a single privilaged daemon/process is sampling
>>>> counter values across all processes for profiling.
>>>>
>>>> The two categories are mutually exclusive. While you can have many
>>>> processes making local counter usage, you cannot combine global and
>>>> local usage without the two stepping on each others feet (by changing
>>>> counter configuration).
>>>>
>>>> For global counter usage, there is already a SYSPROF param (since global
>>>> counter usage requires disabling counter clearing on context switch).
>>>> This patch adds a REQ_CNTRS param to request local counter usage. If
>>>> one or more processes has requested counter usage, then a SYSPROF
>>>> request will fail with -EBUSY. And if SYSPROF is active, then REQ_CNTRS
>>>> will fail with -EBUSY, maintaining the mutual exclusivity.
>>>>
>>>> This is purely an advisory interface to help coordinate userspace.
>>>> There is no real means of enforcement, but the worst that can happen if
>>>> userspace ignores a REQ_CNTRS failure is that you'll get nonsense
>>>> profiling data.
>>>>
>>>> Signed-off-by: Rob Clark <robdclark@...omium.org>
>>>> ---
>>>> kgsl takes a different approach, which involves a lot more UABI for
>>>> assigning counters to different processes. But I think by taking
>>>> advantage of the fact that mesa (freedreno+turnip) reconfigure the
>>>> counters they need in each SUBMIT, for their respective gl/vk perf-
>>>> counter extensions, we can take this simpler approach.
>>>
>>> KGSL's approach is preemption and ifpc safe (also whatever HW changes
>>> that will come up in future generations). How will we ensure that here?
>>>
>>> I have plans to bring up IFPC support in near future. Also, I brought up
>>> this point during preemption series. But from the responses, I felt that
>>> profiling was not considered a serious usecase. Still I wonder how the
>>> perfcounter extensions work accurately with preemption.
>>
>> So back then I implemented the postamble IB to clear perf counters and
>> that gets disabled when sysprof (so global usage) is happening. The
>> kernel is oblivious to "Local isage" of profiling but in that case
>> really what we want to do is disable preemption which in my
>> understanding can be done from userspace with a PKT. In my understanding
>> this had us covered for all usecases.
>
> I think this wasn't mentioned at that time. Which UMD PKT in a6x+ did
> you mean?
Ah, I thought it wasmentioned, sorry.
The packet I was referring to is:
<doc> Make next dword 1 to disable preemption, 0 to re-enable it. </doc>
<value name="CP_PREEMPT_DISABLE" value="0x6c" variants="A6XX"/>
BTW you mentioned wanting to look into IFPC. Since I too wanted to look
into implementing it wonder if you could let me know when you planned on
working on it.
>
> -Akhil.
>
>>
>> So what would you expect instead we should do kernel side to make
>> profiling preemption safe?
>>
>>>
>>> -Akhil
>>>
>>>>
>>>> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +
>>>> drivers/gpu/drm/msm/msm_drv.c | 5 ++-
>>>> drivers/gpu/drm/msm/msm_gpu.c | 1 +
>>>> drivers/gpu/drm/msm/msm_gpu.h | 29 +++++++++++++-
>>>> drivers/gpu/drm/msm/msm_submitqueue.c | 52 ++++++++++++++++++++++++-
>>>> include/uapi/drm/msm_drm.h | 1 +
>>>> 6 files changed, 85 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/
>>>> drm/msm/adreno/adreno_gpu.c
>>>> index 31bbf2c83de4..f688e37059b8 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct
>>>> msm_file_private *ctx,
>>>> if (!capable(CAP_SYS_ADMIN))
>>>> return UERR(EPERM, drm, "invalid permissions");
>>>> return msm_file_private_set_sysprof(ctx, gpu, value);
>>>> + case MSM_PARAM_REQ_CNTRS:
>>>> + return msm_file_private_request_counters(ctx, gpu, value);
>>>> default:
>>>> return UERR(EINVAL, drm, "%s: invalid param: %u", gpu-
>>>>> name, param);
>>>> }
>>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/
>>>> msm_drv.c
>>>> index 6416d2cb4efc..bf8314ff4a25 100644
>>>> --- a/drivers/gpu/drm/msm/msm_drv.c
>>>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>>>> @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device
>>>> *dev, struct drm_file *file)
>>>> * It is not possible to set sysprof param to non-zero if gpu
>>>> * is not initialized:
>>>> */
>>>> - if (priv->gpu)
>>>> + if (ctx->sysprof)
>>>> msm_file_private_set_sysprof(ctx, priv->gpu, 0);
>>>> + if (ctx->counters_requested)
>>>> + msm_file_private_request_counters(ctx, priv->gpu, 0);
>>>> +
>>>> context_close(ctx);
>>>> }
>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/
>>>> msm_gpu.c
>>>> index 82f204f3bb8f..013b59ca3bb1 100644
>>>> --- a/drivers/gpu/drm/msm/msm_gpu.c
>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.c
>>>> @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct
>>>> platform_device *pdev,
>>>> gpu->nr_rings = nr_rings;
>>>> refcount_set(&gpu->sysprof_active, 1);
>>>> + refcount_set(&gpu->local_counters_active, 1);
>>>> return 0;
>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/
>>>> msm_gpu.h
>>>> index e25009150579..83c61e523b1b 100644
>>>> --- a/drivers/gpu/drm/msm/msm_gpu.h
>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.h
>>>> @@ -195,12 +195,28 @@ struct msm_gpu {
>>>> int nr_rings;
>>>> /**
>>>> - * sysprof_active:
>>>> + * @sysprof_active:
>>>> *
>>>> - * The count of contexts that have enabled system profiling.
>>>> + * The count of contexts that have enabled system profiling plus
>>>> one.
>>>> + *
>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to
>>>> keep
>>>> + * the under/overflow checks that refcount_t provides, but allow
>>>> + * multiple on/off transitions so we track the logical value
>>>> plus one.)
>>>> */
>>>> refcount_t sysprof_active;
>>>> + /**
>>>> + * @local_counters_active:
>>>> + *
>>>> + * The count of contexts that have requested local (intra-submit)
>>>> + * performance counter usage plus one.
>>>> + *
>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to
>>>> keep
>>>> + * the under/overflow checks that refcount_t provides, but allow
>>>> + * multiple on/off transitions so we track the logical value
>>>> plus one.)
>>>> + */
>>>> + refcount_t local_counters_active;
>>>> +
>>>> /**
>>>> * lock:
>>>> *
>>>> @@ -383,6 +399,13 @@ struct msm_file_private {
>>>> */
>>>> int sysprof;
>>>> + /**
>>>> + * @counters_requested:
>>>> + *
>>>> + * Has the context requested local perfcntr usage.
>>>> + */
>>>> + bool counters_requested;
>>>> +
>>>> /**
>>>> * comm: Overridden task comm, see MSM_PARAM_COMM
>>>> *
>>>> @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref);
>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx,
>>>> struct msm_gpu *gpu, int sysprof);
>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
>>>> + struct msm_gpu *gpu, int reqcntrs);
>>>> void __msm_file_private_destroy(struct kref *kref);
>>>> static inline void msm_file_private_put(struct msm_file_private
>>>> *ctx)
>>>> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/
>>>> msm/msm_submitqueue.c
>>>> index 7fed1de63b5d..1e1e21e6f7ae 100644
>>>> --- a/drivers/gpu/drm/msm/msm_submitqueue.c
>>>> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
>>>> @@ -10,6 +10,15 @@
>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx,
>>>> struct msm_gpu *gpu, int sysprof)
>>>> {
>>>> + int ret = 0;
>>>> +
>>>> + mutex_lock(&gpu->lock);
>>>> +
>>>> + if (sysprof && (refcount_read(&gpu->local_counters_active) > 1)) {
>>>> + ret = UERR(EBUSY, gpu->dev, "Local counter usage active");
>>>> + goto out_unlock;
>>>> + }
>>>> +
>>>> /*
>>>> * Since pm_runtime and sysprof_active are both refcounts, we
>>>> * call apply the new value first, and then unwind the previous
>>>> @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct
>>>> msm_file_private *ctx,
>>>> switch (sysprof) {
>>>> default:
>>>> - return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof);
>>>> + ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof);
>>>> + goto out_unlock;
>>>> case 2:
>>>> pm_runtime_get_sync(&gpu->pdev->dev);
>>>> fallthrough;
>>>> @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct
>>>> msm_file_private *ctx,
>>>> ctx->sysprof = sysprof;
>>>> - return 0;
>>>> +out_unlock:
>>>> + mutex_unlock(&gpu->lock);
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
>>>> + struct msm_gpu *gpu, int reqctrs)
>>>> +{
>>>> + int ret = 0;
>>>> +
>>>> + mutex_lock(&gpu->lock);
>>>> +
>>>> + if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) {
>>>> + ret = UERR(EBUSY, gpu->dev, "System profiling active");
>>>> + goto out_unlock;
>>>> + }
>>>> +
>>>> + if (reqctrs) {
>>>> + if (ctx->counters_requested) {
>>>> + ret = UERR(EINVAL, gpu->dev, "Already requested");
>>>> + goto out_unlock;
>>>> + }
>>>> +
>>>> + ctx->counters_requested = true;
>>>> + refcount_inc(&gpu->local_counters_active);
>>>> + } else {
>>>> + if (!ctx->counters_requested) {
>>>> + ret = UERR(EINVAL, gpu->dev, "Not requested");
>>>> + goto out_unlock;
>>>> + }
>>>> + refcount_dec(&gpu->local_counters_active);
>>>> + ctx->counters_requested = false;
>>>> + }
>>>> +
>>>> +out_unlock:
>>>> + mutex_unlock(&gpu->lock);
>>>> +
>>>> + return ret;
>>>> }
>>>> void __msm_file_private_destroy(struct kref *kref)
>>>> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
>>>> index 2342cb90857e..ae7fb355e4a1 100644
>>>> --- a/include/uapi/drm/msm_drm.h
>>>> +++ b/include/uapi/drm/msm_drm.h
>>>> @@ -91,6 +91,7 @@ struct drm_msm_timespec {
>>>> #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */
>>>> #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */
>>>> #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */
>>>> +#define MSM_PARAM_REQ_CNTRS 0x15 /* WO: request "local" (intra-
>>>> submit) perfcntr usage */
>>>> /* For backwards compat. The original support for preemption was
>>>> based on
>>>> * a single ring per priority level so # of priority levels equals
>>>> the #
>>>
>>
>>
>> Best regards,
>
Best regards,
--
Antonino Maniscalco <antomani103@...il.com>
Powered by blists - more mailing lists