[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <96e918d7-6e29-4d0a-8e9d-b77232c37ef0@quicinc.com>
Date: Fri, 13 Dec 2024 22:20:03 +0530
From: Akhil P Oommen <quic_akhilpo@...cinc.com>
To: Antonino Maniscalco <antomani103@...il.com>,
Rob Clark
<robdclark@...il.com>, <dri-devel@...ts.freedesktop.org>
CC: <linux-arm-msm@...r.kernel.org>, <freedreno@...ts.freedesktop.org>,
"Rob
Clark" <robdclark@...omium.org>, Sean Paul <sean@...rly.run>,
Konrad Dybcio
<konradybcio@...nel.org>,
Abhinav Kumar <quic_abhinavk@...cinc.com>,
"Dmitry
Baryshkov" <dmitry.baryshkov@...aro.org>,
Marijn Suijten
<marijn.suijten@...ainline.org>,
David Airlie <airlied@...il.com>, "Simona
Vetter" <simona@...ll.ch>,
Maarten Lankhorst
<maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>,
open list
<linux-kernel@...r.kernel.org>
Subject: Re: [RFC] drm/msm: Add UABI to request perfcntr usage
On 12/12/2024 9:44 PM, Antonino Maniscalco wrote:
> On 12/12/24 4:58 PM, Akhil P Oommen wrote:
>> On 12/5/2024 10:24 PM, Rob Clark wrote:
>>> From: Rob Clark <robdclark@...omium.org>
>>>
>>> Performance counter usage falls into two categories:
>>>
>>> 1. Local usage, where the counter configuration, start, and end read
>>> happen within (locally to) a single SUBMIT. In this case, there is
>>> no dependency on counter configuration or values between submits,
>>> and
>>> in fact counters are normally cleared on context switches, making it
>>> impossible to rely on cross-submit state.
>>>
>>> 2. Global usage, where a single privilaged daemon/process is sampling
>>> counter values across all processes for profiling.
>>>
>>> The two categories are mutually exclusive. While you can have many
>>> processes making local counter usage, you cannot combine global and
>>> local usage without the two stepping on each others feet (by changing
>>> counter configuration).
>>>
>>> For global counter usage, there is already a SYSPROF param (since global
>>> counter usage requires disabling counter clearing on context switch).
>>> This patch adds a REQ_CNTRS param to request local counter usage. If
>>> one or more processes has requested counter usage, then a SYSPROF
>>> request will fail with -EBUSY. And if SYSPROF is active, then REQ_CNTRS
>>> will fail with -EBUSY, maintaining the mutual exclusivity.
>>>
>>> This is purely an advisory interface to help coordinate userspace.
>>> There is no real means of enforcement, but the worst that can happen if
>>> userspace ignores a REQ_CNTRS failure is that you'll get nonsense
>>> profiling data.
>>>
>>> Signed-off-by: Rob Clark <robdclark@...omium.org>
>>> ---
>>> kgsl takes a different approach, which involves a lot more UABI for
>>> assigning counters to different processes. But I think by taking
>>> advantage of the fact that mesa (freedreno+turnip) reconfigure the
>>> counters they need in each SUBMIT, for their respective gl/vk perf-
>>> counter extensions, we can take this simpler approach.
>>
>> KGSL's approach is preemption and ifpc safe (also whatever HW changes
>> that will come up in future generations). How will we ensure that here?
>>
>> I have plans to bring up IFPC support in near future. Also, I brought up
>> this point during preemption series. But from the responses, I felt that
>> profiling was not considered a serious usecase. Still I wonder how the
>> perfcounter extensions work accurately with preemption.
>
> So back then I implemented the postamble IB to clear perf counters and
> that gets disabled when sysprof (so global usage) is happening. The
> kernel is oblivious to "Local isage" of profiling but in that case
> really what we want to do is disable preemption which in my
> understanding can be done from userspace with a PKT. In my understanding
> this had us covered for all usecases.
I think this wasn't mentioned at that time. Which UMD PKT in a6x+ did
you mean?
-Akhil.
>
> So what would you expect instead we should do kernel side to make
> profiling preemption safe?
>
>>
>> -Akhil
>>
>>>
>>> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +
>>> drivers/gpu/drm/msm/msm_drv.c | 5 ++-
>>> drivers/gpu/drm/msm/msm_gpu.c | 1 +
>>> drivers/gpu/drm/msm/msm_gpu.h | 29 +++++++++++++-
>>> drivers/gpu/drm/msm/msm_submitqueue.c | 52 ++++++++++++++++++++++++-
>>> include/uapi/drm/msm_drm.h | 1 +
>>> 6 files changed, 85 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/
>>> drm/msm/adreno/adreno_gpu.c
>>> index 31bbf2c83de4..f688e37059b8 100644
>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>> @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct
>>> msm_file_private *ctx,
>>> if (!capable(CAP_SYS_ADMIN))
>>> return UERR(EPERM, drm, "invalid permissions");
>>> return msm_file_private_set_sysprof(ctx, gpu, value);
>>> + case MSM_PARAM_REQ_CNTRS:
>>> + return msm_file_private_request_counters(ctx, gpu, value);
>>> default:
>>> return UERR(EINVAL, drm, "%s: invalid param: %u", gpu-
>>> >name, param);
>>> }
>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/
>>> msm_drv.c
>>> index 6416d2cb4efc..bf8314ff4a25 100644
>>> --- a/drivers/gpu/drm/msm/msm_drv.c
>>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>>> @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device
>>> *dev, struct drm_file *file)
>>> * It is not possible to set sysprof param to non-zero if gpu
>>> * is not initialized:
>>> */
>>> - if (priv->gpu)
>>> + if (ctx->sysprof)
>>> msm_file_private_set_sysprof(ctx, priv->gpu, 0);
>>> + if (ctx->counters_requested)
>>> + msm_file_private_request_counters(ctx, priv->gpu, 0);
>>> +
>>> context_close(ctx);
>>> }
>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/
>>> msm_gpu.c
>>> index 82f204f3bb8f..013b59ca3bb1 100644
>>> --- a/drivers/gpu/drm/msm/msm_gpu.c
>>> +++ b/drivers/gpu/drm/msm/msm_gpu.c
>>> @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct
>>> platform_device *pdev,
>>> gpu->nr_rings = nr_rings;
>>> refcount_set(&gpu->sysprof_active, 1);
>>> + refcount_set(&gpu->local_counters_active, 1);
>>> return 0;
>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/
>>> msm_gpu.h
>>> index e25009150579..83c61e523b1b 100644
>>> --- a/drivers/gpu/drm/msm/msm_gpu.h
>>> +++ b/drivers/gpu/drm/msm/msm_gpu.h
>>> @@ -195,12 +195,28 @@ struct msm_gpu {
>>> int nr_rings;
>>> /**
>>> - * sysprof_active:
>>> + * @sysprof_active:
>>> *
>>> - * The count of contexts that have enabled system profiling.
>>> + * The count of contexts that have enabled system profiling plus
>>> one.
>>> + *
>>> + * Note: refcount_t does not like 0->1 transitions.. we want to
>>> keep
>>> + * the under/overflow checks that refcount_t provides, but allow
>>> + * multiple on/off transitions so we track the logical value
>>> plus one.)
>>> */
>>> refcount_t sysprof_active;
>>> + /**
>>> + * @local_counters_active:
>>> + *
>>> + * The count of contexts that have requested local (intra-submit)
>>> + * performance counter usage plus one.
>>> + *
>>> + * Note: refcount_t does not like 0->1 transitions.. we want to
>>> keep
>>> + * the under/overflow checks that refcount_t provides, but allow
>>> + * multiple on/off transitions so we track the logical value
>>> plus one.)
>>> + */
>>> + refcount_t local_counters_active;
>>> +
>>> /**
>>> * lock:
>>> *
>>> @@ -383,6 +399,13 @@ struct msm_file_private {
>>> */
>>> int sysprof;
>>> + /**
>>> + * @counters_requested:
>>> + *
>>> + * Has the context requested local perfcntr usage.
>>> + */
>>> + bool counters_requested;
>>> +
>>> /**
>>> * comm: Overridden task comm, see MSM_PARAM_COMM
>>> *
>>> @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref);
>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx,
>>> struct msm_gpu *gpu, int sysprof);
>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
>>> + struct msm_gpu *gpu, int reqcntrs);
>>> void __msm_file_private_destroy(struct kref *kref);
>>> static inline void msm_file_private_put(struct msm_file_private
>>> *ctx)
>>> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/
>>> msm/msm_submitqueue.c
>>> index 7fed1de63b5d..1e1e21e6f7ae 100644
>>> --- a/drivers/gpu/drm/msm/msm_submitqueue.c
>>> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
>>> @@ -10,6 +10,15 @@
>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx,
>>> struct msm_gpu *gpu, int sysprof)
>>> {
>>> + int ret = 0;
>>> +
>>> + mutex_lock(&gpu->lock);
>>> +
>>> + if (sysprof && (refcount_read(&gpu->local_counters_active) > 1)) {
>>> + ret = UERR(EBUSY, gpu->dev, "Local counter usage active");
>>> + goto out_unlock;
>>> + }
>>> +
>>> /*
>>> * Since pm_runtime and sysprof_active are both refcounts, we
>>> * call apply the new value first, and then unwind the previous
>>> @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct
>>> msm_file_private *ctx,
>>> switch (sysprof) {
>>> default:
>>> - return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof);
>>> + ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof);
>>> + goto out_unlock;
>>> case 2:
>>> pm_runtime_get_sync(&gpu->pdev->dev);
>>> fallthrough;
>>> @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct
>>> msm_file_private *ctx,
>>> ctx->sysprof = sysprof;
>>> - return 0;
>>> +out_unlock:
>>> + mutex_unlock(&gpu->lock);
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
>>> + struct msm_gpu *gpu, int reqctrs)
>>> +{
>>> + int ret = 0;
>>> +
>>> + mutex_lock(&gpu->lock);
>>> +
>>> + if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) {
>>> + ret = UERR(EBUSY, gpu->dev, "System profiling active");
>>> + goto out_unlock;
>>> + }
>>> +
>>> + if (reqctrs) {
>>> + if (ctx->counters_requested) {
>>> + ret = UERR(EINVAL, gpu->dev, "Already requested");
>>> + goto out_unlock;
>>> + }
>>> +
>>> + ctx->counters_requested = true;
>>> + refcount_inc(&gpu->local_counters_active);
>>> + } else {
>>> + if (!ctx->counters_requested) {
>>> + ret = UERR(EINVAL, gpu->dev, "Not requested");
>>> + goto out_unlock;
>>> + }
>>> + refcount_dec(&gpu->local_counters_active);
>>> + ctx->counters_requested = false;
>>> + }
>>> +
>>> +out_unlock:
>>> + mutex_unlock(&gpu->lock);
>>> +
>>> + return ret;
>>> }
>>> void __msm_file_private_destroy(struct kref *kref)
>>> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
>>> index 2342cb90857e..ae7fb355e4a1 100644
>>> --- a/include/uapi/drm/msm_drm.h
>>> +++ b/include/uapi/drm/msm_drm.h
>>> @@ -91,6 +91,7 @@ struct drm_msm_timespec {
>>> #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */
>>> #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */
>>> #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */
>>> +#define MSM_PARAM_REQ_CNTRS 0x15 /* WO: request "local" (intra-
>>> submit) perfcntr usage */
>>> /* For backwards compat. The original support for preemption was
>>> based on
>>> * a single ring per priority level so # of priority levels equals
>>> the #
>>
>
>
> Best regards,
Powered by blists - more mailing lists