[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <8798a66b-0c07-4713-8966-5bac7bd4aae1@quicinc.com>
Date: Wed, 18 Dec 2024 00:28:02 +0530
From: Akhil P Oommen <quic_akhilpo@...cinc.com>
To: Rob Clark <robdclark@...il.com>
CC: Connor Abbott <cwabbott0@...il.com>,
Antonino Maniscalco
<antomani103@...il.com>,
<dri-devel@...ts.freedesktop.org>, <linux-arm-msm@...r.kernel.org>,
<freedreno@...ts.freedesktop.org>, Rob Clark
<robdclark@...omium.org>,
Sean Paul <sean@...rly.run>, Konrad Dybcio
<konradybcio@...nel.org>,
Abhinav Kumar <quic_abhinavk@...cinc.com>,
"Dmitry
Baryshkov" <dmitry.baryshkov@...aro.org>,
Marijn Suijten
<marijn.suijten@...ainline.org>,
David Airlie <airlied@...il.com>, "Simona
Vetter" <simona@...ll.ch>,
Maarten Lankhorst
<maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>,
open list
<linux-kernel@...r.kernel.org>
Subject: Re: [RFC] drm/msm: Add UABI to request perfcntr usage
On 12/17/2024 2:21 AM, Rob Clark wrote:
> On Mon, Dec 16, 2024 at 12:25 PM Akhil P Oommen
> <quic_akhilpo@...cinc.com> wrote:
>>
>> On 12/16/2024 10:28 PM, Connor Abbott wrote:
>>> On Mon, Dec 16, 2024 at 11:55 AM Akhil P Oommen
>>> <quic_akhilpo@...cinc.com> wrote:
>>>>
>>>> On 12/13/2024 10:40 PM, Antonino Maniscalco wrote:
>>>>> On 12/13/24 5:50 PM, Akhil P Oommen wrote:
>>>>>> On 12/12/2024 9:44 PM, Antonino Maniscalco wrote:
>>>>>>> On 12/12/24 4:58 PM, Akhil P Oommen wrote:
>>>>>>>> On 12/5/2024 10:24 PM, Rob Clark wrote:
>>>>>>>>> From: Rob Clark <robdclark@...omium.org>
>>>>>>>>>
>>>>>>>>> Performance counter usage falls into two categories:
>>>>>>>>>
>>>>>>>>> 1. Local usage, where the counter configuration, start, and end read
>>>>>>>>> happen within (locally to) a single SUBMIT. In this case,
>>>>>>>>> there is
>>>>>>>>> no dependency on counter configuration or values between submits,
>>>>>>>>> and
>>>>>>>>> in fact counters are normally cleared on context switches,
>>>>>>>>> making it
>>>>>>>>> impossible to rely on cross-submit state.
>>>>>>>>>
>>>>>>>>> 2. Global usage, where a single privilaged daemon/process is sampling
>>>>>>>>> counter values across all processes for profiling.
>>>>>>>>>
>>>>>>>>> The two categories are mutually exclusive. While you can have many
>>>>>>>>> processes making local counter usage, you cannot combine global and
>>>>>>>>> local usage without the two stepping on each others feet (by changing
>>>>>>>>> counter configuration).
>>>>>>>>>
>>>>>>>>> For global counter usage, there is already a SYSPROF param (since
>>>>>>>>> global
>>>>>>>>> counter usage requires disabling counter clearing on context switch).
>>>>>>>>> This patch adds a REQ_CNTRS param to request local counter usage. If
>>>>>>>>> one or more processes has requested counter usage, then a SYSPROF
>>>>>>>>> request will fail with -EBUSY. And if SYSPROF is active, then
>>>>>>>>> REQ_CNTRS
>>>>>>>>> will fail with -EBUSY, maintaining the mutual exclusivity.
>>>>>>>>>
>>>>>>>>> This is purely an advisory interface to help coordinate userspace.
>>>>>>>>> There is no real means of enforcement, but the worst that can
>>>>>>>>> happen if
>>>>>>>>> userspace ignores a REQ_CNTRS failure is that you'll get nonsense
>>>>>>>>> profiling data.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Rob Clark <robdclark@...omium.org>
>>>>>>>>> ---
>>>>>>>>> kgsl takes a different approach, which involves a lot more UABI for
>>>>>>>>> assigning counters to different processes. But I think by taking
>>>>>>>>> advantage of the fact that mesa (freedreno+turnip) reconfigure the
>>>>>>>>> counters they need in each SUBMIT, for their respective gl/vk perf-
>>>>>>>>> counter extensions, we can take this simpler approach.
>>>>>>>>
>>>>>>>> KGSL's approach is preemption and ifpc safe (also whatever HW changes
>>>>>>>> that will come up in future generations). How will we ensure that here?
>>>>>>>>
>>>>>>>> I have plans to bring up IFPC support in near future. Also, I
>>>>>>>> brought up
>>>>>>>> this point during preemption series. But from the responses, I felt
>>>>>>>> that
>>>>>>>> profiling was not considered a serious usecase. Still I wonder how the
>>>>>>>> perfcounter extensions work accurately with preemption.
>>>>>>>
>>>>>>> So back then I implemented the postamble IB to clear perf counters and
>>>>>>> that gets disabled when sysprof (so global usage) is happening. The
>>>>>>> kernel is oblivious to "Local isage" of profiling but in that case
>>>>>>> really what we want to do is disable preemption which in my
>>>>>>> understanding can be done from userspace with a PKT. In my understanding
>>>>>>> this had us covered for all usecases.
>>>>>>
>>>>>> I think this wasn't mentioned at that time. Which UMD PKT in a6x+ did
>>>>>> you mean?
>>>>>
>>>>> Ah, I thought it wasmentioned, sorry.
>>>>> The packet I was referring to is:
>>>>> <doc> Make next dword 1 to disable preemption, 0 to re-enable it. </
>>>>> doc>
>>>>> <value name="CP_PREEMPT_DISABLE" value="0x6c" variants="A6XX"/>
>>>>
>>>> Ah! Okay. I think this packet is not used by the downstream blob. IMO,
>>>> disabling preemption is still a suboptimal solution.
>>>
>>> Downstream doesn't expose userspace perfcounters (i.e.
>>> VK_KHR_performance_query) at all. My understanding is that Android
>>> requires you not to expose them for security reasons, but I could be
>>> wrong there. In any case, Qualcomm clearly hasn't really thought
>>> through what it would take to make everything work well with userspace
>>> perfcounters and hasn't implemented the necessary firmware bits for
>>> that, so we're left muddling through and doing what we can.
>>>
>>
>> Honestly, I don't know what you meant by the missing support.
>
> GMU support to save/restore SEL regs on IFPC when SYSPROF is enabled ;-)
>
That won't solve the case of reading counters via devmem. That still
would require disabling ifpc.
> If we had that, we wouldn't need ioclts to assign+configure counters,
> everything else could be done in userspace (modulo trying to do both
> local and global profiling at the same time.. but I'm not convinced
> that is a valid use-case, as I mentioned earlier)
>
Lets ignore this use-case then. We can revisit if it becomes a thing in
upstream.
-Akhil.
> BR,
> -R
>
>> -Akhil.
>>
>>> Connor
>>>
>>>>
>>>>>
>>>>> BTW you mentioned wanting to look into IFPC. Since I too wanted to look
>>>>> into implementing it wonder if you could let me know when you planned on
>>>>> working on it.
>>>>
>>>> I have few patches in progress. Nothing final yet and need verification
>>>> on the hw side. Also, I need to do some housekeeping here to debug gmu
>>>> issues since IFPC increases the probability of those a lot.
>>>>
>>>> I will try to send out the patches very soon.
>>>>
>>>> -Akhil.
>>>>
>>>>>
>>>>>>
>>>>>> -Akhil.
>>>>>>
>>>>>>>
>>>>>>> So what would you expect instead we should do kernel side to make
>>>>>>> profiling preemption safe?
>>>>>>>
>>>>>>>>
>>>>>>>> -Akhil
>>>>>>>>
>>>>>>>>>
>>>>>>>>> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +
>>>>>>>>> drivers/gpu/drm/msm/msm_drv.c | 5 ++-
>>>>>>>>> drivers/gpu/drm/msm/msm_gpu.c | 1 +
>>>>>>>>> drivers/gpu/drm/msm/msm_gpu.h | 29 +++++++++++++-
>>>>>>>>> drivers/gpu/drm/msm/msm_submitqueue.c | 52 +++++++++++++++++++
>>>>>>>>> +++++-
>>>>>>>>> include/uapi/drm/msm_drm.h | 1 +
>>>>>>>>> 6 files changed, 85 insertions(+), 5 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/
>>>>>>>>> drm/msm/adreno/adreno_gpu.c
>>>>>>>>> index 31bbf2c83de4..f688e37059b8 100644
>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>>>>> @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct
>>>>>>>>> msm_file_private *ctx,
>>>>>>>>> if (!capable(CAP_SYS_ADMIN))
>>>>>>>>> return UERR(EPERM, drm, "invalid permissions");
>>>>>>>>> return msm_file_private_set_sysprof(ctx, gpu, value);
>>>>>>>>> + case MSM_PARAM_REQ_CNTRS:
>>>>>>>>> + return msm_file_private_request_counters(ctx, gpu, value);
>>>>>>>>> default:
>>>>>>>>> return UERR(EINVAL, drm, "%s: invalid param: %u", gpu-
>>>>>>>>>> name, param);
>>>>>>>>> }
>>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/
>>>>>>>>> msm_drv.c
>>>>>>>>> index 6416d2cb4efc..bf8314ff4a25 100644
>>>>>>>>> --- a/drivers/gpu/drm/msm/msm_drv.c
>>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>>>>>>>>> @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device
>>>>>>>>> *dev, struct drm_file *file)
>>>>>>>>> * It is not possible to set sysprof param to non-zero if gpu
>>>>>>>>> * is not initialized:
>>>>>>>>> */
>>>>>>>>> - if (priv->gpu)
>>>>>>>>> + if (ctx->sysprof)
>>>>>>>>> msm_file_private_set_sysprof(ctx, priv->gpu, 0);
>>>>>>>>> + if (ctx->counters_requested)
>>>>>>>>> + msm_file_private_request_counters(ctx, priv->gpu, 0);
>>>>>>>>> +
>>>>>>>>> context_close(ctx);
>>>>>>>>> }
>>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/
>>>>>>>>> msm_gpu.c
>>>>>>>>> index 82f204f3bb8f..013b59ca3bb1 100644
>>>>>>>>> --- a/drivers/gpu/drm/msm/msm_gpu.c
>>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.c
>>>>>>>>> @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct
>>>>>>>>> platform_device *pdev,
>>>>>>>>> gpu->nr_rings = nr_rings;
>>>>>>>>> refcount_set(&gpu->sysprof_active, 1);
>>>>>>>>> + refcount_set(&gpu->local_counters_active, 1);
>>>>>>>>> return 0;
>>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/
>>>>>>>>> msm_gpu.h
>>>>>>>>> index e25009150579..83c61e523b1b 100644
>>>>>>>>> --- a/drivers/gpu/drm/msm/msm_gpu.h
>>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.h
>>>>>>>>> @@ -195,12 +195,28 @@ struct msm_gpu {
>>>>>>>>> int nr_rings;
>>>>>>>>> /**
>>>>>>>>> - * sysprof_active:
>>>>>>>>> + * @sysprof_active:
>>>>>>>>> *
>>>>>>>>> - * The count of contexts that have enabled system profiling.
>>>>>>>>> + * The count of contexts that have enabled system profiling plus
>>>>>>>>> one.
>>>>>>>>> + *
>>>>>>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to
>>>>>>>>> keep
>>>>>>>>> + * the under/overflow checks that refcount_t provides, but allow
>>>>>>>>> + * multiple on/off transitions so we track the logical value
>>>>>>>>> plus one.)
>>>>>>>>> */
>>>>>>>>> refcount_t sysprof_active;
>>>>>>>>> + /**
>>>>>>>>> + * @local_counters_active:
>>>>>>>>> + *
>>>>>>>>> + * The count of contexts that have requested local (intra-submit)
>>>>>>>>> + * performance counter usage plus one.
>>>>>>>>> + *
>>>>>>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to
>>>>>>>>> keep
>>>>>>>>> + * the under/overflow checks that refcount_t provides, but allow
>>>>>>>>> + * multiple on/off transitions so we track the logical value
>>>>>>>>> plus one.)
>>>>>>>>> + */
>>>>>>>>> + refcount_t local_counters_active;
>>>>>>>>> +
>>>>>>>>> /**
>>>>>>>>> * lock:
>>>>>>>>> *
>>>>>>>>> @@ -383,6 +399,13 @@ struct msm_file_private {
>>>>>>>>> */
>>>>>>>>> int sysprof;
>>>>>>>>> + /**
>>>>>>>>> + * @counters_requested:
>>>>>>>>> + *
>>>>>>>>> + * Has the context requested local perfcntr usage.
>>>>>>>>> + */
>>>>>>>>> + bool counters_requested;
>>>>>>>>> +
>>>>>>>>> /**
>>>>>>>>> * comm: Overridden task comm, see MSM_PARAM_COMM
>>>>>>>>> *
>>>>>>>>> @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref);
>>>>>>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx,
>>>>>>>>> struct msm_gpu *gpu, int sysprof);
>>>>>>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
>>>>>>>>> + struct msm_gpu *gpu, int reqcntrs);
>>>>>>>>> void __msm_file_private_destroy(struct kref *kref);
>>>>>>>>> static inline void msm_file_private_put(struct msm_file_private
>>>>>>>>> *ctx)
>>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/
>>>>>>>>> msm/msm_submitqueue.c
>>>>>>>>> index 7fed1de63b5d..1e1e21e6f7ae 100644
>>>>>>>>> --- a/drivers/gpu/drm/msm/msm_submitqueue.c
>>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
>>>>>>>>> @@ -10,6 +10,15 @@
>>>>>>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx,
>>>>>>>>> struct msm_gpu *gpu, int sysprof)
>>>>>>>>> {
>>>>>>>>> + int ret = 0;
>>>>>>>>> +
>>>>>>>>> + mutex_lock(&gpu->lock);
>>>>>>>>> +
>>>>>>>>> + if (sysprof && (refcount_read(&gpu->local_counters_active) >
>>>>>>>>> 1)) {
>>>>>>>>> + ret = UERR(EBUSY, gpu->dev, "Local counter usage active");
>>>>>>>>> + goto out_unlock;
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> /*
>>>>>>>>> * Since pm_runtime and sysprof_active are both refcounts, we
>>>>>>>>> * call apply the new value first, and then unwind the previous
>>>>>>>>> @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct
>>>>>>>>> msm_file_private *ctx,
>>>>>>>>> switch (sysprof) {
>>>>>>>>> default:
>>>>>>>>> - return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d",
>>>>>>>>> sysprof);
>>>>>>>>> + ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof);
>>>>>>>>> + goto out_unlock;
>>>>>>>>> case 2:
>>>>>>>>> pm_runtime_get_sync(&gpu->pdev->dev);
>>>>>>>>> fallthrough;
>>>>>>>>> @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct
>>>>>>>>> msm_file_private *ctx,
>>>>>>>>> ctx->sysprof = sysprof;
>>>>>>>>> - return 0;
>>>>>>>>> +out_unlock:
>>>>>>>>> + mutex_unlock(&gpu->lock);
>>>>>>>>> +
>>>>>>>>> + return ret;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
>>>>>>>>> + struct msm_gpu *gpu, int reqctrs)
>>>>>>>>> +{
>>>>>>>>> + int ret = 0;
>>>>>>>>> +
>>>>>>>>> + mutex_lock(&gpu->lock);
>>>>>>>>> +
>>>>>>>>> + if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) {
>>>>>>>>> + ret = UERR(EBUSY, gpu->dev, "System profiling active");
>>>>>>>>> + goto out_unlock;
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> + if (reqctrs) {
>>>>>>>>> + if (ctx->counters_requested) {
>>>>>>>>> + ret = UERR(EINVAL, gpu->dev, "Already requested");
>>>>>>>>> + goto out_unlock;
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> + ctx->counters_requested = true;
>>>>>>>>> + refcount_inc(&gpu->local_counters_active);
>>>>>>>>> + } else {
>>>>>>>>> + if (!ctx->counters_requested) {
>>>>>>>>> + ret = UERR(EINVAL, gpu->dev, "Not requested");
>>>>>>>>> + goto out_unlock;
>>>>>>>>> + }
>>>>>>>>> + refcount_dec(&gpu->local_counters_active);
>>>>>>>>> + ctx->counters_requested = false;
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> +out_unlock:
>>>>>>>>> + mutex_unlock(&gpu->lock);
>>>>>>>>> +
>>>>>>>>> + return ret;
>>>>>>>>> }
>>>>>>>>> void __msm_file_private_destroy(struct kref *kref)
>>>>>>>>> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
>>>>>>>>> index 2342cb90857e..ae7fb355e4a1 100644
>>>>>>>>> --- a/include/uapi/drm/msm_drm.h
>>>>>>>>> +++ b/include/uapi/drm/msm_drm.h
>>>>>>>>> @@ -91,6 +91,7 @@ struct drm_msm_timespec {
>>>>>>>>> #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */
>>>>>>>>> #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */
>>>>>>>>> #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */
>>>>>>>>> +#define MSM_PARAM_REQ_CNTRS 0x15 /* WO: request "local" (intra-
>>>>>>>>> submit) perfcntr usage */
>>>>>>>>> /* For backwards compat. The original support for preemption was
>>>>>>>>> based on
>>>>>>>>> * a single ring per priority level so # of priority levels equals
>>>>>>>>> the #
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>
>>>>>
>>>>>
>>>>> Best regards,
>>>>
>>
Powered by blists - more mailing lists