Message-ID: <6460042b-a2cb-41fa-9f6f-fb11e20f69aa@quicinc.com>
Date: Tue, 16 Jul 2024 15:43:06 -0700
From: Abhinav Kumar <quic_abhinavk@...cinc.com>
To: Rob Clark <robdclark@...il.com>
CC: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
<freedreno@...ts.freedesktop.org>, Sean Paul <sean@...rly.run>,
"Marijn
Suijten" <marijn.suijten@...ainline.org>,
David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>,
<dri-devel@...ts.freedesktop.org>, <quic_jesszhan@...cinc.com>,
<swboyd@...omium.org>, <dianders@...omium.org>,
<linux-arm-msm@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 5/5] drm/msm/dpu: rate limit snapshot capture for mmu
faults
On 7/16/2024 2:50 PM, Rob Clark wrote:
> On Tue, Jul 16, 2024 at 2:45 PM Abhinav Kumar <quic_abhinavk@...cinc.com> wrote:
>>
>>
>>
>> On 7/15/2024 12:51 PM, Rob Clark wrote:
>>> On Mon, Jul 1, 2024 at 12:43 PM Dmitry Baryshkov
>>> <dmitry.baryshkov@...aro.org> wrote:
>>>>
>>>> On Fri, Jun 28, 2024 at 02:48:47PM GMT, Abhinav Kumar wrote:
>>>>> There is no recovery mechanism in place yet to recover from mmu
>>>>> faults for DPU. We can only prevent the faults by making sure there
>>>>> is no misconfiguration.
>>>>>
>>>>> Rate-limit the snapshot capture for mmu faults to once per
>>>>> msm_kms_init_aspace(), as that should be sufficient to capture
>>>>> the snapshot for debugging; otherwise a lot of redundant DPU
>>>>> snapshots get captured for the same fault, which may also prevent
>>>>> even one snapshot from being captured accurately.
>>>>
>>>> Please squash this into the first patch. There is no need to add code
>>>> with a known deficiency.
>>>>
>>>> Also, is there a reason why you haven't used <linux/ratelimit.h> ?
>>>
>>> So, in some ways devcoredump is ratelimited by userspace needing to
>>> clear an existing devcore..
>>>
>>
>> Yes, a new devcoredump device will not be created until the previous one
>> is consumed or times out, but here I am trying to limit even the DPU
>> snapshot capture, because the DPU register space is really huge and smmu
>> faults occur at such a fast rate that they cause instability while
>> snapshots are being captured.
>>
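
To Dmitry's question above: a <linux/ratelimit.h> based handler would
look roughly like the sketch below (not tested; the 60-second interval
and burst of 1 are made-up values for illustration). I went with a
plain counter instead because the intent is at most one snapshot per
attach, not one per time window:

/* Sketch only: time-based rate limiting of the snapshot capture.
 * Needs #include <linux/ratelimit.h>; interval/burst values are
 * arbitrary. */
static int msm_kms_fault_handler(void *arg, unsigned long iova,
				 int flags, void *data)
{
	struct msm_kms *kms = arg;
	static DEFINE_RATELIMIT_STATE(snapshot_rs, 60 * HZ, 1);

	/* __ratelimit() returns non-zero when we are allowed through */
	if (__ratelimit(&snapshot_rs))
		msm_disp_snapshot_state(kms->dev);

	return -ENOSYS;
}
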
>>> What I'd suggest would be more useful is to limit the devcores to once
>>> per atomic update, i.e. if display state hasn't changed, maybe an
>>> additional devcore isn't useful
>>>
>>> BR,
>>> -R
>>>
>>
>> By display state change, do you mean the checks we have in
>> drm_atomic_crtc_needs_modeset()?
>>
>> Or do you mean we need to cache the previous state (the one currently
>> picked up by hw) and trigger a new devcore if the new state differs
>> when comparing more things?
>>
>> This will help reduce the snapshots to unique frame updates, but I do
>> not think it will reduce the rate enough for the case where the DPU
>> did not recover from the previous fault.
>
> I was thinking the easy thing, of just resetting the counter in
> msm_atomic_commit_tail().. I suppose we could be clever and filter out
> updates that only change the scanout address, or hash the atomic state
> and only generate devcoredumps for unique states. But I'm not sure
> how over-complicated we should make this.
>
> BR,
> -R

It's a good idea actually, and I would also like to keep it simple :)

One question: is it okay to assume that all compositors will only issue
unique commits? We are assuming that's the case if we reset the counter
in msm_atomic_commit_tail().
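
Just to make sure we are talking about the same thing, below is roughly
what I understood the reset to look like (sketch only, not tested; the
elided part is the existing commit path):

/* Sketch: re-arm the snapshot capture on every commit, so at most one
 * snapshot is captured per state pushed to the hardware. */
void msm_atomic_commit_tail(struct drm_atomic_state *state)
{
	struct drm_device *dev = state->dev;
	struct msm_drm_private *priv = dev->dev_private;
	struct msm_kms *kms = priv->kms;

	/* ... existing commit path ... */

	kms->fault_snapshot_capture = 0;
}
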
>
>>
>>>>
>>>>>
>>>>> Signed-off-by: Abhinav Kumar <quic_abhinavk@...cinc.com>
>>>>> ---
>>>>>  drivers/gpu/drm/msm/msm_kms.c | 6 +++++-
>>>>>  drivers/gpu/drm/msm/msm_kms.h | 3 +++
>>>>>  2 files changed, 8 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c
>>>>> index d5d3117259cf..90a333920c01 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_kms.c
>>>>> +++ b/drivers/gpu/drm/msm/msm_kms.c
>>>>> @@ -168,7 +168,10 @@ static int msm_kms_fault_handler(void *arg, unsigned long iova, int flags, void
>>>>>  {
>>>>>  	struct msm_kms *kms = arg;
>>>>>
>>>>> -	msm_disp_snapshot_state(kms->dev);
>>>>> +	if (!kms->fault_snapshot_capture) {
>>>>> +		msm_disp_snapshot_state(kms->dev);
>>>>> +		kms->fault_snapshot_capture++;
>>>>
>>>> When is it decremented?
>>>>
>>>>> +	}
>>>>>
>>>>>  	return -ENOSYS;
>>>>>  }
>>>>> @@ -208,6 +211,7 @@ struct msm_gem_address_space *msm_kms_init_aspace(struct drm_device *dev)
>>>>>  		mmu->funcs->destroy(mmu);
>>>>>  	}
>>>>>
>>>>> +	kms->fault_snapshot_capture = 0;
>>>>>  	msm_mmu_set_fault_handler(aspace->mmu, kms, msm_kms_fault_handler);
>>>>>
>>>>>  	return aspace;
>>>>> diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h
>>>>> index 1e0c54de3716..240b39e60828 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_kms.h
>>>>> +++ b/drivers/gpu/drm/msm/msm_kms.h
>>>>> @@ -134,6 +134,9 @@ struct msm_kms {
>>>>>  	int irq;
>>>>>  	bool irq_requested;
>>>>>
>>>>> +	/* rate limit the snapshot capture to once per attach */
>>>>> +	int fault_snapshot_capture;
>>>>> +
>>>>>  	/* mapper-id used to request GEM buffer mapped for scanout: */
>>>>>  	struct msm_gem_address_space *aspace;
>>>>> --
>>>>> 2.44.0
>>>>>
>>>>
>>>> --
>>>> With best wishes
>>>> Dmitry