Message-ID: <cf8d00cd-6dc6-42b9-be61-93ef48d42b0c@quicinc.com>
Date: Tue, 16 Jul 2024 14:45:24 -0700
From: Abhinav Kumar <quic_abhinavk@...cinc.com>
To: Rob Clark <robdclark@...il.com>,
        Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
CC: <freedreno@...ts.freedesktop.org>, Sean Paul <sean@...rly.run>,
        "Marijn Suijten" <marijn.suijten@...ainline.org>,
        David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>,
        <dri-devel@...ts.freedesktop.org>, <quic_jesszhan@...cinc.com>,
        <swboyd@...omium.org>, <dianders@...omium.org>,
        <linux-arm-msm@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 5/5] drm/msm/dpu: rate limit snapshot capture for mmu faults



On 7/15/2024 12:51 PM, Rob Clark wrote:
> On Mon, Jul 1, 2024 at 12:43 PM Dmitry Baryshkov
> <dmitry.baryshkov@...aro.org> wrote:
>>
>> On Fri, Jun 28, 2024 at 02:48:47PM GMT, Abhinav Kumar wrote:
>>> There is no recovery mechanism in place yet to recover from mmu
>>> faults for DPU. We can only prevent the faults by making sure there
>>> is no misconfiguration.
>>>
>>> Rate-limit the snapshot capture for mmu faults to once per
>>> msm_kms_init_aspace() as that should be sufficient to capture
>>> the snapshot for debugging otherwise there will be a lot of
>>> dpu snapshots getting captured for the same fault which is
>>> redundant and also might affect capturing even one snapshot
>>> accurately.
>>
>> Please squash this into the first patch. There is no need to add code
>> with a known deficiency.
>>
>> Also, is there a reason why you haven't used <linux/ratelimit.h> ?
> 
> So, in some ways devcoredump is ratelimited by userspace needing to
> clear an existing devcore..
> 

Yes, a new devcoredump device will not be created until the previous one
is consumed or times out. But here I am trying to limit even the DPU
snapshot capture itself, because the DPU register space is really huge
and smmu faults occur so fast that it's causing instability while
snapshots are being captured.
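
To illustrate the kind of limiting I mean, here is a rough userspace
sketch of a windowed limiter in front of the snapshot capture. It is only
a model: struct snap_ratelimit, snap_ratelimit_allow() and
simulate_faults() are hypothetical names made up for this example, and
the interval/burst values are arbitrary. In the driver this would
presumably map onto the helpers in <linux/ratelimit.h> instead.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of an interval/burst limiter.
 * This is NOT the kernel's struct ratelimit_state.
 */
struct snap_ratelimit {
	uint64_t interval_ms;     /* length of one window */
	unsigned int burst;       /* captures allowed per window */
	uint64_t window_start_ms; /* start of the current window */
	unsigned int used;        /* captures taken in this window */
	bool started;             /* false until the first fault */
};

/* Return true if a snapshot may be captured at time now_ms. */
static bool snap_ratelimit_allow(struct snap_ratelimit *rl, uint64_t now_ms)
{
	/* Open a new window on the first fault or once the old one expired. */
	if (!rl->started || now_ms - rl->window_start_ms >= rl->interval_ms) {
		rl->window_start_ms = now_ms;
		rl->used = 0;
		rl->started = true;
	}

	if (rl->used >= rl->burst)
		return false;

	rl->used++;
	return true;
}

/*
 * Model a storm of faults spaced step_ms apart and count how many of
 * them would actually trigger a (very expensive) register snapshot.
 */
static unsigned int simulate_faults(struct snap_ratelimit *rl,
				    uint64_t start_ms, unsigned int count,
				    uint64_t step_ms)
{
	unsigned int taken = 0;

	for (unsigned int i = 0; i < count; i++)
		if (snap_ratelimit_allow(rl, start_ms + (uint64_t)i * step_ms))
			taken++;

	return taken;
}
```

With a 5 second window and a burst of 1, a storm of 1000 faults arriving
1 ms apart results in a single capture in this model, and the next
capture is allowed only once the window has expired.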

> What I'd suggest would be more useful is to limit the devcores to once
> per atomic update, ie. if display state hasn't changed, maybe an
> additional devcore isn't useful
> 
> BR,
> -R
> 

By display state change, do you mean like the checks we have in 
drm_atomic_crtc_needs_modeset()?

Or do you mean we need to cache the previous state (the one currently
picked up by hw), compare more things against it, and trigger a new
devcore only if the new state is different?

This will help reduce the snapshots to unique frame updates, but I do
not think it will reduce the rate enough for the case where DPU did not
recover from the previous fault.
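
As a rough userspace sketch of how I understand the "once per atomic
update" idea: keep a commit sequence number and only capture when it
differs from the one recorded at the last snapshot. struct kms_model,
model_atomic_commit() and model_fault() are hypothetical names for this
model only, and a plain counter stands in for whatever state comparison
we would really do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the KMS state relevant to snapshot gating. */
struct kms_model {
	uint64_t commit_seq;        /* bumped on every atomic commit */
	uint64_t last_snapshot_seq; /* commit_seq at the last snapshot */
	bool snapshot_valid;        /* false until the first snapshot */
	unsigned int snapshots;     /* number of snapshots captured */
};

/* Model an atomic commit that changes the display state. */
static void model_atomic_commit(struct kms_model *kms)
{
	kms->commit_seq++;
}

/*
 * Model the fault handler: capture only if the display state changed
 * since the last snapshot. Returns true if a snapshot was captured.
 */
static bool model_fault(struct kms_model *kms)
{
	if (kms->snapshot_valid && kms->last_snapshot_seq == kms->commit_seq)
		return false;

	kms->last_snapshot_seq = kms->commit_seq;
	kms->snapshot_valid = true;
	kms->snapshots++;
	return true;
}
```

In this model, repeated faults for the same commit give one snapshot,
but if userspace keeps pushing new frames while DPU has not recovered,
we still end up with one snapshot per commit, which is the case I am
worried about above.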

>>
>>>
>>> Signed-off-by: Abhinav Kumar <quic_abhinavk@...cinc.com>
>>> ---
>>>   drivers/gpu/drm/msm/msm_kms.c | 6 +++++-
>>>   drivers/gpu/drm/msm/msm_kms.h | 3 +++
>>>   2 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c
>>> index d5d3117259cf..90a333920c01 100644
>>> --- a/drivers/gpu/drm/msm/msm_kms.c
>>> +++ b/drivers/gpu/drm/msm/msm_kms.c
>>> @@ -168,7 +168,10 @@ static int msm_kms_fault_handler(void *arg, unsigned long iova, int flags, void
>>>   {
>>>        struct msm_kms *kms = arg;
>>>
>>> -     msm_disp_snapshot_state(kms->dev);
>>> +     if (!kms->fault_snapshot_capture) {
>>> +             msm_disp_snapshot_state(kms->dev);
>>> +             kms->fault_snapshot_capture++;
>>
>> When is it decremented?
>>
>>> +     }
>>>
>>>        return -ENOSYS;
>>>   }
>>> @@ -208,6 +211,7 @@ struct msm_gem_address_space *msm_kms_init_aspace(struct drm_device *dev)
>>>                mmu->funcs->destroy(mmu);
>>>        }
>>>
>>> +     kms->fault_snapshot_capture = 0;
>>>        msm_mmu_set_fault_handler(aspace->mmu, kms, msm_kms_fault_handler);
>>>
>>>        return aspace;
>>> diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h
>>> index 1e0c54de3716..240b39e60828 100644
>>> --- a/drivers/gpu/drm/msm/msm_kms.h
>>> +++ b/drivers/gpu/drm/msm/msm_kms.h
>>> @@ -134,6 +134,9 @@ struct msm_kms {
>>>        int irq;
>>>        bool irq_requested;
>>>
>>> +     /* rate limit the snapshot capture to once per attach */
>>> +     int fault_snapshot_capture;
>>> +
>>>        /* mapper-id used to request GEM buffer mapped for scanout: */
>>>        struct msm_gem_address_space *aspace;
>>>
>>> --
>>> 2.44.0
>>>
>>
>> --
>> With best wishes
>> Dmitry
