linux-kernel - Re: [PATCH 5/5] drm/msm/dpu: rate limit snapshot capture for mmu faults

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAA8EJprmi9zxEipq=0LyBi4nJ59BrLgN1sL+3u7-n9kJ3yQcRg@mail.gmail.com>
Date: Wed, 17 Jul 2024 02:10:44 +0300
From: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
To: Abhinav Kumar <quic_abhinavk@...cinc.com>
Cc: Rob Clark <robdclark@...il.com>, freedreno@...ts.freedesktop.org, 
	Sean Paul <sean@...rly.run>, Marijn Suijten <marijn.suijten@...ainline.org>, 
	David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>, dri-devel@...ts.freedesktop.org, 
	quic_jesszhan@...cinc.com, swboyd@...omium.org, dianders@...omium.org, 
	linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/5] drm/msm/dpu: rate limit snapshot capture for mmu faults

On Wed, 17 Jul 2024 at 01:43, Abhinav Kumar <quic_abhinavk@...cinc.com> wrote:
>
>
>
> On 7/16/2024 2:50 PM, Rob Clark wrote:
> > On Tue, Jul 16, 2024 at 2:45 PM Abhinav Kumar <quic_abhinavk@...cinc.com> wrote:
> >>
> >>
> >>
> >> On 7/15/2024 12:51 PM, Rob Clark wrote:
> >>> On Mon, Jul 1, 2024 at 12:43 PM Dmitry Baryshkov
> >>> <dmitry.baryshkov@...aro.org> wrote:
> >>>>
> >>>> On Fri, Jun 28, 2024 at 02:48:47PM GMT, Abhinav Kumar wrote:
> >>>>> There is no recovery mechanism in place yet to recover from mmu
> >>>>> faults for DPU. We can only prevent the faults by making sure there
> >>>>> is no misconfiguration.
> >>>>>
> >>>>> Rate-limit the snapshot capture for mmu faults to once per
> >>>>> msm_kms_init_aspace() as that should be sufficient to capture
> >>>>> the snapshot for debugging otherwise there will be a lot of
> >>>>> dpu snapshots getting captured for the same fault which is
> >>>>> redundant and also might affect capturing even one snapshot
> >>>>> accurately.
> >>>>
> >>>> Please squash this into the first patch. There is no need to add code
> >>>> with a known defficiency.
> >>>>
> >>>> Also, is there a reason why you haven't used <linux/ratelimit.h> ?
> >>>
> >>> So, in some ways devcoredump is ratelimited by userspace needing to
> >>> clear an existing devcore..
> >>>
> >>
> >> Yes, a new devcoredump device will not be created until the previous one
> >> is consumed or times out but here I am trying to limit even the DPU
> >> snapshot capture because DPU register space is really huge and the rate
> >> at which smmu faults occur is quite fast that its causing instability
> >> while snapshots are being captured.
> >>
> >>> What I'd suggest would be more useful is to limit the devcores to once
> >>> per atomic update, ie. if display state hasn't changed, maybe an
> >>> additional devcore isn't useful
> >>>
> >>> BR,
> >>> -R
> >>>
> >>
> >> By display state change, do you mean like the checks we have in
> >> drm_atomic_crtc_needs_modeset()?
> >>
> >> OR do you mean we need to cache the previous (currently picked up by hw)
> >> state and trigger a new devcores if the new state is different by
> >> comparing more things?
> >>
> >> This will help to reduce the snapshots to unique frame updates but I do
> >> not think it will reduce the rate enough for the case where DPU did not
> >> recover from the previous fault.
> >
> > I was thinking the easy thing, of just resetting the counter in
> > msm_atomic_commit_tail().. I suppose we could be clever filter out
> > updates that only change scanout address.  Or hash the atomic state
> > and only generate devcoredumps for unique states.  But I'm not sure
> > how over-complicated we should make this.
> >
> > BR,
> > -R
>
> Its a good idea actually and I would also like to keep it simple :)
>
> One question, is it okay to assume that all compositors will only issue
> unique commits? Because we are assuming thats the case with resetting
> the counter in msm_atomic_commit_tail().

The compositors use drm_mode_atomic_ioctl() which allocates a new
state each time. It is impossible to re-submit the same
drm_atomic_state from userspace.


-- 
With best wishes
Dmitry