linux-kernel - Re: [PATCH 5/5] drm/msm/dpu: rate limit snapshot capture for mmu faults

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAA8EJprFNXpO568hwNwJvHY_NmcHJxJECe4v3O+ONp17v1Q_iw@mail.gmail.com>
Date: Wed, 17 Jul 2024 02:42:47 +0300
From: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
To: Abhinav Kumar <quic_abhinavk@...cinc.com>
Cc: Rob Clark <robdclark@...il.com>, freedreno@...ts.freedesktop.org, 
	Sean Paul <sean@...rly.run>, Marijn Suijten <marijn.suijten@...ainline.org>, 
	David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>, dri-devel@...ts.freedesktop.org, 
	quic_jesszhan@...cinc.com, swboyd@...omium.org, dianders@...omium.org, 
	linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/5] drm/msm/dpu: rate limit snapshot capture for mmu faults

On Wed, 17 Jul 2024 at 02:15, Abhinav Kumar <quic_abhinavk@...cinc.com> wrote:
>
>
>
> On 7/16/2024 4:10 PM, Dmitry Baryshkov wrote:
> > On Wed, 17 Jul 2024 at 01:43, Abhinav Kumar <quic_abhinavk@...cinc.com> wrote:
> >>
> >>
> >>
> >> On 7/16/2024 2:50 PM, Rob Clark wrote:
> >>> On Tue, Jul 16, 2024 at 2:45 PM Abhinav Kumar <quic_abhinavk@...cinc.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/15/2024 12:51 PM, Rob Clark wrote:
> >>>>> On Mon, Jul 1, 2024 at 12:43 PM Dmitry Baryshkov
> >>>>> <dmitry.baryshkov@...aro.org> wrote:
> >>>>>>
> >>>>>> On Fri, Jun 28, 2024 at 02:48:47PM GMT, Abhinav Kumar wrote:
> >>>>>>> There is no recovery mechanism in place yet to recover from mmu
> >>>>>>> faults for DPU. We can only prevent the faults by making sure there
> >>>>>>> is no misconfiguration.
> >>>>>>>
> >>>>>>> Rate-limit the snapshot capture for mmu faults to once per
> >>>>>>> msm_kms_init_aspace() as that should be sufficient to capture
> >>>>>>> the snapshot for debugging otherwise there will be a lot of
> >>>>>>> dpu snapshots getting captured for the same fault which is
> >>>>>>> redundant and also might affect capturing even one snapshot
> >>>>>>> accurately.
> >>>>>>
> >>>>>> Please squash this into the first patch. There is no need to add code
> >>>>>> with a known defficiency.
> >>>>>>
> >>>>>> Also, is there a reason why you haven't used <linux/ratelimit.h> ?
> >>>>>
> >>>>> So, in some ways devcoredump is ratelimited by userspace needing to
> >>>>> clear an existing devcore..
> >>>>>
> >>>>
> >>>> Yes, a new devcoredump device will not be created until the previous one
> >>>> is consumed or times out but here I am trying to limit even the DPU
> >>>> snapshot capture because DPU register space is really huge and the rate
> >>>> at which smmu faults occur is quite fast that its causing instability
> >>>> while snapshots are being captured.
> >>>>
> >>>>> What I'd suggest would be more useful is to limit the devcores to once
> >>>>> per atomic update, ie. if display state hasn't changed, maybe an
> >>>>> additional devcore isn't useful
> >>>>>
> >>>>> BR,
> >>>>> -R
> >>>>>
> >>>>
> >>>> By display state change, do you mean like the checks we have in
> >>>> drm_atomic_crtc_needs_modeset()?
> >>>>
> >>>> OR do you mean we need to cache the previous (currently picked up by hw)
> >>>> state and trigger a new devcores if the new state is different by
> >>>> comparing more things?
> >>>>
> >>>> This will help to reduce the snapshots to unique frame updates but I do
> >>>> not think it will reduce the rate enough for the case where DPU did not
> >>>> recover from the previous fault.
> >>>
> >>> I was thinking the easy thing, of just resetting the counter in
> >>> msm_atomic_commit_tail().. I suppose we could be clever filter out
> >>> updates that only change scanout address.  Or hash the atomic state
> >>> and only generate devcoredumps for unique states.  But I'm not sure
> >>> how over-complicated we should make this.
> >>>
> >>> BR,
> >>> -R
> >>
> >> Its a good idea actually and I would also like to keep it simple :)
> >>
> >> One question, is it okay to assume that all compositors will only issue
> >> unique commits? Because we are assuming thats the case with resetting
> >> the counter in msm_atomic_commit_tail().
> >
> > The compositors use drm_mode_atomic_ioctl() which allocates a new
> > state each time. It is impossible to re-submit the same
> > drm_atomic_state from userspace.
> >
>
> No, what I meant was, is it okay to assume that a commit is issued only
> when display configuration has changed?

No.

> Like if we get multiple commits for the same frame for some reason,
> thats also spam and this approach will not avoid that.

I'd say, new commit means that userspace thinks that something
changed. So if the driver got another hang / issue / etc, it should
try capturing a new state.

-- 
With best wishes
Dmitry