linux-kernel - Re: [PATCH] drm/msm/a6xx+: Insert a fence wait before SMMU table update


Message-ID: <CAF6AEGsVaq33wJzfnuvLWSPbmecx-j8a8FoCenKkBLMuqBTwdg@mail.gmail.com>
Date: Tue, 17 Sep 2024 18:30:27 -0700
From: Rob Clark <robdclark@...il.com>
To: Konrad Dybcio <konradybcio@...nel.org>
Cc: dri-devel@...ts.freedesktop.org, linux-arm-msm@...r.kernel.org, 
	freedreno@...ts.freedesktop.org, Akhil P Oommen <quic_akhilpo@...cinc.com>, 
	Connor Abbott <cwabbott0@...il.com>, Rob Clark <robdclark@...omium.org>, 
	Sean Paul <sean@...rly.run>, Abhinav Kumar <quic_abhinavk@...cinc.com>, 
	Dmitry Baryshkov <dmitry.baryshkov@...aro.org>, 
	Marijn Suijten <marijn.suijten@...ainline.org>, David Airlie <airlied@...il.com>, 
	Daniel Vetter <daniel@...ll.ch>, open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] drm/msm/a6xx+: Insert a fence wait before SMMU table update

On Tue, Sep 17, 2024 at 4:37 PM Konrad Dybcio <konradybcio@...nel.org> wrote:
>
> On 17.09.2024 5:30 PM, Rob Clark wrote:
> > On Tue, Sep 17, 2024 at 6:47 AM Konrad Dybcio <konradybcio@...nel.org> wrote:
> >>
> >> On 13.09.2024 9:51 PM, Rob Clark wrote:
> >>> From: Rob Clark <robdclark@...omium.org>
> >>>
> >>> The CP_SMMU_TABLE_UPDATE _should_ be waiting for idle, but on some
> >>> devices (x1-85, possibly others), it seems to pass that barrier while
> >>> there are still things in the event completion FIFO waiting to be
> >>> written back to memory.
> >>
> >> Can we try to force-fault around here on other GPUs and perhaps
> >> limit this workaround?
> >
> > not sure what you mean by "force-fault"...
>
> I suppose 'reproduce' is what I meant

I haven't _noticed_ it yet.. if you want to try on devices you have,
glmark2 seems to be good at reproducing..

I think the reason is a combo of high fps (on x1-85 most scenes are
north of 8k fps), so you get a lot of context switches between compositor
and glmark2.  Most scenes are just a clear plus a single draw, and I
guess the compositor is just doing a single draw/blit.  A6xx can be
two draws/blits deep in its pipeline, a7xx can be four, which maybe
exacerbates this.

> > we could probably limit
> > this to certain GPUs, the only reason I didn't is (a) it should be
> > harmless when it is not needed,
>
> Do we have any realistic perf hits here?

I don't think so, we can't switch ttbr0 while the gpu is still busy so
what the sqe does for CP_SMMU_TABLE_UPDATE _should_ be equivalent.
Maybe it amounts to some extra CP cycles and memory read, but I think
that should be negligible given that the expensive thing is that we
are stalling the gpu until it is idle.
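
A toy model of the ordering problem may help here.  This is only a
sketch and not driver or firmware code: ToyGpu, its event FIFO, and
switch_pagetable() are hypothetical names made up for illustration, and
the real SQE behaviour is not known in this detail.  It assumes that a
finished draw queues its fence write in a FIFO, that the "idle" barrier
only looks at busy draws, and that an explicit wait on the previous
fence value forces the FIFO to drain before the table switch.

```python
from collections import deque


class ToyGpu:
    """Hypothetical model: finished draws queue their fence write in a FIFO."""

    def __init__(self):
        self.event_fifo = deque()  # fence values completed but not yet in memory
        self.fence_in_memory = 0   # last fence value visible in memory
        self.busy_draws = 0        # draws still executing

    def submit(self, seqno):
        # The draw runs to completion; its fence write is queued, not yet landed.
        self.event_fifo.append(seqno)

    def looks_idle(self):
        # Assumed faulty barrier: checks busy draws only, ignores the FIFO.
        return self.busy_draws == 0

    def wait_for_fence(self, seqno):
        # Workaround: drain writebacks until the fence in memory reaches seqno.
        while self.fence_in_memory < seqno:
            if not self.event_fifo:
                raise RuntimeError("fence %d was never queued" % seqno)
            self.fence_in_memory = self.event_fifo.popleft()


def switch_pagetable(gpu, prev_seqno, use_fence_wait):
    """Return True if no writeback is still pending when the table switches."""
    if use_fence_wait:
        gpu.wait_for_fence(prev_seqno)
    if not gpu.looks_idle():
        raise RuntimeError("gpu still busy")
    return len(gpu.event_fifo) == 0
```

Without the fence wait the model switches tables while a writeback is
still queued; with it, the FIFO is empty by the time the switch happens.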

> > and (b) I have no real good way to get
> > an exhaustive list of where it is needed.  Maybe/hopefully it is only
> > x1-85, but idk.
> >
> > It does bring up an interesting question about preemption, though
>
> Yeah..

The KMD does set up an xAMBLE to clear the perfcntrs on context switch.
We could maybe piggyback on that, but I guess we'd have to patch in
the fence value to wait for?

> Do we know what windows does here?

Not sure, maybe Akhil has some way to check.  Whether a similar
scenario comes up with windows probably depends on how the winsys
works.  If it dropped frames when rendering faster than the vblank
rate, you'd get fewer context switches.

BR,
-R

> Konrad