[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF6AEGtYh6jnYcFLcUnEobjQqKmqxuX29wO1qqnGYFQJ+EUBxw@mail.gmail.com>
Date: Wed, 28 Aug 2024 06:42:59 -0700
From: Rob Clark <robdclark@...il.com>
To: Antonino Maniscalco <antomani103@...il.com>
Cc: Akhil P Oommen <quic_akhilpo@...cinc.com>, Connor Abbott <cwabbott0@...il.com>,
Sean Paul <sean@...rly.run>, Konrad Dybcio <konrad.dybcio@...aro.org>,
Abhinav Kumar <quic_abhinavk@...cinc.com>, Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
Marijn Suijten <marijn.suijten@...ainline.org>, David Airlie <airlied@...il.com>,
Daniel Vetter <daniel@...ll.ch>, Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>, linux-arm-msm@...r.kernel.org,
dri-devel@...ts.freedesktop.org, freedreno@...ts.freedesktop.org,
linux-kernel@...r.kernel.org, Sharat Masetty <smasetty@...eaurora.org>
Subject: Re: [PATCH 4/7] drm/msm/A6xx: Implement preemption for A7XX targets
On Tue, Aug 27, 2024 at 3:56 PM Antonino Maniscalco
<antomani103@...il.com> wrote:
>
> On 8/27/24 11:07 PM, Rob Clark wrote:
> > On Tue, Aug 27, 2024 at 1:25 PM Antonino Maniscalco
> > <antomani103@...il.com> wrote:
> >>
> >> On 8/27/24 9:48 PM, Akhil P Oommen wrote:
> >>> On Fri, Aug 23, 2024 at 10:23:48AM +0100, Connor Abbott wrote:
> >>>> On Fri, Aug 23, 2024 at 10:21 AM Connor Abbott <cwabbott0@...il.com> wrote:
> >>>>>
> >>>>> On Thu, Aug 22, 2024 at 9:06 PM Akhil P Oommen <quic_akhilpo@...cinc.com> wrote:
> >>>>>>
> >>>>>> On Wed, Aug 21, 2024 at 05:02:56PM +0100, Connor Abbott wrote:
> >>>>>>> On Mon, Aug 19, 2024 at 9:09 PM Akhil P Oommen <quic_akhilpo@...cinc.com> wrote:
> >>>>>>>>
> >>>>>>>> On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco wrote:
> >>>>>>>>> This patch implements preemption feature for A6xx targets, this allows
> >>>>>>>>> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> >>>>>>>>> hardware as such supports multiple levels of preemption granularities,
> >>>>>>>>> ranging from coarse grained(ringbuffer level) to a more fine grained
> >>>>>>>>> such as draw-call level or a bin boundary level preemption. This patch
> >>>>>>>>> enables the basic preemption level, with more fine grained preemption
> >>>>>>>>> support to follow.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Sharat Masetty <smasetty@...eaurora.org>
> >>>>>>>>> Signed-off-by: Antonino Maniscalco <antomani103@...il.com>
> >>>>>>>>> ---
> >>>>>>>>
> >>>>>>>> No postamble packets which resets perfcounters? It is necessary. Also, I
> >>>>>>>> think we should disable preemption during profiling like we disable slumber.
> >>>>>>>>
> >>>>>>>> -Akhil.
> >>>>>>>
> >>>>>>> I don't see anything in kgsl which disables preemption during
> >>>>>>> profiling. It disables resetting perfcounters when doing system-wide
> >>>>>>> profiling, like freedreno, and in that case I assume preempting is
> >>>>>>> fine because the system profiler has a complete view of everything and
> >>>>>>> should "see" preemptions through the traces. For something like
> >>>>>>> VK_KHR_performance_query I suppose we'd want to disable preemption
> >>>>>>> because we disable saving/restoring perf counters, but that has to
> >>>>>>> happen in userspace because the kernel doesn't know what userspace
> >>>>>>> does.
> >>>>>>>
> >>>>>>
> >>>>>> KGSL does some sort of arbitration of perfcounter configurations and
> >>>>>> adds the select/enablement reg configuration as part of dynamic
> >>>>>> power up register list which we are not doing here. Is this something
> >>>>>> you are taking care of from userspace via preamble?
> >>>>>>
> >>>>>> -Akhil
> >>>>>
> >>>>> I don't think we have to take care of that in userspace, because Mesa
> >>>>> will always configure the counter registers before reading them in the
> >>>>> same submission, and if it gets preempted in the meantime then we're
> >>>>> toast anyways (due to not saving/restoring perf counters). kgsl sets
> >>>>> them from userspace, which is why it has to do something to set them
> >>>>
> >>>> Sorry, should be "kgsl sets them from the kernel".
> >>>>
> >>>>> after IFPC slumber or a context switch when the HW state is gone.
> >>>>> Also, because the upstream approach doesn't play nicely with system
> >>>>> profilers like perfetto, VK_KHR_performance_query is hidden by default
> >>>>> behind a debug flag in turnip. So there's already an element of "this
> >>>>> is unsupported, you have to know what you're doing to use it."
> >>>
> >>> But when you have composition on GPU enabled, there will be very frequent
> >>> preemption. And I don't know how usable profiling tools will be in that
> >>> case unless you disable preemption with a Mesa debug flag. But for that
> >>> to work, all existing submitqueues should be destroyed and recreated.
> >>>
> >>> So I was thinking that we can use the sysprof propertry to force L0
> >>> preemption from kernel.
> >>>
> >>> -Akhil.
> >>>
> >>
> >> Right but when using a system profiler I imagined the expectation would
> >> be to be able to understand how applications and compositor interact. An
> >> use case could be measuring latency and understanding what contributes
> >> to it. That is actually the main reason I added traces for preemption.
> >> Disabling preemption would make it less useful for this type of
> >> analysis. Did you have an use case in mind for a system profiler that
> >> would benefit from disabling preemption and that is not covered by
> >> VK_KHR_performance_query (or equivalent GL ext)?
> >
> > I would think that we want to generate an event, with GPU timestamp
> > (ie. RB_DONE) and which ring we are switching to, so that perfetto/etc
> > could display multiple GPU timelines and where the switch from one to
> > the other happens.
> >
> > I'm a bit curious how this is handled on android, with AGI/etc.. I
> > don't see any support in perfetto for this.
> >
> > BR,
> > -R
> >
> >> Best regards,
> >> --
> >> Antonino Maniscalco <antomani103@...il.com>
> >>
>
> Looking at KGSL they seem to use ftrace and I don't see it doing
> anything to get a timestamp from some GPU timer, really not sure how
> that would be put in a gpu timeline.
I suspect it would require some work on perfetto trace-processor. It
can ingest ftrace events (but those would end up being something
driver specific). Maybe with u_trace and some tracepoints in the
'ambles something could be done that would be more driver agnostic
(but idk if that would work for gpu's where preemption happens more
autonomously in the fw)
BR,
-R
> Best regards,
> --
> Antonino Maniscalco <antomani103@...il.com>
>
Powered by blists - more mailing lists