Message-ID: <20141116092737.GA19043@gmail.com>
Date: Sun, 16 Nov 2014 10:27:37 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Robert Bragg <robert@...bynine.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, Paul Mackerras <paulus@...ba.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Daniel Vetter <daniel.vetter@...ll.ch>,
Chris Wilson <chris@...is-wilson.co.uk>,
Rob Clark <robdclark@...il.com>,
Samuel Pitoiset <samuel.pitoiset@...il.com>,
Ben Skeggs <bskeggs@...hat.com>
Subject: Re: [RFC PATCH 0/3] Expose gpu counters via perf pmu driver
* Robert Bragg <robert@...bynine.org> wrote:
> > I'd strong[ly] suggest thinking about sampling as well, if
> > the hardware exposes sample information: at least for
> > profiling CPU loads the difference is like day and night,
> > compared to aggregated counts and self-profiling.
>
> Here I was thinking of counters or data that can be sampled via
> mmio using a hrtimer. E.g. the current gpu frequency or the
> energy usage. I'm not currently aware of any capability for the
> gpu to, say, trigger an interrupt after a threshold number of
> events occurs (like clock cycles) so I think we may generally
> be limited to a wall clock time domain for sampling.
In general hrtimer-driven polling gives pretty good profiling
information as well - the key is being able to get a sample of
EU thread execution state.
(Trigger thresholds and so on can be useful as well, but are a
second order concern in terms of profiling quality.)
> > It's a very good idea to not expose such limitations to
> > user-space - the GPU driver doing the necessary hrtimer
> > polling to construct a proper count is a much higher quality
> > solution.
>
> That sounds preferable.
>
> I'm open to suggestions for finding another way for userspace
> to initiate a flush besides through read() in case there's a
> concern that it might set a bad precedent. For the i915_oa
> driver it seems ok at the moment since we don't currently
> report a useful counter through read() and for the main use
> case where we want the flushing we expect that most of the time
> there won't be any significant cost involved in flushing since
> we'll be using a very low timer period. Maybe this will bite us
> later though.
You could add an ioctl() as well - we are not religious about
them, there are always things that are special enough to not
warrant a generic syscall.
Anyway, aggregate counts alone are obviously very useful for
analyzing GPU performance, so your initial approach looks
perfectly acceptable to me already.
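To illustrate the "driver polls and constructs a proper count"
idea discussed above, here is a minimal userspace sketch in C. It
assumes the limitation being hidden is a free-running 32-bit
hardware counter that wraps, and that the poll period is short
enough that the counter wraps at most once between two polls. The
names accum_counter, accum_init and accum_update are hypothetical
and are not taken from the i915_oa patches; in a real driver the
update would run from the hrtimer callback with a value read via
mmio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software state kept per hardware counter. */
struct accum_counter {
	uint32_t last_raw;	/* raw value seen at the previous poll */
	uint64_t total;		/* 64-bit count that would be reported */
};

/* Start accumulating from the current raw hardware value. */
static void accum_init(struct accum_counter *c, uint32_t raw)
{
	assert(c != NULL);
	c->last_raw = raw;
	c->total = 0;
}

/*
 * Fold a new raw reading into the 64-bit total.
 *
 * Unsigned subtraction is performed modulo 2^32, so the delta is
 * correct even when the hardware counter wrapped once since the
 * previous poll. Assumption: polls happen often enough that the
 * counter never wraps more than once between two calls.
 */
static void accum_update(struct accum_counter *c, uint32_t raw)
{
	uint32_t delta;

	assert(c != NULL);
	delta = raw - c->last_raw;
	c->last_raw = raw;
	c->total += delta;
}
```

With this shape user-space only ever sees the monotonically
increasing 64-bit total, so the wrap of the underlying register
is not part of the ABI.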
Thanks,
Ingo