Message-ID: <20100708111936.GA5926@elte.hu>
Date: Thu, 8 Jul 2010 13:19:36 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Matt Fleming <matt@...sole-pimps.org>,
Will Deacon <will.deacon@....com>, paulus <paulus@...ba.org>,
stephane eranian <eranian@...glemail.com>,
Robert Richter <robert.richter@....com>,
Paul Mundt <lethal@...ux-sh.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Cyrill Gorcunov <gorcunov@...il.com>,
Lin Ming <ming.m.lin@...el.com>,
Yanmin <yanmin_zhang@...ux.intel.com>,
Deng-Cheng Zhu <dengcheng.zhu@...il.com>,
David Miller <davem@...emloft.net>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH 00/11] perf pmu interface -v2
* Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, 2010-07-01 at 17:39 +0200, Peter Zijlstra wrote:
> >
> > Ah, for sampling for sure, simply group a software perf event and a
> > hardware perf event together and use PERF_SAMPLE_READ.
>
> So the idea is to sample using a software event (periodic timer of sorts,
> maybe randomize it) and weight its samples by the hardware event deltas.
>
> Suppose you have a workload consisting of two main parts:
>
> my_important_work()
> {
> load_my_data();
> compute_me_silly();
> }
>
> Now, let's assume that both these functions take the same time to complete
> for each part of work. In that case a periodic timer generates samples that
> are about 50/50 distributed between these two functions.
>
> Now, let us further assume that load_my_data() is so slow because it's
> missing all the caches and compute_me_silly() is slow because it's defeating
> the branch predictor.
>
> So what we want to end up with, is that when we sample for cache-misses we
> get load_my_data() as the predominant function, not a nice 50/50 relation.
> Idem for branch misses and compute_me_silly().
>
> By weighting the samples by the hw counter delta we get this: if we assume
> that the sampling frequency is not a harmonic of the runtime of these
> functions, then statistics will do the right thing.
Yes.

And if the platform code implements this then the tooling side already takes
care of it - even if the CPU itself cannot generate interrupts based on, say,
cache misses or branches (but can measure them via counts).
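
To make the weighting argument concrete, here is a small user-space simulation
of the scheme Peter describes. It is only a sketch under stated assumptions,
not perf code: the event rates, the 1000us slice length, the jittered ~100us
sample period and the helper names (function_at, counter_at, profile, share)
are all made up for illustration. Each sample is attributed to the function
running at sample time and weighted by the counter delta since the previous
sample.

```python
import random
from collections import Counter

# Hypothetical event rates (events per microsecond) for the two functions.
RATES = {
    "load_my_data": {"cache-misses": 50, "branch-misses": 1},
    "compute_me_silly": {"cache-misses": 1, "branch-misses": 40},
}
SLICE_US = 1000  # assumption: both functions take the same time per unit of work
PERIOD_US = 2 * SLICE_US


def function_at(t):
    """Function executing at time t (microseconds)."""
    return "load_my_data" if t % PERIOD_US < SLICE_US else "compute_me_silly"


def counter_at(t, event):
    """Value of a free-running counter for `event` at time t."""
    load = RATES["load_my_data"][event]
    compute = RATES["compute_me_silly"][event]
    full, rem = divmod(t, PERIOD_US)
    count = full * SLICE_US * (load + compute)
    count += min(rem, SLICE_US) * load
    count += max(rem - SLICE_US, 0) * compute
    return count


def profile(event, total_us=2_000_000, seed=1):
    """Sample on a jittered software timer; weight samples by counter deltas."""
    rng = random.Random(seed)
    hits, weighted = Counter(), Counter()
    t, prev = 0.0, 0.0
    while True:
        t += rng.uniform(70.0, 130.0)  # randomized period, not a harmonic
        if t >= total_us:
            break
        now = counter_at(t, event)
        func = function_at(t)
        hits[func] += 1                # plain timer profile
        weighted[func] += now - prev   # weighted by the hw counter delta
        prev = now
    return hits, weighted


def share(counter, key):
    """Fraction of the total attributed to `key`."""
    return counter[key] / sum(counter.values())


if __name__ == "__main__":
    for event in ("cache-misses", "branch-misses"):
        hits, weighted = profile(event)
        for func in RATES:
            print(event, func, round(share(hits, func), 3),
                  round(share(weighted, func), 3))
```

The unweighted hit counts stay close to 50/50, while the weighted totals lean
heavily towards load_my_data() for cache-misses and compute_me_silly() for
branch-misses. Deltas that straddle a function boundary get attributed to the
wrong function, so the split is approximate and improves as the sample period
shrinks relative to the function runtime.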

The only situation where statistics will not do the right thing is when the
likelihood of the sample tick significantly correlates with the likelihood of
the workload itself executing. Timer-dominated workloads would be an example.
Real hrtimers are sufficiently tick-less to solve most of these artifacts in
practice.

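The correlation artifact can be illustrated with another toy simulation. This
is a sketch under assumptions, not a model of real hrtimers: the 1000us tick,
the 100us of work after each tick, the jittered sampler standing in for a
timer that is decoupled from the tick, and all names (running_at,
sample_share, tick_aligned_samples, decoupled_samples) are hypothetical.

```python
import random

TICK_US = 1000     # hypothetical periodic tick that drives the workload
HANDLER_US = 100   # work that runs right after every tick


def running_at(t):
    """What executes at time t in this timer-dominated workload."""
    return "timer_work" if t % TICK_US < HANDLER_US else "other_work"


def sample_share(sample_times):
    """Fraction of samples that land in timer_work."""
    times = list(sample_times)
    hits = sum(1 for t in times if running_at(t) == "timer_work")
    return hits / len(times)


def tick_aligned_samples(n, offset_us=10):
    """Sampler driven by the same tick as the workload (fully correlated)."""
    return (i * TICK_US + offset_us for i in range(n))


def decoupled_samples(n, seed=1):
    """Sampler whose period is jittered, so its phase drifts against the tick."""
    rng = random.Random(seed)
    t = 0.0
    for _ in range(n):
        t += rng.uniform(700.0, 1300.0)
        yield t


if __name__ == "__main__":
    print("true share  :", HANDLER_US / TICK_US)
    print("tick-aligned:", sample_share(tick_aligned_samples(20000)))
    print("decoupled   :", sample_share(decoupled_samples(20000)))
```

With the tick-aligned sampler every sample lands in timer_work, although it
only accounts for a tenth of the runtime. The decoupled sampler ends up close
to the true share.
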
Ingo