[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1278587622.1900.79.camel@laptop>
Date: Thu, 08 Jul 2010 13:13:42 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Matt Fleming <matt@...sole-pimps.org>
Cc: Will Deacon <will.deacon@....com>, paulus <paulus@...ba.org>,
stephane eranian <eranian@...glemail.com>,
Robert Richter <robert.richter@....com>,
Paul Mundt <lethal@...ux-sh.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Cyrill Gorcunov <gorcunov@...il.com>,
Lin Ming <ming.m.lin@...el.com>,
Yanmin <yanmin_zhang@...ux.intel.com>,
Deng-Cheng Zhu <dengcheng.zhu@...il.com>,
David Miller <davem@...emloft.net>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH 00/11] perf pmu interface -v2
On Thu, 2010-07-01 at 17:39 +0200, Peter Zijlstra wrote:
>
> Ah, for sampling for sure, simply group a software perf event and a
> hardware perf event together and use PERF_SAMPLE_READ.
So the idea is to sample using a software event (periodic timer of
sorts, maybe randomize it) and weight its samples by the hardware event
deltas.
Suppose you have a workload consisting of two main parts:
my_important_work()
{
load_my_data();
compute_me_silly();
}
Now, lets assume that both these functions take the same time to
complete for each part of work. In that case a periodic timer generate
samples that are about 50/50 distributed between these two functions.
Now, let us further assume that load_my_data() is so slow because its
missing all the caches and compute_me_silly() is slow because its
defeating the branch predictor.
So what we want to end up with, is that when we sample for cache-misses
we get load_my_data() as the predominant function, not a nice 50/50
relation. Idem for branch misses and compute_me_silly().
By weighting the samples by the hw counter delta we get this, if we
assume that the sampling frequency is not a harmonic of the runtime of
these functions, then statistics will dtrt.
It basically generates a massive skid on the sample, but as long as most
of the samples end up hitting the right function we're good. For a
periodic workload like:
while (lots) { my_important_work() }
that is even true for period > function_runtime with the exception of
that harmonic thing. For less neat workloads like:
while (lots) { my_important_work(); other_random_things(); }
This doesn't need to work unless period < function_runtime.
Clearly we cannot attribute anything to the actual instruction hit due
to the massive skid, but we can (possibly) say something about the
function based on these statistical rules.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists