linux-kernel - Re: [RFC][PATCH 00/11] perf pmu interface -v2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1278587622.1900.79.camel@laptop>
Date:	Thu, 08 Jul 2010 13:13:42 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Matt Fleming <matt@...sole-pimps.org>
Cc:	Will Deacon <will.deacon@....com>, paulus <paulus@...ba.org>,
	stephane eranian <eranian@...glemail.com>,
	Robert Richter <robert.richter@....com>,
	Paul Mundt <lethal@...ux-sh.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Cyrill Gorcunov <gorcunov@...il.com>,
	Lin Ming <ming.m.lin@...el.com>,
	Yanmin <yanmin_zhang@...ux.intel.com>,
	Deng-Cheng Zhu <dengcheng.zhu@...il.com>,
	David Miller <davem@...emloft.net>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH 00/11] perf pmu interface -v2

On Thu, 2010-07-01 at 17:39 +0200, Peter Zijlstra wrote:
> 
> Ah, for sampling for sure, simply group a software perf event and a
> hardware perf event together and use PERF_SAMPLE_READ. 

So the idea is to sample using a software event (periodic timer of
sorts, maybe randomize it) and weight its samples by the hardware event
deltas.

Suppose you have a workload consisting of two main parts:

  my_important_work()
  {
     load_my_data();
     compute_me_silly();
  }

Now, lets assume that both these functions take the same time to
complete for each part of work. In that case a periodic timer generate
samples that are about 50/50 distributed between these two functions.

Now, let us further assume that load_my_data() is so slow because its
missing all the caches and compute_me_silly() is slow because its
defeating the branch predictor.

So what we want to end up with, is that when we sample for cache-misses
we get load_my_data() as the predominant function, not a nice 50/50
relation. Idem for branch misses and compute_me_silly().

By weighting the samples by the hw counter delta we get this, if we
assume that the sampling frequency is not a harmonic of the runtime of
these functions, then statistics will dtrt.

It basically generates a massive skid on the sample, but as long as most
of the samples end up hitting the right function we're good. For a
periodic workload like: 
  while (lots) { my_important_work() }
that is even true for period > function_runtime with the exception of
that harmonic thing. For less neat workloads like:
  while (lots) { my_important_work(); other_random_things(); }
This doesn't need to work unless period < function_runtime.

Clearly we cannot attribute anything to the actual instruction hit due
to the massive skid, but we can (possibly) say something about the
function based on these statistical rules.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/