Message-ID: <20100708111936.GA5926@elte.hu>
Date:	Thu, 8 Jul 2010 13:19:36 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Matt Fleming <matt@...sole-pimps.org>,
	Will Deacon <will.deacon@....com>, paulus <paulus@...ba.org>,
	stephane eranian <eranian@...glemail.com>,
	Robert Richter <robert.richter@....com>,
	Paul Mundt <lethal@...ux-sh.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Cyrill Gorcunov <gorcunov@...il.com>,
	Lin Ming <ming.m.lin@...el.com>,
	Yanmin <yanmin_zhang@...ux.intel.com>,
	Deng-Cheng Zhu <dengcheng.zhu@...il.com>,
	David Miller <davem@...emloft.net>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH 00/11] perf pmu interface -v2


* Peter Zijlstra <peterz@...radead.org> wrote:

> On Thu, 2010-07-01 at 17:39 +0200, Peter Zijlstra wrote:
> > 
> > Ah, for sampling for sure, simply group a software perf event and a
> > hardware perf event together and use PERF_SAMPLE_READ. 
> 
> So the idea is to sample using a software event (periodic timer of sorts, 
> maybe randomize it) and weight its samples by the hardware event deltas.
> 
> Suppose you have a workload consisting of two main parts:
> 
>   my_important_work()
>   {
>      load_my_data();
>      compute_me_silly();
>   }
> 
> Now, let's assume that both these functions take the same time to complete 
> for each part of the work. In that case a periodic timer generates samples 
> that are distributed about 50/50 between these two functions.
> 
> Now, let us further assume that load_my_data() is so slow because it's 
> missing all the caches and compute_me_silly() is slow because it's defeating 
> the branch predictor.
> 
> So what we want to end up with, is that when we sample for cache-misses we 
> get load_my_data() as the predominant function, not a nice 50/50 relation. 
> Idem for branch misses and compute_me_silly().
> 
> By weighting the samples by the hw counter delta we get this; if we assume 
> that the sampling frequency is not a harmonic of the runtime of these 
> functions, then statistics will do the right thing.

Yes.

And if the platform code implements this then the tooling side already takes 
care of it - even if the CPU itself cannot generate interrupts based on, say, 
cache misses or branches (but can measure them via counts).

The only situation where statistics will not do the right thing is when the 
likelihood of the sample tick significantly correlates with the likelihood of 
the workload itself executing. Timer-dominated workloads would be an example.

Real hrtimers are sufficiently tick-less to avoid most of these artifacts in 
practice.

	Ingo