Message-ID: <20090623132506.GA32002@elte.hu>
Date:	Tue, 23 Jun 2009 15:25:06 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Brice Goglin <Brice.Goglin@...ia.fr>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>, paulus@...ba.org,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [perf] howto switch from pfmon


* Ingo Molnar <mingo@...e.hu> wrote:

> > I guess there are still a lot of things on the TODO list but I'd 
> > like to understand a bit more where things are going. Sorry I 
> > didn't read all the archives about this, there are way too many 
> > of them recently :)
> 
> Yeah, there's indeed still a lot on the TODO list :-)
> 
> CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE is a Barcelona hardware event, 
> so if you know that it maps to raw ID 0x100000e0 then you can 
> always extend the events that 'perf' knows about via raw events:
> 
>  $ perf stat -e cycles -e instructions -e r1000ffe0 ./hackbench 10
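
To see how a raw ID like this might be put together, here is a minimal
Python sketch. It assumes the AMD Family 10h PERF_CTL layout from AMD's
BIOS and Kernel Developer's Guide (event select bits 7:0 in config bits
7:0, unit mask in bits 15:8, event select bits 11:8 in bits 35:32), and
it assumes that a unit mask of 0xFF selects all target nodes for event
0x1E0. The helper name amd_raw_config is hypothetical, not a perf API.

```python
# Hypothetical helper: pack an AMD Family 10h event select and unit mask
# into a perf raw config value, ASSUMING the PERF_CTL bit layout
#   EventSelect[7:0]  -> config bits 7:0
#   UnitMask[7:0]     -> config bits 15:8
#   EventSelect[11:8] -> config bits 35:32
def amd_raw_config(event_select: int, unit_mask: int) -> int:
    if not 0 <= event_select <= 0xFFF:
        raise ValueError("event select is a 12-bit field")
    if not 0 <= unit_mask <= 0xFF:
        raise ValueError("unit mask is an 8-bit field")
    low = event_select & 0xFF          # EventSelect[7:0]
    high = (event_select >> 8) & 0xF   # EventSelect[11:8]
    return low | (unit_mask << 8) | (high << 32)


# Event 0x1E0 (CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE), unit mask assumed
# to select all target nodes.
config = amd_raw_config(0x1E0, 0xFF)
print(f"-e r{config:x}")  # prints "-e r10000ffe0"
```

Under this assumed layout the value has nine hex digits, so it is worth
double-checking any raw ID, including the ones quoted above, against
the BKDG before relying on it.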

Note that beyond raw events, if you are interested in profiling the
'locality badness' of your app, the default metrics on Barcelona
will probably serve you quite well too:

 $ perf stat ~/hackbench  10
 Time: 0.205

  Performance counter stats for '/home/mingo/hackbench 10':

    2187.328436  task-clock-msecs     #      3.315 CPUs 
          54554  context-switches     #      0.025 M/sec
           1160  CPU-migrations       #      0.001 M/sec
          17755  page-faults          #      0.008 M/sec
     4995437535  cycles               #   2283.808 M/sec
     2150881875  instructions         #      0.431 IPC  
      644099534  cache-references     #    294.469 M/sec
        8516562  cache-misses         #      3.894 M/sec

    0.659895237  seconds time elapsed.
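
The annotations on the right of the perf stat output are derived from
the raw counts. A short Python sketch using the counts above reproduces
them; it assumes, consistent with these numbers, that the M/sec rates
are per second of task-clock (CPU) time rather than wall-clock time.
The cache-miss ratio it also prints is not part of the perf output.

```python
# Raw counts copied from the 'perf stat ~/hackbench 10' run above.
task_clock_ms = 2187.328436
elapsed_s = 0.659895237
counts = {
    "context-switches": 54554,
    "CPU-migrations": 1160,
    "page-faults": 17755,
    "cycles": 4995437535,
    "instructions": 2150881875,
    "cache-references": 644099534,
    "cache-misses": 8516562,
}

task_clock_s = task_clock_ms / 1000.0

# CPUs utilized: CPU time consumed divided by wall-clock time.
cpus = task_clock_s / elapsed_s
# Event rates in millions per second of task-clock (CPU) time.
rates = {name: n / task_clock_s / 1e6 for name, n in counts.items()}
# Instructions per cycle.
ipc = counts["instructions"] / counts["cycles"]
# Fraction of cache references that missed (not shown by perf stat above).
miss_ratio = counts["cache-misses"] / counts["cache-references"]

print(f"{cpus:.3f} CPUs")
print(f"{rates['cycles']:.3f} M/sec cycles")
print(f"{ipc:.3f} IPC")
print(f"{miss_ratio:.2%} of cache references missed")
```

It prints 3.315 CPUs, 2283.808 M/sec of cycles and 0.431 IPC, matching
the perf stat annotations, plus a 1.32% miss ratio on cache references.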

The cache-misses event is represented well enough to be a meaningful
basis for profiling. Raw DRAM access stats can be useful too, but they
sit much further down the memory hierarchy: your app can already be
hurting from a flip-flopping working set without putting much load on
the DRAM channels.
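
A toy model can illustrate how sharply cache misses react to changes
in the working set. This Python sketch simulates a fully associative
LRU cache, which real Barcelona caches are not (they are
set-associative), so treat it as an illustration of the locality
effect only; the 64-line capacity and the sweep pattern are arbitrary
assumptions.

```python
from collections import OrderedDict


def lru_misses(accesses, capacity):
    """Count misses in a toy fully associative LRU cache of `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


CAPACITY = 64  # arbitrary toy cache size, in cache lines
for working_set in (64, 65):
    # Sweep the same working set 100 times, as a loop-heavy app might.
    trace = list(range(working_set)) * 100
    m = lru_misses(trace, CAPACITY)
    print(f"working set {working_set:3d} lines: {m / len(trace):.1%} miss ratio")
```

Growing the swept working set by a single line past the capacity takes
the miss ratio from 1.0% to 100.0%, which is the kind of locality
problem the cache-misses event exposes directly.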

So 'cache-misses' is probably a good first-level approximation metric
to measure and profile with. You can get a good (last-level) cache-miss
profile using the auto-freq counters:

  perf record -e cache-misses -F 10000 ./your-app

The '-F 10000' tells the kernel to sample your-app at 10 kHz,
regardless of how frequent cache misses are. The tools (perf report)
take the weight of each sample into account, so the results are
properly normalized across functions.
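
The per-sample weighting can be sketched in a few lines of Python. This
is a simplified model of what the text describes, not perf report's
actual code; the function names and periods in the sample list are
made up for illustration.

```python
from collections import defaultdict

# Hypothetical samples: (function, period). In frequency mode the period
# attached to each sample varies, so a sample taken while misses were
# frequent stands for more events than one taken while they were rare.
samples = [
    ("hash_lookup", 900),
    ("hash_lookup", 850),
    ("memcpy", 120),
    ("memcpy", 100),
    ("main_loop", 30),
]


def weighted_profile(samples):
    """Share of events per function, weighting each sample by its period."""
    totals = defaultdict(int)
    for func, period in samples:
        totals[func] += period
    grand_total = sum(totals.values())
    return {func: n / grand_total for func, n in totals.items()}


for func, share in sorted(weighted_profile(samples).items(), key=lambda kv: -kv[1]):
    print(f"{share:6.1%}  {func}")
```

Counting samples alone would give 40%/40%/20%; weighting by period
gives 87.5% for hash_lookup, 11.0% for memcpy and 1.5% for main_loop.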

So you don't need to specify the 'sampling interval' by hand to get
a sufficient number of samples: you just specify a sampling frequency,
and the perfcounters subsystem takes care of the rest.

Also, your system won't over-sample or under-sample if your workload
occasionally idles.
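
The trade-off between a hand-picked sampling interval and a target
frequency can be sketched with simple arithmetic. The Python below is
a simplified steady-state model (the kernel adjusts the period
incrementally from observed event counts, so real sample rates only
approach the target); the phases, their miss rates and the fixed
periods are all hypothetical.

```python
TARGET_HZ = 10_000          # the -F 10000 from the perf record example
FIXED_PERIODS = (4, 400)    # hypothetical hand-picked sampling intervals

# Hypothetical one-second workload phases: (label, cache misses per second).
PHASES = [("heavy", 4_000_000), ("moderate", 400_000), ("light", 40_000)]


def freq_mode_period(event_rate: int, target_hz: int) -> int:
    """Simplified model: the period that yields target_hz samples per second."""
    return max(1, event_rate // target_hz)


for label, rate in PHASES:
    # Fixed period: samples per second scale with the event rate.
    fixed = {p: rate // p for p in FIXED_PERIODS}
    # Frequency mode: the period scales with the rate instead.
    freq = rate // freq_mode_period(rate, TARGET_HZ)
    print(f"{label:8s}: fixed period 4 -> {fixed[4]:>9,} samples/s, "
          f"fixed period 400 -> {fixed[400]:>6,} samples/s, "
          f"-F {TARGET_HZ} -> {freq:>6,} samples/s")
```

A fixed period of 4 over-samples the heavy phase at a million samples
per second, a fixed period of 400 collects only 100 samples per second
in the light phase, while the frequency-mode model stays at 10,000
samples per second throughout.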

	Ingo