lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 22 Apr 2011 09:50:07 -0700
From:	arun@...rma-home.net
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Stephane Eranian <eranian@...gle.com>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Lin Ming <ming.m.lin@...el.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>, eranian@...il.com,
	Arun Sharma <asharma@...com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add
 missing user space support for config1/config2

On Fri, Apr 22, 2011 at 12:52:11PM +0200, Ingo Molnar wrote:
> 
> Using the generalized cache events i can run:
> 
>  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> 
>  Performance counter stats for './array' (10 runs):
> 
>          6,719,130 cycles:u                   ( +-   0.662% )
>          5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
>          1,037,032 l1-dcache-loads:u          ( +-   0.009% )
>          1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
> 
>         0.003802098  seconds time elapsed   ( +-  13.395% )
> 
> I consider that this is 'bad', because for almost every dcache-load there's a 
> dcache-miss - a 99% L1 cache miss rate!

One could argue that all you need is cycles and instructions. If there is an
expensive load, you'll see that the load instruction takes many cycles and
you can infer that it's a cache miss.

Questions app developers typically ask me:

* If I fix all my top 5 L3 misses how much faster will my app go?
* Am I bottlenecked on memory bandwidth?
* I have 4 L3 misses every 1000 instructions and 15 branch mispredicts per
  1000 instructions. Which one should I focus on?

It's hard to answer some of these without access to all events.
While your approach of having generic events for commonly used counters
might be useful for some use cases, I don't see why exposing all vendor
defined events is harmful.

A clear statement on the last point would be helpful.

 -Arun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ