Message-ID: <20110422092322.GA1948@elte.hu>
Date: Fri, 22 Apr 2011 11:23:22 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Stephane Eranian <eranian@...gle.com>
Cc: Arnaldo Carvalho de Melo <acme@...radead.org>,
linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Lin Ming <ming.m.lin@...el.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, eranian@...il.com,
Arun Sharma <asharma@...com>
Subject: Re: [PATCH 1/1] perf tools: Add missing user space support for
config1/config2
* Stephane Eranian <eranian@...gle.com> wrote:
> On Fri, Apr 22, 2011 at 10:06 AM, Ingo Molnar <mingo@...e.hu> wrote:
> >
> > * Ingo Molnar <mingo@...e.hu> wrote:
> >
> >> This needs to be a *lot* more user friendly. Users do not want to type in
> >> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile
> >> era really.
> >>
> >> Unless there's proper generalized and human usable support i'm leaning
> >> towards turning off the offcore user-space accessible raw bits for now, and
> >> use them only kernel-internally, for the cache events.
>
> Generic cache events are a myth. They are not usable. I keep getting
> questions from users because nobody knows what they are actually counting,
> thus nobody knows how to interpret the counts. You cannot really hide the
> micro-architecture if you want to make any sensible measurements.
Well:

  aldebaran:~> perf stat --repeat 10 -e instructions -e L1-dcache-loads -e L1-dcache-load-misses -e LLC-misses ./hackbench 10
  Time: 0.125
  Time: 0.136
  Time: 0.180
  Time: 0.103
  Time: 0.097
  Time: 0.125
  Time: 0.104
  Time: 0.125
  Time: 0.114
  Time: 0.158

   Performance counter stats for './hackbench 10' (10 runs):

       2,102,556,398 instructions             #  0.000 IPC  ( +-  1.179% )
         843,957,634 L1-dcache-loads                        ( +-  1.295% )
         130,007,361 L1-dcache-load-misses                  ( +-  3.281% )
           6,328,938 LLC-misses                             ( +-  3.969% )

         0.146160287 seconds time elapsed                   ( +-  5.851% )

It's certainly useful if you want to get ballpark figures about cache behavior
of an app and want to do comparisons.
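
As an illustration of the kind of ballpark figure meant here, the counts in the perf stat output above can be turned into a few derived ratios. This is only a sketch: the metric names (`l1_miss_rate`, `llc_mpki`, `loads_per_insn`) and the `ratio` helper are made up for this example and are not something perf reports itself, and the numbers are copied by hand from the run shown above.

```python
# Mean counts copied from the perf stat output above (10 runs of ./hackbench 10).
counts = {
    "instructions": 2_102_556_398,
    "L1-dcache-loads": 843_957_634,
    "L1-dcache-load-misses": 130_007_361,
    "LLC-misses": 6_328_938,
}


def ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    return numerator / denominator if denominator else None


# Fraction of L1 data-cache loads that missed.
l1_miss_rate = ratio(counts["L1-dcache-load-misses"], counts["L1-dcache-loads"])

# Last-level cache misses per 1000 retired instructions.
llc_mpki = ratio(counts["LLC-misses"] * 1000, counts["instructions"])

# L1 data-cache loads per retired instruction.
loads_per_insn = ratio(counts["L1-dcache-loads"], counts["instructions"])

print(f"L1d load miss rate:      {l1_miss_rate:.1%}")
print(f"LLC misses per 1K insns: {llc_mpki:.2f}")
print(f"L1d loads per insn:      {loads_per_insn:.2f}")
```

For this run that works out to roughly a 15% L1d load miss rate and about 3 LLC misses per thousand instructions. Given the +- 1-4% run-to-run noise shown above, such ratios are only good for rough comparisons between workloads or between builds of the same app on the same machine.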
There are inconsistencies in our generic cache events - but that's not really a
reason to obscure their usage behind nonsensical microarchitecture-specific
details.
But i'm definitely in favor of making these generalized events more consistent
across different CPU types. Can you list examples of inconsistencies that we
should resolve? (and which you possibly consider impossible to resolve, right?)
Thanks,
Ingo