Message-ID: <BANLkTi=G7-v3ysxK2wY_3f8TecbD6ZjKog@mail.gmail.com>
Date: Fri, 22 Apr 2011 11:41:03 +0200
From: Stephane Eranian <eranian@...gle.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Arnaldo Carvalho de Melo <acme@...radead.org>,
linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Lin Ming <ming.m.lin@...el.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, eranian@...il.com,
Arun Sharma <asharma@...com>
Subject: Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
On Fri, Apr 22, 2011 at 11:23 AM, Ingo Molnar <mingo@...e.hu> wrote:
>
> * Stephane Eranian <eranian@...gle.com> wrote:
>
>> On Fri, Apr 22, 2011 at 10:06 AM, Ingo Molnar <mingo@...e.hu> wrote:
>> >
>> > * Ingo Molnar <mingo@...e.hu> wrote:
>> >
>> >> This needs to be a *lot* more user friendly. Users do not want to type in
>> >> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile
>> >> era really.
>> >>
>> >> Unless there's proper generalized and human usable support i'm leaning
>> >> towards turning off the offcore user-space accessible raw bits for now, and
>> >> use them only kernel-internally, for the cache events.
>>
>> Generic cache events are a myth. They are not usable. I keep getting
>> questions from users because nobody knows what they are actually counting,
>> thus nobody knows how to interpret the counts. You cannot really hide the
>> micro-architecture if you want to make any sensible measurements.
>
> Well:
>
> aldebaran:~> perf stat --repeat 10 -e instructions -e L1-dcache-loads -e L1-dcache-load-misses -e LLC-misses ./hackbench 10
> Time: 0.125
> Time: 0.136
> Time: 0.180
> Time: 0.103
> Time: 0.097
> Time: 0.125
> Time: 0.104
> Time: 0.125
> Time: 0.114
> Time: 0.158
>
> Performance counter stats for './hackbench 10' (10 runs):
>
> 2,102,556,398 instructions # 0.000 IPC ( +- 1.179% )
> 843,957,634 L1-dcache-loads ( +- 1.295% )
> 130,007,361 L1-dcache-load-misses ( +- 3.281% )
> 6,328,938 LLC-misses ( +- 3.969% )
>
> 0.146160287 seconds time elapsed ( +- 5.851% )
>
> It's certainly useful if you want to get ballpark figures about cache behavior
> of an app and want to do comparisons.
>
What can you conclude from the above counts?
Are they good or bad? If they are bad, how do you go about fixing the app?
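For illustration, about all that can be derived mechanically from the quoted
counts is a handful of ratios. A minimal sketch in Python, assuming the counts
are copied as-is from the perf stat output above; the derived metric names
(l1_miss_ratio, llc_mpki, loads_per_insn) are illustrative and are not
something perf itself reports:

```python
# Counts copied from the quoted perf stat output (averages over 10 runs).
counts = {
    "instructions": 2_102_556_398,
    "L1-dcache-loads": 843_957_634,
    "L1-dcache-load-misses": 130_007_361,
    "LLC-misses": 6_328_938,
}


def ratio(numerator, denominator):
    """Return counts[numerator] / counts[denominator] as a float."""
    return counts[numerator] / counts[denominator]


# Fraction of L1 data-cache loads that missed.
l1_miss_ratio = ratio("L1-dcache-load-misses", "L1-dcache-loads")

# Last-level cache misses per 1000 instructions.
llc_mpki = 1000 * ratio("LLC-misses", "instructions")

# L1 data-cache loads issued per instruction.
loads_per_insn = ratio("L1-dcache-loads", "instructions")

print(f"L1 load miss ratio:    {l1_miss_ratio:.1%}")
print(f"LLC misses per 1000 insn: {llc_mpki:.2f}")
print(f"L1 loads per insn:     {loads_per_insn:.2f}")
```

The ratios by themselves do not say whether the numbers are good or bad; that
still depends on what each generic event actually counts on the processor in
question.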
> There are inconsistencies in our generic cache events - but that's not really a
> reason to obscure their usage behind nonsensical microarchitecture-specific
> details.
>
The actual events are a reflection of the micro-architecture. They indirectly
describe how it works. It is not clear to me that you can really improve your
app without some exposure to the micro-architecture.
So if you want to have generic events, I am fine with this, but you should not
block access to actual events pretending they are useless. Some people are
certainly interested in using them and learning about the micro-architecture
of their processor.
> But i'm definitely in favor of making these generalized events more consistent
> across different CPU types. Can you list examples of inconsistencies that we
> should resolve? (and which you possibly consider impossible to resolve, right?)
>
To make generic events more uniform across processors, one would have to have
precise definitions as to what they are supposed to count. Once you have that,
then we may have a better chance at finding consistent mappings for each
processor. I have not yet seen such definitions.