Message-ID: <alpine.DEB.2.00.1108120009560.12863@cl320.eecs.utk.edu>
Date: Fri, 12 Aug 2011 00:35:33 -0400
From: Vince Weaver <vweaver1@...s.utk.edu>
To: Ingo Molnar <mingo@...e.hu>
CC: Will Deacon <will.deacon@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
sam wang <linux.swang@...il.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Paul Mackerras <paulus@...ba.org>,
Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
Stephane Eranian <eranian@...il.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"David S. Miller" <davem@...emloft.net>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [patch] perf: ARMv7 wrong "branches" generalized instruction
On Thu, 11 Aug 2011, Ingo Molnar wrote:
>
> * Will Deacon <will.deacon@....com> wrote:
> So what you and Vince are suggesting, to dumb down the kernel parts
> of perf and force users into raw or microarchitecture specific events
> actually *reduces* the user-base very significantly - while in
> practice even just cycles, instructions and branches level analysis
> handles 99% of the everyday performance analysis needs ...
No, what I want are the generalized events to accurately describe what is
being measured.
To do this properly you need a lot more granularity. PAPI for example has
100+ generalized events, some of which require multiple hardware events
to be combined to get the desired outcome.
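A rough C sketch of what "combining multiple hardware events" into one generalized event means. All names here (derive_op, generalized_event, derive, the RAW_* counter names) are hypothetical and only loosely in the spirit of PAPI's derived presets; this is not PAPI's actual API. The two recipes are illustrations: all branches as taken plus not-taken, and L1 data cache loads as all L1D accesses minus stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How raw counts are combined into one generalized value.
 * Hypothetical names, loosely in the spirit of PAPI derived presets. */
enum derive_op { DERIVE_NONE, DERIVE_ADD, DERIVE_SUB };

#define MAX_RAW 4

/* A generalized event defined as a recipe over raw counters.
 * The raw names are placeholders, not real PMU encodings. */
struct generalized_event {
    const char *name;
    enum derive_op op;
    size_t nraw;
    const char *raw[MAX_RAW];
};

/* Combine raw counts (counts[i] belongs to ev->raw[i]).
 * DERIVE_ADD sums them; DERIVE_SUB subtracts the later ones from
 * the first; DERIVE_NONE returns the first count unchanged. */
static uint64_t derive(const struct generalized_event *ev,
                       const uint64_t *counts)
{
    assert(ev->nraw >= 1 && ev->nraw <= MAX_RAW);
    uint64_t total = counts[0];
    for (size_t i = 1; i < ev->nraw; i++) {
        if (ev->op == DERIVE_ADD)
            total += counts[i];
        else if (ev->op == DERIVE_SUB)
            total -= counts[i];
    }
    return total;
}

/* Hypothetical recipes: all branches = taken + not-taken;
 * L1D loads = all L1D accesses (loads + stores) - stores. */
static const struct generalized_event branches = {
    "branches", DERIVE_ADD, 2, { "RAW_BR_TAKEN", "RAW_BR_NOT_TAKEN" }
};
static const struct generalized_event l1d_loads = {
    "L1-dcache-loads", DERIVE_SUB, 2, { "RAW_L1D_ACCESS", "RAW_L1D_STORE" }
};
```

With taken and not-taken counts of 700 and 300, derive(&branches, ...) gives 1000; a "branches" event wired to only the taken-branch counter would silently report 700.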
Having a "branch" event like that on ARM, one that ignores not-taken
branches, is going to drive you nuts when you are sampling your code to
find out why the branch miss rate is so high and you want to find the loop
exits that are predicted poorly (but can't find them, because loop exits
tend to be not-taken branches).
Having an "L1-dcache-load" event that includes stores (as on current AMD)
will drive you nuts if you are debugging code and it shows that somehow
these loads are triggering cache-coherency invalidates when you know that
usually only stores can do so. After all, why would a load-only event
count stores?
Having a tool that gives misleading names to things would be like giving
some poor user a copy of gdb that silently set breakpoints at a random
offset from where the actual breakpoint was. Sure, it probably correlates
with where you want the code to stop, but it's not what the tool says it
is doing.
So either come up with finer-grained generalized events, or else do a
better job of picking them. The fact that my _extremely_ simple
validation tests keep turning up problems like this indicates that no one
bothered checking these events before they were shoved into the kernel.
Once in, these bad events linger for years, and it's not even possible
to tell from userspace which raw event a generalized event maps to.
So anyone who cares is going to use a tool that uses raw events anyway.
(Note that I am not saying they'll calculate the raw hex value themselves.
They'll use a pre-existing tool, written by your maligned 1% of power
users, that will pick the proper raw event for them.)
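For reference, a C sketch of the difference at the perf_event ABI level. It only fills in struct perf_event_attr from <linux/perf_event.h>; handing the result to perf_event_open(2) via syscall(2) is omitted. The helper names init_generalized_branches and init_raw_event are hypothetical. With PERF_TYPE_HARDWARE the caller names only PERF_COUNT_HW_BRANCH_INSTRUCTIONS and the kernel chooses the raw encoding, which the caller never sees; with PERF_TYPE_RAW the caller (or a library like libpfm4) supplies the encoding directly.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Generalized "branches": the kernel, not the user, decides which
 * raw PMU event this turns into on the current CPU. */
static void init_generalized_branches(struct perf_event_attr *attr)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
    attr->disabled = 1; /* start stopped; enable via ioctl later */
}

/* Raw event: the caller passes the CPU-specific encoding, so what
 * gets counted is exactly what the PMU documentation says. */
static void init_raw_event(struct perf_event_attr *attr, uint64_t raw_code)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_RAW;
    attr->config = raw_code;
    attr->disabled = 1;
}
```

For example, init_raw_event(&attr, 0x0C) would request raw event 0x0C, assumed here to be the ARMv7 PC_WRITE architectural event; check the manual for your core before relying on that encoding.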
> We saw how the "push CPU specific events to users and tooling"
> concept didn't work with oprofile - why do we have to re-discuss this
> part of failed Linux history again and again?
No one is arguing for oprofile. libpfm4/PAPI or a tool like pfmon, sure.
> The approach Vince and you are suggesting is literally sacrificing
> 99% of utility for 1% of the users - a not very smart approach. I
> don't mind accomodating the needs of 1% of power-users (at all), but:
>
> *NOT AT THE EXPENSE OF THE COMMON CASE*.
>
> [...]
>
> We literally have more than 7 million lines of drivers/* code that
> provides generic abstractions - not just a few thousand lines of raw
> PCI operations space where user-space can write magic values to ...
>
> [...]
>
> It's the *job* of the kernel to abstract things away, we don't shy
> away from that ...
I get the impression that if you were the graphics maintainer, you'd
specify that all drivers should use a 1024x768x16bpp generic abstraction
and dither or scale all devices to match it. This would be a nice
abstraction that would make graphics programming oh so much easier for
the casual programmer, and it provides 99% of what most users want. The 1%
of power users are unimportant.
Also, one could only use an officially blessed list of colors appearing in
some obscure file under /sys.
Also access to 3D functionality would be blocked until the people wanting
3D had properly developed a generic concept of the color "mauve" that
could be applied across the board, even on black+white only hardware.
> So having generic events is not some fancy, unused property, but a
> pretty important measurement aspect of perf.
perf the userspace tool or perf_event the kernel ABI?
Vince