Message-ID: <4B57907E.5000207@linux.vnet.ibm.com>
Date: Wed, 20 Jan 2010 15:23:42 -0800
From: Corey Ashford <cjashfor@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
Andi Kleen <andi@...stfloor.org>,
Paul Mackerras <paulus@...ba.org>,
Stephane Eranian <eranian@...glemail.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Xiao Guangrong <xiaoguangrong@...fujitsu.com>,
Dan Terpstra <terpstra@...s.utk.edu>,
Philip Mucci <mucci@...s.utk.edu>,
Maynard Johnson <mpjohn@...ibm.com>, Carl Love <cel@...ibm.com>
Subject: Re: [RFC] perf_events: support for uncore a.k.a. nest units

On 1/20/2010 1:33 PM, Peter Zijlstra wrote:
> On Wed, 2010-01-20 at 14:34 +0100, Peter Zijlstra wrote:
>
>> So how about PERF_TYPE_{CORE,NODE,SOCKET} like things?
>
> OK, so I read most of the intel uncore stuff, and it seems to suggest
> you need a regular pmu event to receive uncore events (chained setup),
> this seems rather retarded since it wastes a perfectly good pmu event
> and makes configuring all this more intricate...
>
> Ah well, nothing to be done about that, I guess..

Yes, we have a similar situation: in addition to events counted on core PMU
counters, we also have counters that are off-core; in some cases those
counters live in off-core units that take their actual events from other
off-core units as well as from their own. So you can see that this can become
almost arbitrarily complex.

As for the PERF_TYPE_{CORE,NODE,SOCKET} idea, that could still work, even
though, for example, a socket event may be counted on a core PMU. Using more
encodings for the type field, as you've suggested, would allow us to reuse the
64-bit config space multiple times. Were you thinking that, with the type
field set, we'd still reuse the "cpu" argument as the actual PMU address
within the PERF_TYPE_* space? If so, that's an interesting idea, but I think
it still leaves open the problem of how to relate those addresses back to the
real hardware, especially when running under a hypervisor that has provided
you with only a small subset of the physical hardware in the system.
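
Something like the following, perhaps? A minimal sketch of the user-space
side, where PERF_TYPE_SOCKET and its value are hypothetical, invented here
just for illustration:

    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Hypothetical type value; not part of the current ABI. */
    #define PERF_TYPE_SOCKET 6

    static int open_socket_event(int pmu_addr, unsigned long long config)
    {
            struct perf_event_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.size = sizeof(attr);
            attr.type = PERF_TYPE_SOCKET;   /* select the socket config space */
            attr.config = config;           /* event encoding within that space */

            /*
             * The reinterpreted argument: rather than a CPU number, this
             * would name the PMU instance within the PERF_TYPE_SOCKET space.
             */
            return syscall(__NR_perf_event_open, &attr,
                           -1 /* pid: system-wide */, pmu_addr,
                           -1 /* group_fd */, 0 /* flags */);
    }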

I really think we need some sort of data structure, passed from the kernel to
user space, that represents the topology of the system and gives enough
information to identify each PMU node. Whether this is done with a
sysfs-style tree, a table in a file, XML, etc. doesn't really matter much,
but it needs to be something that can be parsed relatively easily and
*contains just enough information* for the user to be able to correctly
choose PMUs, and for the kernel to be able to relate that choice back to the
actual PMU hardware.
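
Purely to illustrate the kind of thing I mean (these paths are invented, not
an existing interface), a sysfs-style tree might expose one directory per PMU
instance, with just enough attributes to identify and address it:

    /sys/devices/system/pmu/core0/type       "core"
    /sys/devices/system/pmu/core0/cpus       CPUs whose events it can count
    /sys/devices/system/pmu/socket0/type     "socket"
    /sys/devices/system/pmu/socket0/addr     value to pass back as the PMU address
    /sys/devices/system/pmu/node0/type       "node"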

In our case, we are looking at /proc/device-tree, and it actually does appear
to contain enough information for us. However, since /proc/device-tree is not
available anywhere but the Power arch (it originates from a data structure
passed into the OS by Open Firmware), we'd like to have a more general
approach that can be used on x86 and other arches.
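
For what it's worth, even a trivial POSIX walker can enumerate candidate
nodes there. The "pmu" name match below is a made-up heuristic rather than a
real device-tree binding, and a real scan would have to descend the whole
tree rather than look only at the top level:

    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    static void list_pmu_nodes(void)
    {
            DIR *dir = opendir("/proc/device-tree");
            struct dirent *de;

            if (!dir)
                    return; /* not an Open Firmware system */

            while ((de = readdir(dir)) != NULL) {
                    /* made-up heuristic: node names that mention a PMU */
                    if (strstr(de->d_name, "pmu"))
                            printf("candidate PMU node: %s\n", de->d_name);
            }
            closedir(dir);
    }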
- Corey