Message-ID: <4B566131.6020300@linux.vnet.ibm.com>
Date: Tue, 19 Jan 2010 17:49:37 -0800
From: Corey Ashford <cjashfor@...ux.vnet.ibm.com>
To: Andi Kleen <andi@...stfloor.org>
CC: LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
Paul Mackerras <paulus@...ba.org>,
Stephane Eranian <eranian@...glemail.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Frederic Weisbecker <fweisbec@...il.com>,
Xiao Guangrong <xiaoguangrong@...fujitsu.com>,
Dan Terpstra <terpstra@...s.utk.edu>,
Philip Mucci <mucci@...s.utk.edu>,
Maynard Johnson <mpjohn@...ibm.com>, Carl Love <cel@...ibm.com>
Subject: Re: [RFC] perf_events: support for uncore a.k.a. nest units
On 1/19/2010 4:44 PM, Andi Kleen wrote:
> On Tue, Jan 19, 2010 at 11:41:01AM -0800, Corey Ashford wrote:
>> 4. How do you encode uncore events?
>> ----
>> Uncore events will need to be encoded in the config field of the
>> perf_event_attr struct using the existing PERF_TYPE_RAW encoding. 64 bits
>> are available in the config field, and that may be sufficient to support
>> events on most systems. However, due to the proliferation and added
>> complexity of PMUs we envision, we might want to add another 64-bit config
>> (perhaps call it config_extra or config2) field to encode any extra
>> attributes that might be needed. The exact encoding used, just as for the
>> current encoding for core events, will be on a per-arch and possibly
>> per-system basis.
>
> I don't think a raw hex number will scale anywhere. You'll need a human
> readable event list / sub event masks with help texts.
>
> Often uncore events have specific restrictions, and that needs
> to be enforced somewhere too.
>
> Doing that all in a clean way that is also usable
> by programs likely needs a lot more thinking.
I left out one critical detail here: I had in mind that we'd be using a library
such as libpfm to handle the translation of event names plus attributes into raw
codes. In fact, we use libpfm for exactly this purpose today in the
PAPI/perf_events substrate implementation.
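To make the division of labor concrete, here is a minimal sketch of the kind of name-to-raw-code translation such a library performs before the raw value is placed in perf_event_attr.config. The event names and encodings below are invented for illustration; they are not real libpfm tables or actual uncore event codes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name-to-raw-code table, standing in for what a library
 * like libpfm maintains per architecture.  Names and codes are made up. */
struct event_map {
    const char *name;   /* human-readable event name */
    uint64_t    config; /* raw code destined for perf_event_attr.config */
};

static const struct event_map uncore_events[] = {
    { "NEST_MEM_READS",  0x01ULL },
    { "NEST_MEM_WRITES", 0x02ULL },
};

/* Look up the raw config for a named event; return 0 on success, -1 if
 * the event is unknown. */
static int lookup_event(const char *name, uint64_t *config)
{
    size_t i;

    for (i = 0; i < sizeof(uncore_events) / sizeof(uncore_events[0]); i++) {
        if (strcmp(uncore_events[i].name, name) == 0) {
            *config = uncore_events[i].config;
            return 0;
        }
    }
    return -1;
}
```

The point is that event-name validation, sub-event masks, and help text can all live in user space alongside such a table, leaving the kernel to consume only the raw code.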
>
>
>> bits field
>> ------ -----
>> 3..0 PMU number 0-15 /* specifies which of several identical PMUs being
>> addressed */
>> 7..4 core id 0-15
>> 8..8 node id 0-1
>> 11..9 chip id 0-7
>> 16..12 blade id 0-31
>> 23..17 rack id 0-127
>
> Such a compressed addressing scheme doesn't seem very future proof.
> e.g. 4 bits for the core is already obsolete (see the "80 core chip" that
> was recently announced)
Agreed. If the designer is very generous with the size of each field, it could
hold up for quite a while, but there is still the problem of relating these
addresses to actual hardware.
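For reference, the bit layout quoted above can be packed and unpacked with ordinary shifts and masks. This is only a sketch of the proposed encoding under discussion, with the proposal's field widths; it is not a settled interface.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the proposed address fields into the low 24 bits of a 64-bit
 * config word, following the bit layout in the quoted table.  Field
 * widths are the proposal's, not a final ABI. */
static uint64_t pack_pmu_addr(unsigned pmu, unsigned core, unsigned node,
                              unsigned chip, unsigned blade, unsigned rack)
{
    return  ((uint64_t)(pmu   & 0xf))        |  /* bits  3..0  */
            ((uint64_t)(core  & 0xf)  << 4)  |  /* bits  7..4  */
            ((uint64_t)(node  & 0x1)  << 8)  |  /* bit   8     */
            ((uint64_t)(chip  & 0x7)  << 9)  |  /* bits 11..9  */
            ((uint64_t)(blade & 0x1f) << 12) |  /* bits 16..12 */
            ((uint64_t)(rack  & 0x7f) << 17);   /* bits 23..17 */
}

/* Extract one field as an example; the others follow the same pattern. */
static unsigned unpack_rack(uint64_t cfg)
{
    return (cfg >> 17) & 0x7f;
}
```

The shift-and-mask form also makes the scaling objection easy to see: widening any field means redefining every shift above it, which is exactly why a fixed compressed layout is fragile.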
>
>
>> probably put something together for a particular system.
>>
>> Addressing Option 2)
>>
>> Have the kernel create nodes for each uncore PMU in /sys/devices/system or
>> other pseudo file system, such as the existing /proc/device-tree on Power
>> systems. /sys/devices/system or /proc/device-tree could be explored by the
>> user tool, and the user could then specify the path of the requested PMU
>> via a string which the kernel could interpret. To be overly simplistic,
>> something like "/sys/devices/system/pmus/blade4/cpu0/vectorcopro1". If we
>> settled on a common tree root to use, we could specify only the relative
>> path name, "blade4/cpu0/vectorcopro1".
>
> That's a more workable scheme, but you still need to find a clean
> way to describe topology (see above). The existing examples in sysfs
> are unfortunately all clumsy imho.
>
Yes, I agree. Also, it's easy to construct a system design that doesn't have a
hierarchical topology. A simple example would be a cluster of 32 nodes, each of
which is connected to its 31 neighbors. Perhaps for the purposes of just
enumerating PMUs a tree might be sufficient, but it's not clear to me that it
is mathematically sufficient for all topologies, nor whether it's intuitive
enough to use. For example, highly interconnected components might require that
PMU leaf nodes be duplicated in multiple branches, i.e. PMU paths might not be
unique in some topologies.
I'm certainly open to better alternatives!
Thanks for your thoughts,
- Corey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/