[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <BANLkTikNWgzKefWxp6UUqQ-XdBvgG+QSNQ@mail.gmail.com>
Date: Fri, 22 Apr 2011 10:47:40 +0200
From: Stephane Eranian <eranian@...gle.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Arnaldo Carvalho de Melo <acme@...radead.org>,
linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Lin Ming <ming.m.lin@...el.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, eranian@...il.com,
Arun Sharma <asharma@...com>
Subject: Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
On Fri, Apr 22, 2011 at 10:06 AM, Ingo Molnar <mingo@...e.hu> wrote:
>
> * Ingo Molnar <mingo@...e.hu> wrote:
>
>> This needs to be a *lot* more user friendly. Users do not want to type in
>> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile
>> era really.
>>
>> Unless there's proper generalized and human usable support i'm leaning
>> towards turning off the offcore user-space accessible raw bits for now, and
>> use them only kernel-internally, for the cache events.
>
Generic cache events are a myth. They are not usable. I keep getting questions
from users because nobody knows what they are actually counting, thus nobody
knows how to interpret the counts. You cannot really hide the micro-architecture
if you want to make any sensible measurements.
I agree with the poor usability of perf when you have to pass hex
values for events.
But that's why I have a user level library to map event strings to
event codes for perf.
Arun Sharma posted a patch a while ago to connect this library with perf, so far
it's been ignored, it seems:
perf stat -e offcore_response_0:dmd_data_rd foo
> I'm about to push out the patch attached below - it lays out the arguments in
> detail. I don't think we have time to fix this properly for .39 - but memory
> profiling could be a nice feature for v2.6.40.
>
You will not be able to do any reasonable memory profiling using
offcore response
events. Dont' expect a profile to point to the missing loads. If
you're lucky it would
point to the use instruction.
> --------------------->
> From b52c55c6a25e4515b5e075a989ff346fc251ed09 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@...e.hu>
> Date: Fri, 22 Apr 2011 08:44:38 +0200
> Subject: [PATCH] x86, perf event: Turn off unstructured raw event access to offcore registers
>
> Andi Kleen pointed out that the Intel offcore support patches were merged
> without user-space tool support to the functionality:
>
> |
> | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
> | user space bits were not. This made it impossible to set the extra mask
> | and actually do the OFFCORE profiling
> |
>
> Andi submitted a preliminary patch for user-space support, as an
> extension to perf's raw event syntax:
>
> |
> | Some raw events -- like the Intel OFFCORE events -- support additional
> | parameters. These can be appended after a ':'.
> |
> | For example on a multi socket Intel Nehalem:
> |
> | perf stat -e r1b7:20ff -a sleep 1
> |
> | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
> | that measures any access to DRAM on another socket.
> |
>
> But this kind of usability is absolutely unacceptable - users should not
> be expected to type in magic, CPU and model specific incantations to get
> access to useful hardware functionality.
>
> The proper solution is to expose useful offcore functionality via
> generalized events - that way users do not have to care which specific
> CPU model they are using, they can use the conceptual event and not some
> model specific quirky hexa number.
>
> We already have such generalization in place for CPU cache events,
> and it's all very extensible.
>
> "Offcore" events measure general DRAM access patters along various
> parameters. They are particularly useful in NUMA systems.
>
> We want to support them via generalized DRAM events: either as the
> fourth level of cache (after the last-level cache), or as a separate
> generalization category.
>
> That way user-space support would be very obvious, memory access
> profiling could be done via self-explanatory commands like:
>
> perf record -e dram ./myapp
> perf record -e dram-remote ./myapp
>
> ... to measure DRAM accesses or more expensive cross-node NUMA DRAM
> accesses.
>
> These generalized events would work on all CPUs and architectures that
> have comparable PMU features.
>
> ( Note, these are just examples: actual implementation could have more
> sophistication and more parameter - as long as they center around
> similarly simple usecases. )
>
> Now we do not want to revert *all* of the current offcore bits, as they
> are still somewhat useful for generic last-level-cache events, implemented
> in this commit:
>
> e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere
>
> But we definitely do not yet want to expose the unstructured raw events
> to user-space, until better generalization and usability is implemented
> for these hardware event features.
>
> ( Note: after generalization has been implemented raw offcore events can be
> supported as well: there can always be an odd event that is marginally
> useful but not useful enough to generalize. DRAM profiling is definitely
> *not* such a category so generalization must be done first. )
>
> Furthermore, PERF_TYPE_RAW access to these registers was not intended
> to go upstream without proper support - it was a side-effect of the above
> e994d7d23a0b commit, not mentioned in the changelog.
>
> As v2.6.39 is nearing release we go for the simplest approach: disable
> the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
> kernel and becomes an ABI.
>
> Once proper structure is implemented for these hardware events and users
> are offered usable solutions we can revisit this issue.
>
> Reported-by: Andi Kleen <ak@...ux.intel.com>
> Acked-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> Cc: Arnaldo Carvalho de Melo <acme@...hat.com>
> Cc: Frederic Weisbecker <fweisbec@...il.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
> Signed-off-by: Ingo Molnar <mingo@...e.hu>
> ---
> arch/x86/kernel/cpu/perf_event.c | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index eed3673a..632e5dc 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -586,8 +586,12 @@ static int x86_setup_perfctr(struct perf_event *event)
> return -EOPNOTSUPP;
> }
>
> + /*
> + * Do not allow config1 (extended registers) to propagate,
> + * there's no sane user-space generalization yet:
> + */
> if (attr->type == PERF_TYPE_RAW)
> - return x86_pmu_extra_regs(event->attr.config, event);
> + return 0;
>
> if (attr->type == PERF_TYPE_HW_CACHE)
> return set_ext_hw_attr(hwc, event);
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists