Message-ID: <18869.40904.382716.827097@cargo.ozlabs.ibm.com>
Date: Tue, 10 Mar 2009 10:01:28 +1100
From: Paul Mackerras <paulus@...ba.org>
To: Robert Richter <robert.richter@....com>
Cc: Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Stephane Eranian <eranian@...glemail.com>,
Eric Dumazet <dada1@...mosbay.com>,
Arjan van de Ven <arjan@...radead.org>,
Peter Anvin <hpa@...or.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
"David S. Miller" <davem@...emloft.net>,
Mike Galbraith <efault@....de>
Subject: Re: [announce] Performance Counters for Linux, v6

Robert Richter writes:

> Some points to mention here. This patch set actually introduces two
> interfaces, a new user/kernel interface and an in-kernel api to access
> performance counters. These are separate things and sometimes mixed
> too much. There is a strong need for an in-kernel api. This is the

We have been concentrating more on the user/kernel API since that is
the one that cannot be changed in an incompatible way once this stuff
goes upstream. The in-kernel API can be changed at any time and is
still evolving.

> third implementation I am involved in (oprofile and perfmon are the
> others), and things always go the same way. All these subsystems should
> be merged into one in-kernel implementation and share the same code.
> The different user/kernel i/fs could then coexist and meet the users'
> different needs.

It would certainly be good to get oprofile to use the same low-level
machinery as perf_counters. I'm not sure what the fate of perfmon
will be, but it seems unlikely it will go upstream in anything like
its present form.

> > +static const int intel_perfmon_event_map[] =
> > +{
> > +	[PERF_COUNT_CPU_CYCLES]			= 0x003c,
> > +	[PERF_COUNT_INSTRUCTIONS]		= 0x00c0,
> > +	[PERF_COUNT_CACHE_REFERENCES]		= 0x4f2e,
> > +	[PERF_COUNT_CACHE_MISSES]		= 0x412e,
> > +	[PERF_COUNT_BRANCH_INSTRUCTIONS]	= 0x00c4,
> > +	[PERF_COUNT_BRANCH_MISSES]		= 0x00c5,
> > +	[PERF_COUNT_BUS_CYCLES]			= 0x013c,
> > +};
>
> I would like to define _all_ the behaviour of the architecture and the
> models in functions instead of parameters and lists. It is hard to
> explain why, because it is more a matter of esthetics, but I believe
> only nice things work best. Let me try.
>
> 1) The list above seems to be random; there are lots of events and it
> is hard to define which event is really important. Surely these
> events are important, but it is hard to draw a line here.

I see that list as a convenience for doing a few simple performance
measurements. For any serious in-depth analysis, userspace will know
what processor it's running on and use raw event codes.

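To make that concrete, here is a rough sketch of the two ways a counter
could be configured (the struct name and the way the raw code is passed
are assumptions for illustration; only the .type and .raw fields are
taken from the code under discussion):

	struct perf_counter_hw_event ev = { };

	/* simple measurement: a generic event id from the map above,
	 * translated per-arch (0x003c on Intel for CPU cycles) */
	ev.type = PERF_COUNT_CPU_CYCLES;

	/* in-depth analysis: mark the event as raw and pass the
	 * model-specific code directly, e.g. 0x412e for cache misses
	 * on this particular Intel model */
	ev.raw  = 1;
	ev.type = 0x412e;
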
> 2) The list assumes/implies the events are available on all
> architectures and cpus. This is probably not the case, and also, an
> event need not be _important_ for a certain architecture. But it has
> to be there even if it is of no interest.
>
> 3) Hard to extend. If an event is added here this could have an impact
> on all other architectures. The data structures change.
>
> 4) In the kernel the behaviour of a subsystem is often implemented by
> functions (e.g. struct device_driver). There are lots of ops structs
> in the kernel and there are reasons for that.
>
> 5) ops structs are more dynamic. The data could be generated
> dynamically and does not have to be static in some tables and
> variables.
>
> So, instead of making the list a public data structure, it is better to
> pass the type to an arch-specific function, e.g.:
>
> int arch_xxx_setup_event(int event_type);

That's exactly what we have, except that it's called
hw_perf_counter_init and the event_type you have there is in the
struct perf_counter that gets passed in.

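The shape of that hook is roughly the following (a sketch only; the
actual prototype and return convention in the patches may differ, and
event_supported() is a made-up placeholder for the per-model check):

	/*
	 * Per-arch setup hook: inspect the requested event and reject
	 * anything this CPU cannot count.
	 */
	int hw_perf_counter_init(struct perf_counter *counter)
	{
		struct perf_counter_hw_event *hw_event = &counter->hw_event;

		/* generic event ids must have a mapping on this CPU */
		if (!hw_event->raw && !event_supported(hw_event->type))
			return -EINVAL;

		/* ... record the chosen raw code etc. in the counter ... */
		return 0;
	}
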
> If the type is not supported, an error could be returned. There is no
> further impact. Even the build binaries would be identical if
> hw_event_types were extended for a single different architecture.
>
> The same also applies to counters and so on: better to implement
> functions.

All of that is already done; hw_perf_counter_init gets to interpret
the counter->hw_event.type and counter->hw_event.raw fields and decide
whether the event is supported, and return an error if not. On x86 it
looks like there is a further ops structure (struct pmc_x86_ops) which
allows each x86-compatible cpu type to supply its own functions for
doing the interpretation of counter->hw_event and other things.

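For reference, that per-vendor ops structure can be pictured roughly
like this (the member names are illustrative; the real struct
pmc_x86_ops in the patches has different and more members):

	struct pmc_x86_ops {
		/* translate a generic event id into a raw model code */
		u64	(*event_map)(int event);
		/* validate/mask a user-supplied raw event code */
		u64	(*raw_event)(u64 event);
	};

One such structure would be provided per CPU vendor (e.g. Intel and
AMD) and selected at init time, so the generic x86 code never needs to
know the model-specific encodings.
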
Paul.