Message-ID: <4A3FF9D0.9040607@linux.vnet.ibm.com>
Date: Mon, 22 Jun 2009 14:38:24 -0700
From: Corey Ashford <cjashfor@...ux.vnet.ibm.com>
To: Ingo Molnar <mingo@...e.hu>
CC: eranian@...il.com, LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Robert Richter <robert.richter@....com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Paul Mackerras <paulus@...ba.org>,
Andi Kleen <andi@...stfloor.org>,
Maynard Johnson <mpjohn@...ibm.com>,
Carl Love <cel@...ibm.com>,
Corey J Ashford <cjashfor@...ibm.com>,
Philip Mucci <mucci@...s.utk.edu>,
Dan Terpstra <terpstra@...s.utk.edu>,
perfmon2-devel <perfmon2-devel@...ts.sourceforge.net>
Subject: Re: I.2 - Grouping
Ingo Molnar wrote:
>> 2/ Grouping
>>
>> By design, an event can only be part of one group at a time.
>> Events in a group are guaranteed to be active on the PMU at the
>> same time. That means a group cannot have more events than there
>> are available counters on the PMU. Tools may want to know the
>> number of counters available in order to group their events
>> accordingly, such that reliable ratios could be computed. It seems
>> the only way to know this is by trial and error. This is not
>> practical.
>
> Groups are there to support heavily constrained PMUs, and for them
> this is the only way, as there is no simple linear expression for
> how many counters one can load on the PMU.
>
> The ideal model to tooling is relatively independent PMU registers
> (counters) with little constraints - most modern CPUs meet that
> model.
>
> All the existing tooling (tools/perf/) operates on that model and
> this leads to easy programmability and flexible results. This model
> needs no grouping of counters.
>
> Could you please cite specific examples in terms of tools/perf/?
> What feature do you think needs to know more about constraints? What
> is the specific win in precision we could achieve via that?

As an example, suppose a user wants to monitor 10 events and we have four
counters to work with. Let's assume there is some mapping of events to counters
where you need only 3 groups to schedule the 10 events onto the PMU. If you
leave it to the kernel (and don't group the events from user space), then
depending on the kernel's fast event scheduling algorithm, it may take 6 groups
to get all of the requested events counted. Each event is then on the PMU for a
smaller share of the run, which leads to lower raw counts and more chance for
the counters to miss event bursts, and so to less accurate scaled results.
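
To make the accuracy argument concrete, here is a small sketch of how
multiplexing over more groups can hurt a scaled estimate. It is a toy model and
not kernel code: it assumes groups rotate round-robin once per time slice and
that the estimate is the raw count scaled by enabled time over running time.
The names scaled_estimate, worst_error and the trace values are made up for
illustration.

```python
def scaled_estimate(events_per_slice, n_groups, group_index):
    """Estimate the total event count for an event that is only counted
    while its group is on the PMU (assumed round-robin, one slice each)."""
    raw = 0        # events actually counted
    running = 0    # slices during which the group was scheduled
    for t, n in enumerate(events_per_slice):
        if t % n_groups == group_index:
            raw += n
            running += 1
    enabled = len(events_per_slice)
    if running == 0:
        return 0.0
    # Scale the raw count up to the whole run.
    return raw * enabled / running


def worst_error(events_per_slice, n_groups):
    """Largest absolute error over all possible group positions."""
    true_total = sum(events_per_slice)
    return max(
        abs(scaled_estimate(events_per_slice, n_groups, g) - true_total)
        for g in range(n_groups)
    )


# Hypothetical trace: steady 10 events per slice with one burst of 500.
trace = [10, 10, 10, 10, 500, 10, 10, 10, 10, 10, 10, 10]

err_3_groups = worst_error(trace, 3)
err_6_groups = worst_error(trace, 6)
```

In this toy trace the worst-case error with 6 groups is larger than with 3,
because each event sees fewer slices and one burst gets scaled by a bigger
factor (or missed entirely).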

Currently the PAPI substrate for PCL does this partitioning using a very dumb
algorithm. It could be improved, particularly if there were some better way to
get feedback from the kernel than a "yes, these fit" or "no, these don't fit".
I'm not sure what that way would be, though. Perhaps an ioctl that does some
sort of "dry scheduling" of events into groups in an optimal way. This call
would not need to lock any resources; it would just use the kernel's algorithm
for event constraint checking.
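
As a rough illustration of why the partitioning algorithm matters when all you
have is a yes/no answer, the sketch below compares two simple user-space
strategies. Everything in it is hypothetical: fits() stands in for the kernel's
"these fit" / "these don't fit" reply, the constraint table is invented, and
neither strategy is claimed to be what PAPI or the kernel actually does.

```python
def fits(group, allowed):
    """Stand-in for the kernel's yes/no check: True if every event in the
    group can be given its own counter from its allowed set."""
    def assign(i, used):
        if i == len(group):
            return True
        for counter in allowed[group[i]]:
            if counter not in used and assign(i + 1, used | {counter}):
                return True
        return False
    return assign(0, frozenset())


def next_fit(events, allowed):
    """Naive partitioning: only ever try the most recently opened group."""
    groups = []
    for ev in events:
        if groups and fits(groups[-1] + [ev], allowed):
            groups[-1].append(ev)
        else:
            groups.append([ev])
    return groups


def first_fit(events, allowed):
    """Slightly smarter: try every existing group before opening a new one."""
    groups = []
    for ev in events:
        for g in groups:
            if fits(g + [ev], allowed):
                g.append(ev)
                break
        else:
            groups.append([ev])
    return groups


# Invented constraints for a 4-counter PMU: "a" and "b" can only use
# counter 0, the x events can use any counter.
ANY = (0, 1, 2, 3)
allowed = {"a": (0,), "b": (0,)}
allowed.update({f"x{i}": ANY for i in range(1, 7)})
events = ["a", "b", "x1", "x2", "x3", "x4", "x5", "x6"]

naive_groups = next_fit(events, allowed)
better_groups = first_fit(events, allowed)
```

With these made-up constraints the naive pass ends up with 3 groups and the
first-fit pass with 2, using the same yes/no check. A kernel-side "dry
scheduling" call could presumably do better still, since it knows the real
constraints.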

To me, this is not a big issue, but some sort of better mechanism might be
considered for a future update.

--
Regards,
- Corey
Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
503-578-3507
cjashfor@...ibm.com