Message-ID: <alpine.DEB.2.02.1112231507120.26100@pianoman.cluster.toy>
Date: Fri, 23 Dec 2011 15:12:40 -0500 (EST)
From: Vince Weaver <vince@...ter.net>
To: Ingo Molnar <mingo@...e.hu>
cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Vince Weaver <vweaver1@...s.utk.edu>,
William Cohen <wcohen@...hat.com>,
Stephane Eranian <eranian@...gle.com>,
Arun Sharma <asharma@...com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH 0/6] perf: x86 RDPMC and RDTSC support
On Wed, 21 Dec 2011, Ingo Molnar wrote:
> Here's "pinned events" variant i've measured:
>
> static u64 mmap_read_self(void *addr)
> {
>         struct perf_event_mmap_page *pc = addr;
>         u32 seq, idx;
>         u64 count;
>
>         do {
>                 seq = pc->lock;
>                 barrier();
>
>                 idx = pc->index;
>                 count = pc->offset;
>                 if (idx)
>                         count += rdpmc(idx - 1);
>
>                 barrier();
>         } while (pc->lock != seq);
>
>         return count;
> }
Currently you need to do at least two rdpmc() calls when doing a
start/read/stop sequence (I use this as a benchmark because it's what PAPI
code commonly does).
This is because the pc->offset value isn't initialized to 0 on start,
but to max_period & cntrval_mask.
I'm not sure what perf_event can do about this, short of having a separate
field in the mmap structure that is free of the overflow-offset
adjustment.
As an aside, I notice that the internal perf_event read() routine on x86
seems to use rdmsrl() instead of the equivalent rdpmc(). From what I
understand, at least through Core 2 (and maybe later) rdpmc() is faster
than the equivalent rdmsr() call. I'm not sure if it would be worth
replacing the calls, though.
Vince