Message-ID: <alpine.DEB.2.02.1112231507120.26100@pianoman.cluster.toy>
Date: Fri, 23 Dec 2011 15:12:40 -0500 (EST)
From: Vince Weaver <vince@...ter.net>
To: Ingo Molnar <mingo@...e.hu>
cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Vince Weaver <vweaver1@...s.utk.edu>,
William Cohen <wcohen@...hat.com>,
Stephane Eranian <eranian@...gle.com>,
Arun Sharma <asharma@...com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH 0/6] perf: x86 RDPMC and RDTSC support
On Wed, 21 Dec 2011, Ingo Molnar wrote:
> Here's "pinned events" variant i've measured:
>
> static u64 mmap_read_self(void *addr)
> {
>         struct perf_event_mmap_page *pc = addr;
>         u32 seq, idx;
>         u64 count;
>
>         do {
>                 seq = pc->lock;
>                 barrier();
>
>                 idx = pc->index;
>                 count = pc->offset;
>                 if (idx)
>                         count += rdpmc(idx - 1);
>
>                 barrier();
>         } while (pc->lock != seq);
>
>         return count;
> }
Currently you need to do at least two rdpmc() calls when doing a
start/read/stop sequence (I use this as a benchmark because it's what PAPI
code commonly does).
This is because the pc->offset value isn't initialized to 0 on start,
but to max_period & cntrval_mask.
I'm not sure what perf_event can do about this, short of having a separate
field in the mmap structure that is free of the overflow-offset
adjustment.
As an aside, I notice that the internal perf_event read() routine on x86
seems to use rdmsrl() instead of the equivalent rdpmc(). From what I
understand, at least through Core 2 (and maybe later) rdpmc() is faster
than the equivalent rdmsr() call. I'm not sure if it would be worth
replacing the calls, though.
Vince