Date:	Thu, 23 Oct 2014 13:42:28 +0200
From:	Peter Zijlstra <>
To:	Vince Weaver <>
Cc:	Andy Lutomirski <>,
	Valdis Kletnieks <>,
	"" <>,
	Paul Mackerras <>,
	Arnaldo Carvalho de Melo <>,
	Ingo Molnar <>,
	Kees Cook <>,
	Andrea Arcangeli <>,
	Erik Bosman <>
Subject: Re: [RFC 0/5] CR4 handling improvements

On Tue, Oct 21, 2014 at 01:05:49PM -0400, Vince Weaver wrote:
> On Tue, 21 Oct 2014, Peter Zijlstra wrote:
> > > perf_event is also fairly high overhead for setting up and starting 
> > > events,
> > 
> > Which you only do once at the start, so is that really a problem?
> There are various reasons why you might want to start events at times
> other than the beginning of the program.  Some people don't like kernel 
> multiplexing so they start/stop manually if they want to switch eventsets.

I suppose you could pre-create all events and use ioctl()s to start/stop
them where/when desired; this should be faster, I think. But yes, this is
not a use-case I've thought much about.
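
A minimal sketch of that pre-create-and-toggle pattern, assuming a plain
counting event on the calling thread. The helper names (init_counter_attr,
open_counter, counter_start, counter_stop, counter_read) are made up for
illustration; only perf_event_open(2) and the PERF_EVENT_IOC_* ioctls come
from the kernel interface. Whether this beats re-opening events would need
measuring on the workload in question.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: describe a user-space-only counter that is
 * created in the stopped state (disabled = 1). */
static void init_counter_attr(struct perf_event_attr *attr,
			      uint32_t type, uint64_t config)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = type;
	attr->config = config;
	attr->disabled = 1;
	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;
}

/* glibc provides no wrapper, so go through syscall(2).
 * pid = 0, cpu = -1: this thread, on any CPU; no group, no flags. */
static int open_counter(struct perf_event_attr *attr)
{
	return (int)syscall(SYS_perf_event_open, attr, 0, -1, -1, 0UL);
}

/* Zero the count and start counting; 0 on success, -1 on error. */
static int counter_start(int fd)
{
	if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) < 0)
		return -1;
	return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

/* Stop counting; the event stays open and can be restarted later. */
static int counter_stop(int fd)
{
	return ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
}

/* With the default read_format, read(2) yields a single 64-bit count. */
static int counter_read(int fd, uint64_t *value)
{
	ssize_t n = read(fd, value, sizeof(*value));

	return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

The idea would be to call init_counter_attr() and open_counter() once per
event at startup, then bracket each region of interest with
counter_start() and counter_stop(), so the setup cost is paid only once.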

> But no, I suppose you could ask anyone wanting to use rdpmc to open some 
> sort of dummy event at startup just to get cr4 enabled.

That's one work-around :-)

> > I still don't get that argument, 2 rdpmc's is cheaper than doing wrmsr,
> > not to mention doing wrmsr through a syscall. And looking at that mmap
> > page is 1 cacheline. Is that cacheline read (assuming you miss) the real
> > problem?
> Well at least by default the first read of the mmap page causes a 
> pagefault which adds a few thousand cycles of latency.  Though you can
> somewhat get around this by prefaulting it in at some point.

MAP_POPULATE is your friend there, but yes manually prefaulting is
perfectly fine too, and the HPC people are quite familiar with the
concept, they do it for a lot of things.
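
Both options can be sketched as follows, assuming fd is an already-open
perf event descriptor and that only the first (metadata) page is wanted.
The function names are made up for illustration; mmap(2) and MAP_POPULATE
are the only real interfaces used.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first page of fd read-only and ask the kernel to populate
 * the mapping up front, so the first user-space read should not fault.
 * Returns MAP_FAILED on error, like mmap(2). */
static void *map_first_page_populated(int fd)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);

	return mmap(NULL, len, PROT_READ, MAP_SHARED | MAP_POPULATE, fd, 0);
}

/* Manual alternative: read one byte during setup so that any page
 * fault is taken here rather than inside the measured region. */
static unsigned char prefault_touch(const void *page)
{
	return *(const volatile unsigned char *)page;
}
```

Either call belongs in the setup path, before the first measurement.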

> Anyway I'm just reporting numbers I get when measuring the overhead of 
> the old perfctr interface vs perf_event on typical PAPI workloads.  It's 
> true you can re-arrange calls and such so that perf_event behaves better 
> but that involves redoing a lot of existing code.

OK agreed, having to change existing code is often subject to various
forms of inertia/resistance. And yes I cannot deny that some of the
features perf has come at the expense of various overheads, however hard
we're trying to keep costs down.

> I do appreciate the trouble you've gone through keeping self-monitoring 
> working considering the fact that I'm the only user admitting to using it.

I have some code somewhere that uses it too, I've tried pushing it off
to other people but so far there are no takers :-)