Date:	Thu, 1 Sep 2011 13:16:10 -0400
From:	Vince Weaver <vweaver1@...s.utk.edu>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	Mike Hommey <mh@...ndium.org>, <linux-kernel@...r.kernel.org>
Subject: Re: Problem with perf hardware counters grouping

On Thu, 1 Sep 2011, Peter Zijlstra wrote:
> > Is there any good workaround, or do we have to fall back to trying to 
> > start/read/stop every proposed event set to make sure it's valid?
> 
> I guess my first question is going to be, how do you know what the
> maximum number of counters is in the first place?

The use case where this comes up most easily is adding events to an
eventset one at a time until failure; you then assume the count at the
failing attempt minus one is the number of counters available.  So this
boils down to doing that many sys_perf_event_open()/close() calls.  This
obviously gives the wrong answer in the current watchdog-timer case.
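To make the probe concrete, here is a minimal sketch of the open-until-failure approach (the helper names `perf_event_open` and `probe_counter_limit` are illustrative, not from PAPI; a real probe would also have to cope with `perf_event_paranoid` restrictions, and, as noted above, the answer can still be wrong when something like the NMI watchdog holds a counter):

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Thin wrapper: glibc provides no perf_event_open() declaration. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Add identical cycle counters to one event group until open fails;
 * the number that succeeded is the estimate of available counters.
 * Returns 0 if perf is unavailable (no PMU, permissions, etc.). */
static int probe_counter_limit(void)
{
    struct perf_event_attr attr;
    int fds[64];
    int n, count;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.exclude_kernel = 1;   /* friendlier under perf_event_paranoid */

    for (n = 0; n < 64; n++) {
        int group = (n == 0) ? -1 : fds[0];
        long fd = perf_event_open(&attr, 0, -1, group, 0);
        if (fd < 0)
            break;             /* first failure: n counters fit */
        fds[n] = (int)fd;
    }

    count = n;
    while (n > 0)              /* tear the group back down */
        close(fds[--n]);
    return count;
}
```

The expensive part is exactly what the question above implies: one open/close pair per probe step, every time the limit needs to be (re)discovered.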

The other way to know is to query libpfm4, which "knows" the number of
counters available on each CPU.  PAPI uses this as a heuristic, not as a
hard limit, but it can also lead to the problem occurring if the test
tries the limit, the sys_perf_event_open() succeeds... and then fails
upon read.
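For reference, the "succeeds at open, fails at read" case can be detected after the fact by checking the scheduling times the kernel reports: if time_enabled is nonzero but time_running stays zero, the event group never actually got onto the PMU.  A hedged sketch (the function name `event_actually_runs` is illustrative; return codes are 1 = ran, 0 = never scheduled, -1 = open/read failed):

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Thin wrapper: glibc provides no perf_event_open() declaration. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

static int event_actually_runs(void)
{
    struct perf_event_attr attr;
    /* Layout matches read_format below: value, enabled, running. */
    struct { uint64_t value, time_enabled, time_running; } buf;
    volatile unsigned long spin = 0;
    unsigned long i;
    long fd;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
                       PERF_FORMAT_TOTAL_TIME_RUNNING;

    fd = perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    for (i = 0; i < 1000000; i++)
        spin += i;                     /* give it something to count */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &buf, sizeof(buf)) != sizeof(buf)) {
        close(fd);
        return -1;
    }
    close(fd);

    /* enabled but never running => counters were held by someone else */
    return (buf.time_running > 0) ? 1 : 0;
}
```

Of course this is after the fact: by the time you can check time_running you have already paid for the open, the enable, and the read, which is the start/read/stop fallback being complained about.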

Does the perf tool work around this in some way?  

> > This is going to seriously impact performance, and perf_event performance 
> > is pretty bad to begin with.  The whole reason I was writing the tests to 
> > trigger this is because PAPI users are complaining that perf_event 
> > overhead is roughly twice that of perfctr or perfmon2, which I've verified 
> > experimentally.
> 
> Yeah, you keep saying this, where does it come from? Only the lack of
> userspace rdpmc?

That's part of it.  I've been working on isolating this, but a fair
comparison involves writing low-level code that accesses perf_event,
perfctr, and perfmon2 directly at the syscall level, and as you can
imagine that's neither easy nor fun.  It's also tricky to profile the
perf_event code using perf_events itself.

Vince
vweaver1@...s.utk.edu

