linux-kernel - Re: [perfmon] Re: [perfmon2] perfmon2 merge news


Message-ID: <20071115085335.GB8603@frankl.hpl.hp.com>
Date:	Thu, 15 Nov 2007 00:53:35 -0800
From:	Stephane Eranian <eranian@....hp.com>
To:	dean gaudet <dean@...tic.org>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Christoph Hellwig <hch@...radead.org>,
	Paul Mackerras <paulus@...ba.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Greg KH <gregkh@...e.de>, Philip Mucci <mucci@...utk.edu>,
	William Cohen <wcohen@...hat.com>,
	Robert Richter <robert.richter@....com>,
	linux-kernel@...r.kernel.org, Stephane Eranian <eranian@....hp.com>
Subject: Re: [perfmon] Re: [perfmon2] perfmon2 merge news

Hello,

On Wed, Nov 14, 2007 at 08:20:22PM -0800, dean gaudet wrote:
> On Wed, 14 Nov 2007, Andi Kleen wrote:
> 
> > Later a syscall might be needed with event multiplexing, but that seems
> > more like a far away non essential feature.
> 
> actually multiplexing is the main feature i am in need of. there are an 
> insufficient number of counters (even on k8 with 4 counters) to do 
> complete stall accounting or to get a general overview of L1d/L1i/L2 cache 
> hit rates, average miss latency, time spent in various stalls, and the 
> memory system utilization (or HT bus utilization).  this runs out to 
> something like 30 events which are interesting... and re-running a 
> benchmark over and over just to get around the lack of multiplexing is a 
> royal pain in the ass.
> 
> it's not a "far away non-essential feature" to me.  it's something i would 
> use daily if i had all the pieces together now (and i'm constrained 
> because i cannot add an out-of-tree patch which adds unofficial syscalls 
> to the kernel i use).
> 

Multiplexing in the context of perfmon2 means that you can measure more events
than there are counters. To make this work, we create the notion of an event set
or more precisely a register set. Each set encapsulates the full PMU state. Then
the kernel multiplexes the sets onto the actual PMU hardware.

Why do we need this?

As Dean pointed out, there are many important metrics which require more events
than there are counters. Making multiple runs can be difficult with some workloads.

But there are also other, less well-known reasons why you'd want to do this. Having
lots of counters does not necessarily mean that you can measure lots of related
events simultaneously. Take the Pentium 4, for instance: it has 18 counters, but for
most interesting metrics you cannot measure all the events at once. Why? Because
there are important hardware constraints which translate into event combination
constraints. It is not uncommon to have constraints such as:
	- event A and B cannot be measured together
	- event A can only be measured by counter X
	- if event A is measured, then only events B, C, D can be measured
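
To illustrate how such constraints force events into separate sets, here is a
minimal sketch in Python. The event names, the two-counter PMU, and both
constraint tables are invented for illustration and do not describe any real
processor. Only the first two kinds of constraint are modeled, and the greedy
packing is just one simple strategy, not what perfmon2 itself does.

```python
# Hypothetical PMU description, invented for illustration only.
NUM_COUNTERS = 2

# Pairs of events that cannot be measured together.
CONFLICTS = {frozenset({"A", "B"})}

# Events that can only be counted on specific counters.
ALLOWED_COUNTERS = {"A": {0}, "C": {0}}


def assign_counters(events):
    """Return {event: counter} for one set, or None if the events do not fit."""
    # Reject the set if any pair of events is mutually exclusive.
    for i, first in enumerate(events):
        for second in events[i + 1:]:
            if frozenset({first, second}) in CONFLICTS:
                return None

    # Backtracking search for a counter assignment that respects
    # the per-event counter restrictions.
    def place(index, used):
        if index == len(events):
            return {}
        event = events[index]
        allowed = ALLOWED_COUNTERS.get(event, set(range(NUM_COUNTERS)))
        for counter in sorted(allowed - used):
            rest = place(index + 1, used | {counter})
            if rest is not None:
                return {event: counter, **rest}
        return None

    return place(0, frozenset())


def build_sets(events):
    """Greedily pack events into sets; each set must fit on the PMU at once."""
    sets = []
    for event in events:
        for members in sets:
            if assign_counters(members + [event]) is not None:
                members.append(event)
                break
        else:
            sets.append([event])
    return sets
```

With these made-up constraints, build_sets(["A", "B", "C", "D"]) needs two
sets even though there are only four events: A and B conflict, and A and C
both want counter 0.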

This is not limited to Pentium 4 or Itanium: Power, Intel Core 2, and AMD Opteron
processors have such limitations as well.

When you combine a limited number of counters with strong constraints, it can quickly
become difficult to make all the measurements in one run.

Multiplexing is, of course, not as good as measuring all events continuously, but
if you run for long enough and with a reasonable switching period, the *estimates*
you get by scaling the obtained counts can be very close to what they would have
been had you measured all events all the time. You have to balance precision with
overhead.
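
As a rough sketch of the scaling step, assume the simplest linear model: the
event rate seen while a set was active is taken to hold for the whole run. The
function name and its parameters are hypothetical and only serve the example.

```python
def scale_count(raw_count, time_active, time_total):
    """Estimate the full-run count of an event from a multiplexed measurement.

    raw_count   -- count accumulated while the event's set was on the PMU
    time_active -- how long that set was actually loaded
    time_total  -- duration of the whole measurement
    Assumes the event rate during the active time is representative.
    """
    if time_active <= 0:
        raise ValueError("the event was never measured")
    return raw_count * time_total / time_active


# An event counted 1,000,000 times while active for 2.5 s of a 10 s run
# is estimated at 4,000,000 occurrences over the full run.
estimate = scale_count(1_000_000, 2.5, 10.0)
```

The shorter the active time relative to the total, the more the estimate
depends on the workload behaving uniformly, which is the precision side of
the tradeoff.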

Why do this in the kernel?

One might argue that there is nothing preventing tools from multiplexing at the user
level. That's true and we do support this as well. You have to:
		- stop monitoring
		- read out the current counters
		- reprogram config and data registers
		- restart monitoring
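
The four steps above can be sketched as a small simulation. FakePMU and all of
its methods are hypothetical stand-ins, not the perfmon2 interface; in a real
tool each of the four steps would be a system call, and tick() would be a timer
expiring while the workload runs.

```python
class FakePMU:
    """Simulated PMU; every method here is invented for illustration."""

    def __init__(self, rates):
        self.rates = rates      # made-up event rates, in events per tick
        self.events = []        # events currently programmed
        self.counters = []      # one data register per programmed event
        self.running = False

    def stop(self):
        self.running = False

    def read(self):
        return dict(zip(self.events, self.counters))

    def program(self, events):
        self.events = list(events)
        self.counters = [0] * len(events)

    def start(self):
        self.running = True

    def tick(self, ticks):
        """Let the simulated workload run for some ticks."""
        if self.running:
            for i, event in enumerate(self.events):
                self.counters[i] += self.rates[event] * ticks


def switch_set(pmu, next_events):
    pmu.stop()                   # 1. stop monitoring
    counts = pmu.read()          # 2. read out the current counters
    pmu.program(next_events)     # 3. reprogram config and data registers
    pmu.start()                  # 4. restart monitoring
    return counts


def multiplex(pmu, event_sets, period, rounds):
    """Round-robin over event_sets, accumulating the raw count per event."""
    totals = {}
    pmu.program(event_sets[0])
    pmu.start()
    n = len(event_sets)
    for step in range(rounds * n):
        pmu.tick(period)
        counts = switch_set(pmu, event_sets[(step + 1) % n])
        for event, count in counts.items():
            totals[event] = totals.get(event, 0) + count
    pmu.stop()
    return totals
```

For example, multiplex(FakePMU({"A": 2, "B": 3, "C": 5}), [["A", "B"], ["C"]],
period=10, rounds=3) returns raw totals for only the 30 ticks each set was
loaded out of the 60 ticks that elapsed.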

But there are some important benefits to doing this in the kernel, especially for
per-thread monitoring. When you are not self-monitoring, you would need to stop the
other thread first, then issue a minimum of 4 system calls and incur a couple of
context switches. By doing it in the kernel, you guarantee that switching always
occurs in the context of the monitored thread.

Furthermore, it can be integrated with kernel-level sampling. Adding the notion
of an event set is a fairly pervasive change, and you need to make sure that it
fits well with the other parts of the interface.

-- 
-Stephane