Message-ID: <20090627064404.GA19368@elte.hu>
Date: Sat, 27 Jun 2009 08:44:04 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Vince Weaver <vince@...ter.net>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Paul Mackerras <paulus@...ba.org>,
linux-kernel@...r.kernel.org, Mike Galbraith <efault@....de>
Subject: [numbers] perfmon/pfmon overhead of 17%-94%
* Ingo Molnar <mingo@...e.hu> wrote:
> Besides, you compare perfcounters to perfmon (which you seem to be
> a contributor of), while in reality perfmon has much, much worse
> (and unfixable, because designed-in) measurement overhead.
>
> So why are you criticising perfcounters for a 5000 cycles
> measurement overhead while perfmon has huge, _hundreds of
> millions_ of cycles measurement overhead (per second) for various
> realistic workloads? [ In fact in one of the scheduler-tests
> perfmon has a whopping measurement overhead of _nine billion_
> cycles, it increased total runtime of the workload from 3.3
> seconds to 6.6 seconds. (!) ]
Here are the more detailed perfmon/pfmon measurement overhead
numbers.
Test system is an "Intel Core2 E6800 @ 2.93GHz" box, 1 GB of RAM, default
Fedora install.
I've measured two workloads:

  hackbench.c       # messaging server benchmark
  pipe-test-1m.c    # does 1 million pipe ops, similar to lat_pipe
v2.6.28+perfmon patches (v3, full):
  ./hackbench 10

     0.496400985 seconds time elapsed   ( +- 1.699% )

  pfmon --follow-fork --aggregate-results ./hackbench 10

     0.580812999 seconds time elapsed   ( +- 2.233% )
I.e. this workload runs about 17% slower under pfmon; the measurement
overhead is about 1.45 billion cycles.
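
( Note: the elapsed-time lines above are in the format that perf stat
  prints when a command is run repeatedly; an invocation along these
  lines produces such output - the repeat count is only an example:

      perf stat --repeat 5 ./hackbench 10
)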
Furthermore, when running a 'pipe latency benchmark', an app that does
one million pipe reads and writes between two tasks (source code
attached below), I measured the following perfmon/pfmon overhead:
  ./pipe-test-1m

     3.344280347 seconds time elapsed   ( +- 0.361% )

  pfmon --follow-fork --aggregate-results ./pipe-test-1m

     6.508737983 seconds time elapsed   ( +- 0.243% )
That is about 94% measurement overhead (3.16 extra seconds on top of a
3.34 seconds baseline), or about 9.2 _billion_ cycles of overhead at
2.93 GHz on this test system.
These perfmon/pfmon overhead figures are consistently reproducible, and
they show up on other test systems and with other workloads as well.
Basically, for any app that involves task creation or context switching,
perfmon adds considerable runtime overhead - well beyond the overhead of
perfcounters.
Ingo
-----------------{ pipe-test-1m.c }-------------------->
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/wait.h>
#include <linux/unistd.h>

#define LOOPS 1000000

int main(void)
{
        int pipe_1[2], pipe_2[2];
        int m = 0, i;

        /* Two pipes, one for each direction of the ping-pong: */
        pipe(pipe_1);
        pipe(pipe_2);

        if (!fork()) {
                /* Child: wait for the parent's message, echo it back: */
                for (i = 0; i < LOOPS; i++) {
                        read(pipe_1[0], &m, sizeof(int));
                        write(pipe_2[1], &m, sizeof(int));
                }
        } else {
                /* Parent: send, then wait for the echo - LOOPS round trips: */
                for (i = 0; i < LOOPS; i++) {
                        write(pipe_1[1], &m, sizeof(int));
                        read(pipe_2[0], &m, sizeof(int));
                }
        }

        return 0;
}
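
( To build and re-run the comparison: the test compiles with a plain gcc
  command, the pfmon line is the one used above, and perf stat is one way
  to get the averaged elapsed-time numbers - the -O2/-Wall flags and the
  repeat count are just sensible defaults, nothing the measurements
  depend on:

      gcc -O2 -Wall -o pipe-test-1m pipe-test-1m.c

      perf stat --repeat 5 ./pipe-test-1m
      pfmon --follow-fork --aggregate-results ./pipe-test-1m
)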