[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20060915150444.GA11834@Krystal>
Date: Fri, 15 Sep 2006 11:04:44 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Ingo Molnar <mingo@...e.hu>
Cc: Michel Dagenais <michel.dagenais@...ymtl.ca>,
Roman Zippel <zippel@...ux-m68k.org>,
linux-kernel@...r.kernel.org,
Christoph Hellwig <hch@...radead.org>,
Andrew Morton <akpm@...l.org>, Ingo Molnar <mingo@...hat.com>,
Greg Kroah-Hartman <gregkh@...e.de>,
Thomas Gleixner <tglx@...utronix.de>,
Tom Zanussi <zanussi@...ibm.com>, ltt-dev@...fik.org
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108
* Ingo Molnar (mingo@...e.hu) wrote:
>
> * Michel Dagenais <michel.dagenais@...ymtl.ca> wrote:
>
> > This is the crucial point. Using an INT3 at each dynamic tracepoint is
> > both costly and is a larger perturbation on the system under study.
> > [...]
>
> have you measured this?
>
Hi Ingo,
A very quick test (yes, done in user space, but should be accurate enough for
our needs) on a pentium 4 3 GHz shows that generating a int3 breakpoint in a
loop (connected to an empty handler) takes an average of 2.01µs per breakpoint.
LTT has an impact of about 0.220µs per probe (10 times smaller).
Please refer to this kind of high event rate workload :
http://www.listserv.shafik.org/pipermail/ltt-dev/2005-December/001139.html
On the same pentium 4, 3 GHz (in the following results, I do not consider the
fact that the CPU had hyperthreading enabled) :
Probe execution time at probe site : 220ns/event
220ns * 9588836 events = 2.11s
Event rate : 749994 events per second
LTT :
749994 events/s * 0.220µs/event = 16.5 % of cpu time
With a breakpoint :
749994 events/s * 2.01µs/event = 150 % of cpu time
Considering the limitations of these tests :
- int3 timings taken from user space, which implies calling an empty handler in
user space.
- The machine had hyperthreading enabled, but considered UP here.
It shows that tracing the same workload with breakpoints would make the machine
more than twice slower when a direct memory write has a relatively small impact
(16.5% of cpu time spent in probes).
In high event rate/low perturbation scenarios where instrumentation is put at
arbitrary locations in the code, it shows necessary to use the static
instrumentation alternative because the breakpoint approach is just too slow.
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists