lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20060915150444.GA11834@Krystal>
Date:	Fri, 15 Sep 2006 11:04:44 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Michel Dagenais <michel.dagenais@...ymtl.ca>,
	Roman Zippel <zippel@...ux-m68k.org>,
	linux-kernel@...r.kernel.org,
	Christoph Hellwig <hch@...radead.org>,
	Andrew Morton <akpm@...l.org>, Ingo Molnar <mingo@...hat.com>,
	Greg Kroah-Hartman <gregkh@...e.de>,
	Thomas Gleixner <tglx@...utronix.de>,
	Tom Zanussi <zanussi@...ibm.com>, ltt-dev@...fik.org
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Ingo Molnar (mingo@...e.hu) wrote:
> 
> * Michel Dagenais <michel.dagenais@...ymtl.ca> wrote:
> 
> > This is the crucial point. Using an INT3 at each dynamic tracepoint is 
> > both costly and is a larger perturbation on the system under study. 
> > [...]
> 
> have you measured this?
> 

Hi Ingo,

A very quick test (yes, done in user space, but should be accurate enough for
our needs) on a pentium 4 3 GHz shows that generating a int3 breakpoint in a
loop (connected to an empty handler) takes an average of 2.01µs per breakpoint.

LTT has an impact of about 0.220µs per probe (10 times smaller).

Please refer to this kind of high event rate workload :
http://www.listserv.shafik.org/pipermail/ltt-dev/2005-December/001139.html

On the same pentium 4, 3 GHz (in the following results, I do not consider the
fact that the CPU had hyperthreading enabled) :

Probe execution time at probe site : 220ns/event

220ns * 9588836 events = 2.11s

Event rate : 749994 events per second

LTT :
749994 events/s * 0.220µs/event = 16.5 % of cpu time

With a breakpoint :
749994 events/s * 2.01µs/event = 150 % of cpu time


Considering the limitations of these tests :
- int3 timings taken from user space, which implies calling an empty handler in
  user space.
- The machine had hyperthreading enabled, but considered UP here.

It shows that tracing the same workload with breakpoints would make the machine
more than twice slower when a direct memory write has a relatively small impact
(16.5% of cpu time spent in probes).

In high event rate/low perturbation scenarios where instrumentation is put at
arbitrary locations in the code, it shows necessary to use the static
instrumentation alternative because the breakpoint approach is just too slow.

Mathieu


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ