Date:	Wed, 20 Oct 2010 17:35:04 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	"Frank Ch. Eigler" <fche@...hat.com>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Paul Mackerras <paulus@...ba.org>,
	Stephane Eranian <eranian@...gle.com>,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Tom Zanussi <tzanussi@...il.com>,
	Masami Hiramatsu <mhiramat@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Robert Richter <robert.richter@....com>
Subject: Re: [RFC] perf: Dwarf cfi based user callchains

On Wed, Oct 13, 2010 at 11:13:27AM -0400, Frank Ch. Eigler wrote:
> 
> Frederic Weisbecker <fweisbec@...il.com> writes:
> 
> > [...]
> > This brings DWARF CFI-based callchains for userspace apps that don't
> > have frame pointers.
> 
> Interesting approach!
> 
> > [...]
> > - it's slow. A first improvement to make it faster is to support binary
> >   search from .eh_frame_hdr. 
> 
> In systemtap land, we did find a dramatic improvement from this too.


Yeah, a linear walk over .eh_frame is not good for anything other than
testing and debugging.
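
For reference, the .eh_frame_hdr search table is just sorted
(initial PC, FDE address) pairs, so the lookup becomes a plain
binary search. A minimal user-space sketch, assuming sdata4
encodings for brevity (find_fde() and the struct layout here are
made up for illustration, not taken from the patch):

	#include <stddef.h>
	#include <stdint.h>

	/* One entry of the .eh_frame_hdr binary search table,
	 * assuming DW_EH_PE_sdata4 encoding for both fields. */
	struct eh_hdr_entry {
		int32_t start_ip;	/* function start PC (relative) */
		int32_t fde_off;	/* offset of the matching FDE */
	};

	/* Return the last entry with start_ip <= ip, i.e. the FDE
	 * candidate covering ip, in O(log n) steps instead of a
	 * linear CIE/FDE walk over .eh_frame. */
	static const struct eh_hdr_entry *
	find_fde(const struct eh_hdr_entry *tbl, size_t nr, int32_t ip)
	{
		size_t lo = 0, hi = nr;

		while (lo < hi) {
			size_t mid = lo + (hi - lo) / 2;

			if (tbl[mid].start_ip <= ip)
				lo = mid + 1;
			else
				hi = mid;
		}
		return lo ? &tbl[lo - 1] : NULL;
	}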

> Have you measured the cost of transcribing potentially large chunks
> of the user stack?  We did not seriously evaluate this path, since we
> encounter megabyte+ stacks in larger userspace apps, and copying THAT
> out seemed absurd.


Actually that's quite affordable. And you don't need to dump the whole
stack of a process on every sample; that would indeed be absurd.
A small chunk, starting from the stack pointer, is enough.
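
Conceptually, something like this on the kernel side (a sketch
only, not the actual patch; sample_user_stack() is made up, and a
real implementation samples from NMI context, so it would need the
inatomic uaccess variants):

	#include <linux/uaccess.h>
	#include <linux/ptrace.h>

	/* Copy a bounded chunk of the user stack, starting at the
	 * sampled stack pointer, into the sample buffer. Offline
	 * DWARF unwinding can then walk frames as long as they fit
	 * within 'size' bytes. */
	static int sample_user_stack(struct pt_regs *regs,
				     void *buf, size_t size)
	{
		unsigned long sp = user_stack_pointer(regs);

		if (copy_from_user(buf, (const void __user *)sp, size))
			return -EFAULT;
		return 0;
	}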

What is interesting is that you can play with several different dump
sizes: the larger the dump, the deeper you'll be able to unwind, but
that also means more profiling overhead.

I can't measure any significant throughput impact with hackbench,
for example:


Normal run:

	$ time ./hackbench 10
	Time: 3.415

	real	0m3.506s
	user	0m0.257s
	sys	0m6.519s


With perf record (the default is the cycles event sampled at a 1000 Hz
frequency):

	$ time ./perf record ./hackbench 10
	Time: 3.584
	[ perf record: Woken up 1 times to write data ]
	[ perf record: Captured and wrote 0.381 MB perf.data (~16654 samples) ]

	real	0m3.748s
	user	0m0.028s
	sys	0m0.022s


With perf record + frame-pointer-based callchain capture:

	$ time ./perf record -g fp ./hackbench 10
	Time: 3.666
	[ perf record: Woken up 3 times to write data ]
	[ perf record: Captured and wrote 1.426 MB perf.data (~62281 samples) ]

	real	0m3.834s
	user	0m0.033s
	sys	0m0.046s

With perf record + 4096 bytes of stack dump in every sample:

	$ time ./perf record -g dwarf,4096 ./hackbench 10
	Time: 3.697
	[ perf record: Woken up 15 times to write data ]
	[ perf record: Captured and wrote 5.156 MB perf.data (~225251 samples) ]

	real	0m3.931s
	user	0m0.026s
	sys	0m0.135s

With perf record + 20000 bytes of stack dump in every sample:

	$ time ./perf record -g dwarf,20000 ./hackbench 10
	Time: 3.847
	[ perf record: Woken up 9 times to write data ]
	[ perf record: Captured and wrote 13.349 MB perf.data (~583219 samples) ]

	real	0m4.559s
	user	0m0.028s
	sys	0m0.329s


So there are no big differences. Maybe hackbench is not a good example
to highlight the possible impact, though, and some tuning with higher
sampling frequencies would make the difference more visible.
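
For instance, raising the sampling rate with perf record's -F
option (an existing knob; the numbers above are at the default
rate, and I haven't rerun them) should amplify the per-sample
cost:

	$ perf record -F 10000 -g dwarf,4096 ./hackbench 10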

Perhaps the real impact is more in the amount of data to record: the
perf.data file obviously tends to grow quickly.
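
As a rough upper bound on that growth (back-of-envelope, not a
measurement):

	1000 samples/s * 4096 bytes/sample = ~4 MB/s of raw stack
	data per fully busy CPU, before the sampled register set
	and sample headers are counted.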
