linux-kernel - [RFC][PATCH 0/7] perf trace: general-purpose scripting support, v2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1259133352-23685-1-git-send-email-tzanussi@gmail.com>
Date:	Wed, 25 Nov 2009 01:15:45 -0600
From:	Tom Zanussi <tzanussi@...il.com>
To:	linux-kernel@...r.kernel.org
Cc:	mingo@...e.hu, fweisbec@...il.com, rostedt@...dmis.org,
	anton@...ba.org, hch@...radead.org
Subject: [RFC][PATCH 0/7] perf trace: general-purpose scripting support, v2

Hi,

Here's a belated update to v1 of the trace stream scripting support
patches I posted last month; I had meant to get this out sooner but
got too busy with other things...

This patchset defines a generic interface for processing the binary
output of 'perf trace' and making it directly available to
user-defined scripts written in general-purpose scripting languages
such as Perl or Python.  It also builds a general-purpose Perl
scripting trace processor on top of the new interface and provides a
few example scripts that exercise the basic functionality.

The main motivation behind it is to provide a more efficient and
powerful alternative to the popular method of parsing the ascii trace
output in order to extract useful information from it.  To avoid the
overhead and complexity of all that, this patchset provides a
direct-to-script-interpreter pathway for doing the same thing, but in
a more regularized fashion, one that takes advantage of all the event
meta-info provided by the tracing infrustructure, such as the
event/field info contained in the 'format files' designed for that
purpose.

Anyone who still wants to deal with the trace stream as straight ascii
text can still have the scripts generate it (and in fact the interface
provides a way to have event printing code auto-generated.  Also, the
original ascii output mode is preserved when the scripting options
aren't used), but for anyone wanting to do something more interesting
with the data using a rapid prototyping language such as Perl, this
patchset makes it easy to do so.  It allows the full power of
general-purpose scripting languages to be applied to the trace stream
to do non-trivial analyses, and suddenly makes available huge
libraries of useful tools and modules (e.g. CPAN for Perl) to be
applied to the problem of creating new and interesting trace
applications.

This patchset only implements a Perl interface, but the intention is
to make it relatively easy to add support for other languages such as
Python, Ruby, etc, etc - all they need to do is follow the example of
the Perl implementation and provide their own implementations of the
trace_scripting_ops and supporting functions (more details below).

Changes from v1:

- uses the -M (multiplex) option of perf record instead of the sorting
  hack in v1 to order events.

- makes the command-line option for specifying scripts (-s) more
  flexible and scalable to the addition of other scripting languages.

- adds a Perl->C interface to provide scripts a way to access event
  data not included in event handlers; this, along with a related
  context param now passed to handlers, could also be used to access
  metadata in the parent perf executable.

- for usability, adds centralized documentation and some simple shell
  scripts to avoid having to remember long command lines.

- upgrades the licensing of the Perl modules to GPL

Installation:

For the Perl scripting support to work, you first need to install the
Perl development library, libperl (e.g. apt-get install liberl-dev in
ubuntu), then make and install perf.

Known problems:

None with the code itself, AFAIK, but I do see some strange results in
some scripts when using the perf timestamps.  For example, the
wakeup-latency.pl script, if run for a long enough time, will show
output like the following:

root@...picana:~# wakeup-latency-record
^C[ perf record: Woken up 76 times to write data ]
[ perf record: Captured and wrote 10.831 MB perf.data (~473227 samples) ]

root@...picana:~# wakeup-latency-report
perf trace started with Perl script /root/libexec/perf-core/scripts/perl/wakeup-

wakeup_latency stats:

total_wakeups: 458359
avg_wakeup_latency (ns): 27974
min_wakeup_latency (ns): 18446744073709546323
max_wakeup_latency (ns): 55234235

perf trace Perl script stopped

Obviously, something is wrong with the min_wakeup_latency stat.  To
dig a little further, I added some printing code, which I conveniently
took from the code generated by the new perf trace -g option:

root@...picana:~# perf trace -g perl
generated Perl script: perf-trace.pl

I copied the generated print() and print_header() code from
perf-trace.pl for the sched_switch() and sched_wakeup() handler
functions and added a line to print the $min_wakeup_latency value
whenever it changes (I've attached the diff from wakeup-latency.pl
below).

After doing that and re-running the wakeup-latency script, I got the
below output, which is the snippet surrounding a negative latency (see
min_wakeup_latency: -5293).

Looking at the two events preceding it, you can see that the
sched_switch has an earlier timestamp than the sched_wakeup event that
preceded it:

sched::sched_wakeup      1 01238.657172818     6186 firefox              comm=firefox, pid=7140, prio=120, success=1, target_cpu=1
sched::sched_switch      1 01238.657197514     6186 firefox              prev_comm=firefox, prev_pid=6186, prev_prio=120, prev_state=S, next_comm=firefox, next_pid=7140, next_prio=120
sched::sched_switch      0 01238.657881346     4555 kondemand/0          prev_comm=kondemand/0, prev_pid=4555, prev_prio=120, prev_state=S, next_comm=firefox, next_pid=6183, next_prio=120

sched::sched_wakeup      0 01238.657997033     6183 firefox              comm=firefox, pid=6199, prio=120, success=1, target_cpu=1
sched::sched_switch      1 01238.657991740     7140 firefox              prev_comm=firefox, prev_pid=7140, prev_prio=120, prev_state=S, next_comm=firefox, next_pid=6199, next_prio=120

min_wakeup_latency: -5293

sched::sched_wakeup      0 01238.658063038     6183 firefox              comm=Xorg, pid=5536, prio=140, success=1, target_cpu=0
sched::sched_wakeup      0 01238.658232707     6183 firefox              comm=npviewer.bin, pid=6998, prio=120, success=1, target_cpu=1
sched::sched_switch      1 01238.658311152     6199 firefox              prev_comm=firefox, prev_pid=6199, prev_prio=120, prev_state=S, next_comm=npviewer.bin, next_pid=6998, next_prio=120
sched::sched_switch      0 01238.658397358     6183 firefox              prev_comm=firefox, prev_pid=6183, prev_prio=120, prev_state=S, next_comm=Xorg, next_pid=5536, next_prio=140
sched::sched_wakeup      0 01238.658460734     5536 Xorg                 comm=firefox, pid=6183, prio=120, success=1, target_cpu=0
sched::sched_switch      1 01238.658311152     6998 npviewer.bin         prev_comm=npviewer.bin, prev_pid=6998, prev_prio=120, prev_state=S, next_comm=swapper, next_pid=0, next_prio=140
sched::sched_switch      0 01238.664043515     5536 Xorg                 prev_comm=Xorg, prev_pid=5536, prev_prio=140, prev_state=R, next_comm=firefox, next_pid=6183, next_prio=120
sched::sched_switch      0 01238.664059065     6183 firefox              prev_comm=firefox, prev_pid=6183, prev_prio=120, prev_state=S, next_comm=Xorg, next_pid=5536, next_prio=140
sched::sched_wakeup      0 01238.665301440     5536 Xorg                 comm=metacity, pid=5924, prio=140, success=1, target_cpu=1
sched::sched_switch      1 01238.665313262        0 init                 prev_comm=swapper, prev_pid=0, prev_prio=140, prev_state=R, next_comm=metacity, next_pid=5924, next_prio=140
sched::sched_wakeup      0 01238.665314341     5536 Xorg                 comm=gnome-settings-, pid=5923, prio=140, success=1, target_cpu=0
sched::sched_switch      0 01238.665320692     5536 Xorg                 prev_comm=Xorg, prev_pid=5536, prev_prio=140, prev_state=R, next_comm=gnome-settings-, next_pid=5923, next_prio=140

So the event ordering seems right, but the timestamps don't.  To get
some idea what might be happening, I added a couple of printks to
perf_event_output(), one at the beginning and one at the end and
recorded perf_clock() at those points (since the actual event
timestamping happens somewhere in between).

Whenever the negative latency happens in the script, the perf_clock()
values on the sched_switch cpu seem to get stuck for awhile during
that time.  Below is the output from cpu1 - note the timestamp of the
event (01238.657991740), which matches the timestamp at both the
beginning and end of perf_event_output().

The values for cpu0 around that time are also shown for completeness.

cpu1:

perf_event_output begin: ts 1238657171430 cpu 1
perf_event_output   end: ts 1238657173618 cpu 1
perf_event_output begin: ts 1238657196431 cpu 1
perf_event_output   end: ts 1238657198193 cpu 1

perf_event_output begin: ts 1238657991740 cpu 1
perf_event_output   end: ts 1238657991740 cpu 1

perf_event_output begin: ts 1238658311152 cpu 1
perf_event_output   end: ts 1238658311152 cpu 1
perf_event_output begin: ts 1238658311152 cpu 1
perf_event_output   end: ts 1238658311152 cpu 1
perf_event_output begin: ts 1238665312062 cpu 1
perf_event_output   end: ts 1238665314481 cpu 1
perf_event_output begin: ts 1238665414056 cpu 1

cpu0:

perf_event_output   end: ts 1238656044814 cpu 0
perf_event_output begin: ts 1238657879958 cpu 0
perf_event_output   end: ts 1238657882306 cpu 0
perf_event_output begin: ts 1238657995453 cpu 0
perf_event_output   end: ts 1238657997881 cpu 0
perf_event_output begin: ts 1238658061886 cpu 0
perf_event_output   end: ts 1238658064203 cpu 0
perf_event_output begin: ts 1238658231296 cpu 0
perf_event_output   end: ts 1238658233676 cpu 0
perf_event_output begin: ts 1238658395923 cpu 0

So anyway, it looks like what happens is that clock0 gets stuck at
time t0, event 1 happens on the normally advancing clock1 at time t1 >
t0, event 2 happens after event 1 but because t0 is stuck at t0, gets
time t0 < t1 and thus the negative result.

I haven't investigated any further at this point, except to put in a
few more printks that show it is using the tsc, and to note that you
only get out-of-order timestamps between cpus, though you do get
0-difference timestamps between events on the same cpu, which amounts
to the same thing - I'm hoping someone else might have a better idea
about what's happening and how to fix it, as it obviously limits the
value of the timetamps for analysis.

Other than that things seem to work pretty well.  The rest of this
e-mail provides a short introduction to help get you started writing
scripts; as always please see the code for details.  There's also
perf-trace-perl Documentation included with this patchset, which might
also be helpful.  All of the scripts described below can be found in
~/libexec/perf-core/scripts/perl after you install perf.

The first step in a normal trace session is to run 'perf record' with
a set of tracepoint events you're interested in e.g.:

root@...picana:~# perf record -c 1 -f -a -M -R -e kmem:kmalloc -e kmem:kfree  -e irq:softirq_entry -e irq:softirq_exit -e sched:sched_switch
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.485 MB perf.data (~21198 samples) ]

Next, you can use the -g perl option of perf trace to have it generate
a script with one handler function for each event in the trace:

root@...picana:~# perf trace -g perl

The name of the generated script will be 'perf-trace.pl':

root@...picana:~# perf trace -g perl
generated Perl script: perf-trace.pl

Here's a portion of the Perl code generated for this trace, to give
some idea of what a trace script looks like:

# perf trace event handlers, generated by perf trace -g perl
# Licensed under the terms of the GNU GPL License version 2

# The common_* event handler fields are the most useful fields common to
# all events.  They don't necessarily correspond to the 'common_*' fields
# in the format files.  Those fields not available as handler params can
# be retrieved using Perl functions of the form common_*($context).
# See Context.pm for the list of available functions.

use lib "$ENV{'PERF_EXEC_PATH'}/scripts/perl/Perf-Trace-Util/lib";
use lib "./Perf-Trace-Util/lib";
use Perf::Trace::Core;
use Perf::Trace::Context;
use Perf::Trace::Util;

sub trace_begin
{
	# optional
}

sub trace_end
{
	# optional
}

sub sched::sched_switch
{
	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
	    $common_pid, $common_comm,
	    $prev_comm, $prev_pid, $prev_prio, $prev_state, 
	    $next_comm, $next_pid, $next_prio) = @_;

	print_header($event_name, $common_cpu, $common_secs, $common_nsecs,
	             $common_pid, $common_comm);

	printf("prev_comm=%s, prev_pid=%d, prev_prio=%d, prev_state=%s, ".
	       "next_comm=%s, next_pid=%d, next_prio=%d\n",
	       $prev_comm, $prev_pid, $prev_prio, 
	       flag_str("sched::sched_switch", "prev_state", $prev_state), 
	       $next_comm, $next_pid, $next_prio);
}

sub irq::softirq_exit
{
	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
	    $common_pid, $common_comm,
	    $vec) = @_;

	print_header($event_name, $common_cpu, $common_secs, $common_nsecs,
	             $common_pid, $common_comm);

	printf("vec=%s\n",
	       symbol_str("irq::softirq_exit", "vec", $vec));
}

sub irq::softirq_entry
{
	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
	    $common_pid, $common_comm,
	    $vec) = @_;

	print_header($event_name, $common_cpu, $common_secs, $common_nsecs,
	             $common_pid, $common_comm);

	printf("vec=%s\n",
	       symbol_str("irq::softirq_entry", "vec", $vec));
}

sub kmem::kfree
{
	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
	    $common_pid, $common_comm,
	    $call_site, $ptr) = @_;

	print_header($event_name, $common_cpu, $common_secs, $common_nsecs,
	             $common_pid, $common_comm);

	printf("call_site=%u, ptr=%u\n",
	       $call_site, $ptr);
}

sub kmem::kmalloc
{
	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
	    $common_pid, $common_comm,
	    $call_site, $ptr, $bytes_req, $bytes_alloc, 
	    $gfp_flags) = @_;

	print_header($event_name, $common_cpu, $common_secs, $common_nsecs,
	             $common_pid, $common_comm);

	printf("call_site=%u, ptr=%u, bytes_req=%u, bytes_alloc=%u, ".
	       "gfp_flags=%s\n",
	       $call_site, $ptr, $bytes_req, $bytes_alloc, 
	       
	       flag_str("kmem::kmalloc", "gfp_flags", $gfp_flags));
}
.
.
.

For each event in the trace, the generated event handlers will print
the event name along with all the fields for the event.

To run the script, run perf trace with the -p option:

root@...picana:~# perf trace -s perf-trace.pl

The output should look something like this:

kmem::kmalloc            1 01041.022132636    11822 perf                 call_site=18446744071579892671, ptr=18446612134886444032, bytes_req=704, bytes_alloc=1024, gfp_flags=GFP_WAIT | GFP_IO | GFP_ZERO
kmem::kmalloc            1 01041.022157859    11822 perf                 call_site=18446744071579892671, ptr=18446612134886455296, bytes_req=704, bytes_alloc=1024, gfp_flags=GFP_WAIT | GFP_IO | GFP_ZERO
kmem::kmalloc            1 01041.022186965    11822 perf                 call_site=18446744071579892671, ptr=18446612134886447104, bytes_req=704, bytes_alloc=1024, gfp_flags=GFP_WAIT | GFP_IO | GFP_ZERO
kmem::kmalloc            1 01041.022219512    11822 perf                 call_site=18446744071579892671, ptr=18446612134886452224, bytes_req=704, bytes_alloc=1024, gfp_flags=GFP_WAIT | GFP_IO | GFP_ZERO
kmem::kmalloc            1 01041.022317930    11822 perf                 call_site=18446744072100923040, ptr=18446612135313073728, bytes_req=64, bytes_alloc=64, gfp_flags=GFP_WAIT | GFP_IO
sched::sched_switch      1 01041.022514835    11822 perf                 prev_comm=perf, prev_pid=11822, prev_prio=140, prev_state=D, next_comm=swapper, next_pid=0, next_prio=140
irq::softirq_entry       0 01041.024022805        0 swapper              vec=TIMER
irq::softirq_exit        0 01041.024029784        0 swapper              vec=TIMER
irq::softirq_entry       0 01041.024031978        0 swapper              vec=SCHED
irq::softirq_exit        0 01041.024035551        0 swapper              vec=SCHED
irq::softirq_entry       1 01041.024538754        0 init                 vec=TIMER
irq::softirq_exit        1 01041.024545467        0 init                 vec=TIMER
irq::softirq_entry       1 01041.025896765        0 init                 vec=BLOCK
irq::softirq_exit        1 01041.025944221        0 init                 vec=BLOCK
sched::sched_switch      1 01041.025958098        0 init                 prev_comm=swapper, prev_pid=0, prev_prio=140, prev_state=R, next_comm=perf, next_pid=11822, next_prio=140
kmem::kmalloc            1 01041.026372194    11822 perf                 call_site=18446744071580383712, ptr=18446612135006028448, bytes_req=32, bytes_alloc=32, gfp_flags=GFP_WAIT | GFP_IO
kmem::kmalloc            1 01041.026376141    11822 perf                 call_site=18446744071580382581, ptr=18446612134887800192, bytes_req=104, bytes_alloc=128, gfp_flags=GFP_WAIT | GFP_IO
kmem::kmalloc            1 01041.026400928    11822 perf                 call_site=18446744071580383440, ptr=18446612134887923712, bytes_req=4096, bytes_alloc=4096, gfp_flags=GFP_WAIT | GFP_IO
kmem::kfree              1 01041.026581992    11822 perf                 call_site=18446744071580379757, ptr=18446612134887923712
kmem::kfree              1 01041.026586236    11822 perf                 call_site=18446744071580379765, ptr=18446612134887800192
kmem::kfree              1 01041.026588755    11822 perf                 call_site=18446744071580379904, ptr=18446612135006028448
kmem::kmalloc            1 01041.026654934    11822 perf                 call_site=18446744071580626944, ptr=18446612135006028448, bytes_req=24, bytes_alloc=32, gfp_flags=GFP_WAIT | GFP_IO | GFP_ZERO
kmem::kmalloc            1 01041.026657946    11822 perf                 call_site=18446744071580382581, ptr=18446612134887800192, bytes_req=104, bytes_alloc=128, gfp_flags=GFP_WAIT | GFP_IO
kmem::kmalloc            1 01041.026672236    11822 perf                 call_site=18446744071580383440, ptr=18446612135312744448, bytes_req=4096, bytes_alloc=4096, gfp_flags=GFP_WAIT | GFP_IO
kmem::kfree              1 01041.026889195    11822 perf                 call_site=18446744071580379823, ptr=18446612135006028448
kmem::kfree              1 01041.026892212    11822 perf                 call_site=18446744071580379757, ptr=18446612135312744448
.
.
.

Notice the gfp_flags and vec fields in the previous trace snippet.
Not only does the generated script display the normal string and
numeric fields, but it also automatically figures out that certain
fields, though numeric, should be interpreted as flags or symbols
instead, and generates the code to do that (using the flag and
symbolic enumerations defined in the event's format file).

Running the autogenerated script and displaying the printed events is
only marginally useful; at the least it can be used to see what's
there and verify that things basically work.

One of the other things that might be marginally useful might be to
remove all the handlers from the generated script and replace it with
a single handler for the 'unhandled_trace' event which is available to
every script.  Here's a complete script for doing that:

# perf trace event handlers, generated by perf trace -g perl
# Licensed under the terms of the GNU GPL License version 2

# The common_* event handler fields are the most useful fields common to
# all events.  They don't necessarily correspond to the 'common_*' fields
# in the format files.  Those fields not available as handler params can
# be retrieved using Perl functions of the form common_*($context).
# See Context.pm for the list of available functions.

use lib "$ENV{'PERF_EXEC_PATH'}/scripts/perl/Perf-Trace-Util/lib";
use lib "./Perf-Trace-Util/lib";
use Perf::Trace::Core;
use Perf::Trace::Context;
use Perf::Trace::Util;

my %unhandled;

sub trace_end
{
    print_unhandled();
}

sub trace_unhandled
{
	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
	    $common_pid, $common_comm) = @_;

	$unhandled{$event_name}++;
}

sub print_unhandled
{
    if ((scalar keys %unhandled) == 0) {
        return;
    }

    print "\nunhandled events:\n\n";

    printf("%-40s  %10s\n", "event", "count");
    printf("%-40s  %10s\n", "----------------------------------------",
           "-----------");

    foreach my $event_name (keys %unhandled) {
        printf("%-40s  %10d\n", $event_name, $unhandled{$event_name});
    }
}

For every unhandled event (every event for this script, since we
removed all the real handlers), the total counts for each are hashed
by event name and printed out at the end of the trace (in the
'trace_end' handler, which can also be defined for each script.
Here's some example output for this script:

root@...picana:~# perf trace -s perf-trace.pl
perf trace started with Perl script perf-trace.pl

unhandled events:

event                                          count
----------------------------------------  -----------
kmem::kfree                                      973
irq::softirq_entry                               458
kmem::kmalloc                                    971
sched::sched_switch                              511
irq::softirq_exit                                458

perf trace Perl script stopped

The main point of being able to script the trace stream, however, is
to use the power of the scripting language to do more useful analysis.
One simple example would be to display the total r/w activity for all
processes on the system:

root@...picana:~# perf record -c 1 -f -a -M -R -e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.748 MB perf.data (~32694 samples) ]

root@...picana:~# perf trace -s ~/libexec/perf-core/scripts/perl/rw-by-pid.pl
perf trace started with Perl script /root/libexec/perf-core/scripts/perl/rw-by-pid.pl

read counts by pid:

   pid                  comm     # reads  bytes_requested  bytes_read
------  --------------------  -----------  ----------  ----------
 12029  perf                        1087     1106144      917550
  5537  Xorg                          83        1992        1992
 11296  wterm                          7       28672          64
  5973  gnome-screensav                8       32768          64
  5926  gnome-settings-                3       12288          32

failed reads by pid:

   pid                  comm  error #    # errors
------  --------------------  ------  ----------
 12029  perf                       0         111
 11296  wterm                    -11           5
  5926  gnome-settings-          -11           2
  5973  gnome-screensav          -11           6

write counts by pid:

   pid                  comm    # writes  bytes_written
------  --------------------  -----------  ----------
 12029  perf                        2187      148952
 11296  wterm                          1           1

failed writes by pid:

   pid                  comm  error #    # errors
------  --------------------  ------  ----------

perf trace Perl script stopped

The above output shows sorted tables of reads and write activity, and
also tracks counts of each different read/write return error seen by
each process.

We can use the results of a higher-level trace analysis like the one
above to create other scripts that drill down to get more detailed
output.  For example, the above results show that perf itself was
responsible for a lot of reading and writing.  To see more detail
about which files it's reading from/writing to, we can write and run
another script:

root@...picana:~# perf record -c 1 -f -a -M -R -e syscalls:sys_enter_read -e syscalls:sys_enter_write
^C[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 33.299 MB perf.data (~1454834 samples) ]

root@...picana:~# perf trace -s ~/libexec/perf-core/scripts/perl/rw-by-file.pl
perf trace started with Perl script /root/libexec/perf-core/scripts/perl/rw-by-file.pl

file read counts for perf:

    fd     # reads  bytes_requested
------  ----------  -----------
     9        1710     1751040
     6           1          32
     7           1          32
     5           1          32

file write counts for perf:

    fd    # writes  bytes_written
------  ----------  -----------
     3      391246    34916416

perf trace Perl script stopped

>From the above, we see that most of the reads are coming from fd 9 and
most of the writes are going to fd 3.  At this point, we don't have
any further information about what those files actually are - to do
that we'd need to be able to get it from either the open() system call
tracepoint (preferably, but the filename isn't yet available from that
source) or from reading /proc during the trace.

Actually, we probably don't really event want to see the events from
perf itself, so should also have a way to filter out anything from
perf in the kernel.

Another simple example could be to use a script to track wakeup
latencies:

root@...picana:~# perf record -c 1 -f -a -M -R -e sched:sched_switch -e sched:sched_wakeup
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.365 MB perf.data (~15952 samples) ]

root@...picana:~# perf trace -s ~/libexec/perf-core/scripts/perl/wakeup-latency.pl
perf trace started with Perl script /root/libexec/perf-core/scripts/perl/wakeup-latency.

wakeup_latency stats:

total_wakeups: 430
avg_wakeup_latency (ns): 12807
min_wakeup_latency (ns): 2745
max_wakeup_latency (ns): 154435

perf trace Perl script stopped

This script tracks the time differences between sched_wakeup and
sched_switch events and prints out the totals when the script is done.

One of the nice things about using trace data to do this kind of
analysis is that the individual trace records can be displayed at any
time, so for instance if one was interested in seeing exactly which
events caused the min an max latencies displayed in the output, the
individual events could be viewed by simply uncommenting the
auto-generated print statements.

Finally, here's another example that displays workqueue stats for a
trace run:

root@...picana:~# perf record -c 1 -f -a -M -R -e workqueue:workqueue_creation -e workqueue:workqueue_destruction -e workqueue:workqueue_execution -e workqueue:workqueue_insertion
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.285 MB perf.data (~12470 samples) ]

root@...picana:~# perf trace -s ~/libexec/perf-core/scripts/perl/workqueue-stats.pl
perf trace started with Perl script /root/libexec/perf-core/scripts/perl/workqueue-stats

workqueue work stats:

cpu    ins   exec       name                
---    ---   ----       ----                
  0      4      4       kblockd/0           
  0     62     62       ata/0               
  0     90     91       events/0            
  0     58     58       kondemand/0         
  1     36     36       ata/1               
  1      1      0       events/0            
  1     42     42       events/1            
  1     18     18       kblockd/1           

workqueue lifecycle stats:

cpu created destroyed   name                
--- ------- ---------   ----                

perf trace Perl script stopped

There's also a check-perf-trace.pl script in the
libexec/perf-core/scripts/perl directory that doesn't do anything
interesting but contains code that's meant to exercise all of the
functionality available for the Perl scripting support.  Looking at
that, along with the other example scripts in that directory should
provide a basic starting point for writing scripts of your own.

Note also that the final patch adds a 'bin' directory containing some
shell scripts so you can run the above scripts without typing in the
full command lines.  So, with libexec/perf-core/scripts/perl in your
path, you could for example run the perf record/report commands for
the workqueue-stats script like this:

root@...picana:~# workqueue-stats-record
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.238 MB perf.data (~10389 samples) ]

root@...picana:~# workqueue-stats-report
perf trace started with Perl script /root/libexec/perf-core/scripts/perl/workqueue-stats

workqueue work stats:

cpu    ins   exec       name                
---    ---   ----       ----                
  0      2      2       kblockd/0           
  0     14     14       ata/0               
  0      2      2       events/0            
  0      5      5       kondemand/0         
  1      2      2       events/1            

workqueue lifecycle stats:

cpu created destroyed   name                
--- ------- ---------   ----                

perf trace Perl script stopped


Thanks,

Tom

timestamp debugging code:

--- libexec/perf-core/scripts/perl/wakeup-latency.pl~	2009-11-23 19:37:53.000000000 -0600
+++ libexec/perf-core/scripts/perl/wakeup-latency.pl	2009-11-23 19:45:56.000000000 -0600
@@ -32,6 +32,15 @@
 	$prev_comm, $prev_pid, $prev_prio, $prev_state, $next_comm, $next_pid,
 	$next_prio) = @_;
 
+	print_header($event_name, $common_cpu, $common_secs, $common_nsecs,
+	             $common_pid, $common_comm);
+
+	printf("prev_comm=%s, prev_pid=%d, prev_prio=%d, prev_state=%s, ".
+	       "next_comm=%s, next_pid=%d, next_prio=%d\n",
+	       $prev_comm, $prev_pid, $prev_prio, 
+	       flag_str("sched::sched_switch", "prev_state", $prev_state), 
+	       $next_comm, $next_pid, $next_prio);
+
     my $wakeup_ts = $last_wakeup{$common_cpu}{ts};
     if ($wakeup_ts) {
 	my $switch_ts = nsecs($common_secs, $common_nsecs);
@@ -41,6 +50,7 @@
 	}
 	if ($wakeup_latency < $min_wakeup_latency) {
 	    $min_wakeup_latency = $wakeup_latency;
+            print("min_wakeup_latency: $min_wakeup_latency\n");
 	}
 	$total_wakeup_latency += $wakeup_latency;
 	$total_wakeups++;
@@ -54,6 +64,14 @@
 	$common_pid, $common_comm,
 	$comm, $pid, $prio, $success, $target_cpu) = @_;
 
+	print_header($event_name, $common_cpu, $common_secs, $common_nsecs,
+	             $common_pid, $common_comm);
+
+	printf("comm=%s, pid=%d, prio=%d, success=%d, ".
+	       "target_cpu=%d\n",
+	       $comm, $pid, $prio, $success, 
+	       $target_cpu);
+
     $last_wakeup{$target_cpu}{ts} = nsecs($common_secs, $common_nsecs);
 }
 
@@ -101,3 +119,11 @@
 
     $unhandled{$event_name}++;
 }
+
+sub print_header
+{
+	my ($event_name, $cpu, $secs, $nsecs, $pid, $comm) = @_;
+
+	printf("%-20s %5u %05u.%09u %8u %-20s ",
+	       $event_name, $cpu, $secs, $nsecs, $pid, $comm);
+}

Tom Zanussi (7):
  perf trace: Add scripting ops
  perf trace: Add flag/symbolic format_flags
  perf trace: Add Perl scripting support
  perf trace: Add perf trace scripting support modules for Perl
  perf trace: Add interface to access perf data from Perl handlers
  perf trace: Add Documentation for perf trace Perl support
  perf trace: Add a scripts/perl/bin for perf trace shell scripts

 tools/perf/Documentation/perf-trace-perl.txt       |  219 +++++++
 tools/perf/Documentation/perf-trace.txt            |   11 +-
 tools/perf/Makefile                                |   26 +-
 tools/perf/builtin-trace.c                         |  250 ++++++++-
 tools/perf/scripts/perl/Perf-Trace-Util/Context.c  |  134 +++++
 tools/perf/scripts/perl/Perf-Trace-Util/Context.xs |   41 ++
 .../perf/scripts/perl/Perf-Trace-Util/Makefile.PL  |   17 +
 tools/perf/scripts/perl/Perf-Trace-Util/README     |   59 ++
 .../perl/Perf-Trace-Util/lib/Perf/Trace/Context.pm |   55 ++
 .../perl/Perf-Trace-Util/lib/Perf/Trace/Core.pm    |  192 +++++++
 .../perl/Perf-Trace-Util/lib/Perf/Trace/Util.pm    |   88 +++
 tools/perf/scripts/perl/Perf-Trace-Util/typemap    |    1 +
 .../perf/scripts/perl/bin/check-perf-trace-record  |    7 +
 .../perf/scripts/perl/bin/check-perf-trace-report  |    5 +
 tools/perf/scripts/perl/bin/rw-by-file-record      |    2 +
 tools/perf/scripts/perl/bin/rw-by-file-report      |    5 +
 tools/perf/scripts/perl/bin/rw-by-pid-record       |    2 +
 tools/perf/scripts/perl/bin/rw-by-pid-report       |    5 +
 tools/perf/scripts/perl/bin/wakeup-latency-record  |    6 +
 tools/perf/scripts/perl/bin/wakeup-latency-report  |    5 +
 tools/perf/scripts/perl/bin/workqueue-stats-record |    2 +
 tools/perf/scripts/perl/bin/workqueue-stats-report |    6 +
 tools/perf/scripts/perl/check-perf-trace.pl        |  106 ++++
 tools/perf/scripts/perl/rw-by-file.pl              |  105 ++++
 tools/perf/scripts/perl/rw-by-pid.pl               |  170 ++++++
 tools/perf/scripts/perl/wakeup-latency.pl          |  103 ++++
 tools/perf/scripts/perl/workqueue-stats.pl         |  129 +++++
 tools/perf/util/trace-event-parse.c                |   41 ++-
 tools/perf/util/trace-event-perl.c                 |  596 ++++++++++++++++++++
 tools/perf/util/trace-event-perl.h                 |   51 ++
 tools/perf/util/trace-event.h                      |   23 +
 31 files changed, 2450 insertions(+), 12 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-trace-perl.txt
 create mode 100644 tools/perf/scripts/perl/Perf-Trace-Util/Context.c
 create mode 100644 tools/perf/scripts/perl/Perf-Trace-Util/Context.xs
 create mode 100644 tools/perf/scripts/perl/Perf-Trace-Util/Makefile.PL
 create mode 100644 tools/perf/scripts/perl/Perf-Trace-Util/README
 create mode 100644 tools/perf/scripts/perl/Perf-Trace-Util/lib/Perf/Trace/Context.pm
 create mode 100644 tools/perf/scripts/perl/Perf-Trace-Util/lib/Perf/Trace/Core.pm
 create mode 100644 tools/perf/scripts/perl/Perf-Trace-Util/lib/Perf/Trace/Util.pm
 create mode 100644 tools/perf/scripts/perl/Perf-Trace-Util/typemap
 create mode 100644 tools/perf/scripts/perl/bin/check-perf-trace-record
 create mode 100644 tools/perf/scripts/perl/bin/check-perf-trace-report
 create mode 100644 tools/perf/scripts/perl/bin/rw-by-file-record
 create mode 100644 tools/perf/scripts/perl/bin/rw-by-file-report
 create mode 100644 tools/perf/scripts/perl/bin/rw-by-pid-record
 create mode 100644 tools/perf/scripts/perl/bin/rw-by-pid-report
 create mode 100644 tools/perf/scripts/perl/bin/wakeup-latency-record
 create mode 100644 tools/perf/scripts/perl/bin/wakeup-latency-report
 create mode 100644 tools/perf/scripts/perl/bin/workqueue-stats-record
 create mode 100644 tools/perf/scripts/perl/bin/workqueue-stats-report
 create mode 100644 tools/perf/scripts/perl/check-perf-trace.pl
 create mode 100644 tools/perf/scripts/perl/rw-by-file.pl
 create mode 100644 tools/perf/scripts/perl/rw-by-pid.pl
 create mode 100644 tools/perf/scripts/perl/wakeup-latency.pl
 create mode 100644 tools/perf/scripts/perl/workqueue-stats.pl
 create mode 100644 tools/perf/util/trace-event-perl.c
 create mode 100644 tools/perf/util/trace-event-perl.h

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/