[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110215044425.GA9994@elte.hu>
Date: Tue, 15 Feb 2011 05:44:25 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Frederic Weisbecker <fweisbec@...il.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Li Zefan <lizf@...fujitsu.com>,
Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
Tom Zanussi <tzanussi@...il.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust
filtering
hi Steve,
* Steven Rostedt <rostedt@...dmis.org> wrote:
> Steven Rostedt (14):
> tracing/filter: Have no filter return a match
> tracing/filter: Move OR and AND logic out of fn() method
> tracing/filter: Dynamically allocate preds
> tracing/filter: Call synchronize_sched() just once for system filters
> tracing/filter: Allocate the preds in an array
> tracing/filter: Free pred array on disabling of filter
> tracing/filter: Use a tree instead of stack for filter_match_preds()
> tracing/filter: Optimize short ciruit check
> tracing/filter: Check the created pred tree
> tracing/filter: Optimize filter by folding the tree
> tracing/filter: Move MAX_FILTER_PRED to local tracing directory
> tracing/filter: Increase the max preds to 2^14
> tracing/filter: Swap entire filter of events
> tracing/filter: Remove synchronize_sched() from __alloc_preds()
I finally got around testing the various tracing filter features we have.
Here is what i've done. Firstly, i have put my 'naive but curious user trying to
make use of filters' hat on.
I did:
perf list | grep Tracepoint | less
to get a list of tracepoints.
#
# Btw., unrelated feature request, it would be nice if the following shorcut did the
# obvious thing:
#
# perf list Tracepoint
#
I picked one of the interesting looking tracepoints:
syscalls:sys_enter_close [Tracepoint event]
first roadblock:
I had absolutely no idea how to proceed from here any further. I knew it from 'perf
list --help' that I could stick 'syscalls:sys_enter_close' into -e expressions, but
i had no idea how to utilize filter expressions at all.
I could stick it into perf stat though, and it gave me:
aldebaran:~> perf stat -a -e syscalls:sys_enter_close sleep 1
Performance counter stats for 'sleep 1':
12,240 syscalls:sys_enter_close
1.004366276 seconds time elapsed
Which is fine enough and just about right. Trying something more complex like
'firefox' seems to mis-count the number of close() syscalls though, when compared
to 'strace -c':
aldebaran:~> strace -f -q -c firefox 2>&1 | head
Error: cannot open display: :0
% time seconds usecs/call calls errors syscall
...
32.35 0.000011 0 271 4 close
...
aldebaran:~> perf stat -e syscalls:sys_enter_close firefox
Error: cannot open display: :0
Performance counter stats for 'firefox':
146 syscalls:sys_enter_close
0.048227674 seconds time elapsed
Note the 271 calls by strace, versus 146 calls by perf. One of them must be wrong -
or they are counting different things.
Using the syscalls:sys_enter_close event to filter i had to put my kernel hacker hat
on and do:
aldebaran:~> cat /debug/tracing/events/syscalls/sys_enter_close/format
name: sys_enter_close
ID: 404
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int common_lock_depth; offset:8; size:4; signed:1;
field:int nr; offset:12; size:4; signed:1;
field:unsigned int fd; offset:16; size:8; signed:0;
print fmt: "fd: 0x%08lx", ((unsigned long)(REC->fd))
And putting my kernel tracing hacker hat on i knew that the only interesting piece
of information for a filter would be the 'fd' word.
Putting my naive user hat back on and knowing the 'fd' trick, i tried to filter out
all close(10) calls from a 'hackbench' run. I knew that they existed:
aldebaran:~> strace -f ./hackbench 1 2>&1 | grep 'close(10)'
[pid 15277] close(10) = 0
So i first tried to use them in 'perf stat':
aldebaran:~> perf stat -e syscalls:sys_enter_close --filter 'fd == 10' ./hackbench 1
Error: unknown option `filter'
Ouch, no go - how come this does not work for perf stat? It makes quite a bit of
sense to use it there as well.
Anyway, knowing that it might work for 'perf record', i tried it there:
aldebaran:~> perf record -e syscalls:sys_enter_close --filter 'fd == 10' ./hackbench 1
Time: 0.079
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.007 MB perf.data (~325 samples) ]
Cool i thought, it has 325 samples, success!
Lets look at it via 'perf report' i thought:
aldebaran:~> perf report
aldebaran:~>
ouch. The TUI flickered something which went away immediately (feature request:
don't do that - at least try to mumble something about non-existent data or so).
As a naive user i had no idea whether this was a perf report bug or whether i did
something wrong.
As a kernel hacker i knew that this meant that the events in the perf.data werent
enough to display anything. As a kernel tracing hacker i knew i had to look at 'perf
report -D' output:
aldebaran:~> perf report -D | tail -15
TOTAL events: 107
MMAP events: 59
LOST events: 0
COMM events: 2
EXIT events: 3
THROTTLE events: 0
UNTHROTTLE events: 0
FORK events: 40
READ events: 0
SAMPLE events: 0
ATTR events: 0
EVENT_TYPE events: 0
TRACING_DATA events: 0
BUILD_ID events: 0
FINISHED_ROUND events: 3
ouch - no SAMPLE or TRACING_DATA events at all!
Lets try it without the filter:
aldebaran:~> perf record -e syscalls:sys_enter_close ./hackbench 1
Time: 0.043
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.010 MB perf.data (~448 samples) ]
aldebaran:~> perf report -D | tail -15
TOTAL events: 148
MMAP events: 59
LOST events: 0
COMM events: 2
EXIT events: 3
THROTTLE events: 0
UNTHROTTLE events: 0
FORK events: 40
READ events: 0
SAMPLE events: 41
ATTR events: 0
EVENT_TYPE events: 0
TRACING_DATA events: 0
BUILD_ID events: 0
FINISHED_ROUND events: 3
Got 41 SAMPLE events there.
So the --filter stuff does not seem to work at all for syscall events.
I tried my luck with another event: irq:irq_handler_entry. This seemed to work better:
aldebaran:~> perf record -a -e irq:irq_handler_entry sleep 1; perf report -D | tail -15[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.335 MB perf.data (~14630 samples) ]
TOTAL events: 4635
MMAP events: 2689
LOST events: 0
COMM events: 548
EXIT events: 64
THROTTLE events: 0
UNTHROTTLE events: 0
FORK events: 31
READ events: 0
SAMPLE events: 1299
ATTR events: 0
EVENT_TYPE events: 0
TRACING_DATA events: 0
BUILD_ID events: 0
FINISHED_ROUND events: 4
and produced a real trace:
swapper-0 [000] 49985.641036: irq_handler_entry: irq=16 name=uhci_hcd:usb3
swapper-0 [000] 49985.641038: irq_handler_entry: irq=16 name=eth2
distcc-20461 [000] 49985.641088: irq_handler_entry: irq=16 name=uhci_hcd:usb3
distcc-20461 [000] 49985.641090: irq_handler_entry: irq=16 name=eth2
swapper-0 [000] 49985.641140: irq_handler_entry: irq=16 name=uhci_hcd:usb3
swapper-0 [000] 49985.641141: irq_handler_entry: irq=16 name=eth2
swapper-0 [000] 49985.641186: irq_handler_entry: irq=16 name=uhci_hcd:usb3
So i thought i could finally test all the cool --filter stuff!
I wanted to trace the timer irq, so i tried:
aldebaran:~> perf record -a -e irq:irq_handler_entry --filter 'irq==0' sleep 1; perf report -D | tail -15
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.744 MB perf.data (~32519 samples) ]
TOTAL events: 10360
MMAP events: 6870
LOST events: 0
COMM events: 1205
EXIT events: 1332
THROTTLE events: 0
UNTHROTTLE events: 0
FORK events: 656
READ events: 0
SAMPLE events: 293
ATTR events: 0
EVENT_TYPE events: 0
TRACING_DATA events: 0
BUILD_ID events: 0
FINISHED_ROUND events: 4
I had to do the -D trick because otherwise I had no idea whether there were any REAL
samples recorded...
But 'perf script' output gave me:
cc1-27479 [000] 50100.959394: irq_handler_entry: irq=16 name=eth2
cc1-27479 [000] 50100.959636: irq_handler_entry: irq=16 name=uhci_hcd:usb3
cc1-27479 [000] 50100.959638: irq_handler_entry: irq=16 name=eth2
cc1-27479 [000] 50100.959711: irq_handler_entry: irq=0 name=timer
cc1-27479 [000] 50100.959880: irq_handler_entry: irq=16 name=uhci_hcd:usb3
cc1-27479 [000] 50100.959883: irq_handler_entry: irq=16 name=eth2
Hm, filtering didnt work. Lets try to filter only that irq 16 thing then:
aldebaran:~> perf record -a -e irq:irq_handler_entry --filter 'irq==16' sleep 1; perf report -D | tail -15
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.299 MB perf.data (~13053 samples) ]
TOTAL events: 4118
MMAP events: 2993
LOST events: 0
COMM events: 589
EXIT events: 144
THROTTLE events: 0
UNTHROTTLE events: 0
FORK events: 73
READ events: 0
SAMPLE events: 314
ATTR events: 0
EVENT_TYPE events: 0
TRACING_DATA events: 0
BUILD_ID events: 0
FINISHED_ROUND events: 5
Ok, this finally seems to have worked! I saw:
cc1-3833 [000] 50250.934705: irq_handler_entry: irq=16 name=uhci_hcd:usb3
cc1-3833 [000] 50250.934708: irq_handler_entry: irq=16 name=eth2
as-3834 [000] 50250.943321: irq_handler_entry: irq=16 name=uhci_hcd:usb3
as-3834 [000] 50250.943325: irq_handler_entry: irq=16 name=eth2
as-3834 [000] 50250.943367: irq_handler_entry: irq=16 name=uhci_hcd:usb3
Except that it didnt - sometimes the trace started with:
cc1-3833 [000] 50250.916784: irq_handler_entry: irq=19 name=ahci
cc1-3833 [000] 50250.916790: irq_handler_entry: irq=19 name=uhci_hcd:usb7
cc1-3833 [000] 50250.917680: irq_handler_entry: irq=0 name=timer
cc1-3833 [000] 50250.918112: irq_handler_entry: irq=19 name=ahci
cc1-3833 [000] 50250.918119: irq_handler_entry: irq=19 name=uhci_hcd:usb7
cc1-3833 [000] 50250.930152: irq_handler_entry: irq=16 name=uhci_hcd:usb3
cc1-3833 [000] 50250.930156: irq_handler_entry: irq=16 name=eth2
cc1-3833 [000] 50250.930367: irq_handler_entry: irq=16 name=uhci_hcd:usb3
cc1-3833 [000] 50250.930369: irq_handler_entry: irq=16 name=eth2
ouch - both irq 19 and irq 0 entries there - clearly not what i expected to see.
Anyway, i ignored the extra lines i did not want to see and tried to use the filter
stuff some more, to match on the 'name' field. Wanting to see both usb3 and usb7
interrupts i typed the obvious:
aldebaran:~> perf record -a -e irq:irq_handler_entry --filter 'name==uhci_hcd:usb' sleep 1
But got nothing but a (not so obvious) timer irq:
aldebaran:~> perf script
:32749-32749 [000] 50514.340198: irq_handler_entry: irq=0 name=timer
Ok, i thought "these guys must be doing this via wildcards, lets try that":
aldebaran:~> perf record -a -e irq:irq_handler_entry --filter 'name==uhci_hcd:usb*'
That didnt work either.
So ... i have to say that this tracing filter business is unusable crap from a user
POV right now and i see no reason to pull *anything* in this area until it does not
get improved to *much* better levels of usability and utility.
Nobody could *ever* have tested this with a 'naive but curious user' hat on and this
is really sad. We need to do much better!
Thanks,
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists