[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151021084816.GA12992@gmail.com>
Date: Wed, 21 Oct 2015 10:48:16 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Arnaldo Carvalho de Melo <acme@...hat.com>
Cc: "Wangnan (F)" <wangnan0@...wei.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
linux-kernel@...r.kernel.org,
Adrian Hunter <adrian.hunter@...el.com>,
Borislav Petkov <bp@...e.de>,
Chandler Carruth <chandlerc@...il.com>,
David Ahern <dsahern@...il.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
Stephane Eranian <eranian@...gle.com>,
pi3orama <pi3orama@....com>
Subject: Re: [PATCH 13/16] perf callchain: Switch default to
'graph,0.5,caller'
* Arnaldo Carvalho de Melo <acme@...hat.com> wrote:
> I was bitten by the --children thing and took some time to get used to it, so I
> can relate to that...
>
> I think we should revert this change in callchain default, enough complaints...
> Ingo, since you suggested that change, what are your thoughts?
Btw., one side note, I noticed that the call-graph options to 'perf top' do not
match that of perf report. I tried for a couple of minutes to figure out why this
doesn't work:
perf top -g graph,0.5,caller
... only to notice that it's perf report options.
A couple of thoughts about defaults:
1)
I think 'perf top' and 'perf report' should provide the very same output by
default. The two tools are unified, and we should think of 'perf top' more of a
rolling, continuously updated perf report, with some dynamic runtime features that
go beyond a simple perf report. Making them diverge only creates confusion.
2) min-percentage
I suspect the '0.5%' part of the default is not contested by anyone?
3) 'graph' vs. 'fractal'
The 'graph' part of the default: I think 'graph' (absolute percentages) is more
intuitive in general than 'fractal' (relative percentages), especially when
drilling down deep into more complex call graphs.
For example, if you look at this output:
| | | |--41.61%-- local_apic_timer_interrupt
| | | | |
| | | | --100.00%-- hrtimer_interrupt
| | | | __run_hrtimer
| | | | |
| | | | |--72.98%-- hrtimer_wakeup
| | | | | wake_up_process
| | | | | |
| | | | | --100.00%-- try_to_wake_up
| | | | | ttwu_do_activate.constprop.93
| | | | | activate_task
| | | | | enqueue_task
| | | | | enqueue_task_fair
| | | | | enqueue_entity
| | | | |
| | | | --27.02%-- ehci_hrtimer_func
Would you have guessed that its relevance in reality is:
| | | |
| | | |--0.11%-- local_apic_timer_interrupt
| | | | |
| | | | --0.10%-- hrtimer_interrupt
| | | | __run_hrtimer
| | | | |
| | | | |--0.07%-- hrtimer_wakeup
| | | | | wake_up_process
| | | | | |
| | | | | --0.01%-- try_to_wake_up
| | | | | ttwu_do_activate.constprop.93
| | | | | activate_task
| | | | | enqueue_task
| | | | | enqueue_task_fair
| | | | | enqueue_entity
| | | | |
| | | | --0.03%-- ehci_hrtimer_func
| | | |
?
I think the 'big picture' should always be apparent, even when looking at a small
detail. Also, it's not _that_ hard to see the relative weight of each entry even
if they are small numbers.
Fractal output can be useful if you are trying to drill down really, really deep
and only concentrate on that aspect - but that kind of workflow is probably best
served via a search option in any case:
perf report --call-graph fractal,0.5,caller --stdio --symbol-filter local_apic_timer_interrupt
In which case fractal output is the more intuitive one I suspect:
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:pp'
# Event count (approx.): 1155803425
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ..............................
#
0.11% 0.01% swapper [kernel.kallsyms] [k] local_apic_timer_interrupt
|
|--89.19%-- local_apic_timer_interrupt
| hrtimer_interrupt
| __run_hrtimer
| |
| |--72.98%-- hrtimer_wakeup
| | wake_up_process
| | |
| | --100.00%-- try_to_wake_up
| | ttwu_do_activate.constprop.93
| | activate_task
| | enqueue_task
| | enqueue_task_fair
| | enqueue_entity
| |
| --27.02%-- ehci_hrtimer_func
|
--10.81%-- start_secondary
cpu_startup_entry
cpuidle_enter
apic_timer_interrupt
smp_apic_timer_interrupt
local_apic_timer_interrupt
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
Btw., I noticed an oddity, why doesn't "-S local_apic_timer_interrupt" produce any
output? It was the first option I tried, and it only gave me:
triton:~/tip> perf report --call-graph fractal,0.5,caller --stdio -S local_apic_timer_interrupt
[nv] with build id 744b5b4279152a54e61208989daf5d3d6b375aa3 not found, continuing without symbols
Failed to open /tmp/perf-6650.map, continuing without symbols
# To display the perf.data header info, please use --header/--header-only options.
#
# symbol: local_apic_timer_interrupt
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:pp'
# Event count (approx.): 1155803425
#
# Children Self Command Shared Object
# ........ ........ ....... .............
#
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
some symbols could not be found - but the output is pretty confusing and
misleading in outputing just empty headers, plus it doesn't explain why it does
so.
4) 'caller' vs. 'callee'.
If I change 'caller' to 'callee' in the above example, I get this output:
triton:~/tip> perf report --call-graph fractal,0.5,callee --stdio --symbol-filter local_apic_timer_interrupt
[nv] with build id 744b5b4279152a54e61208989daf5d3d6b375aa3 not found, continuing without symbols
Failed to open /tmp/perf-6650.map, continuing without symbols
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:pp'
# Event count (approx.): 1155803425
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ..............................
#
0.11% 0.01% swapper [kernel.kallsyms] [k] local_apic_timer_interrupt
|
---local_apic_timer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt
cpuidle_enter
cpu_startup_entry
start_secondary
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
That does not look very helpful, does it?
Now I tried to test caller vs. callee in perf top - but couldn't find a command
line option to do it - is there any?
5) --no-children
I agree that 'perf top -g --no-children' looks more intuitive than 'perf top -g'.
Thanks,
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists