Message-ID: <20170619173529.GM3645@kernel.org>
Date: Mon, 19 Jun 2017 14:35:29 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Jin Yao <yao.jin@...ux.intel.com>
Cc: Jiri Olsa <jolsa@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, mingo@...radead.org,
alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
ak@...ux.intel.com, kan.liang@...el.com, yao.jin@...el.com
Subject: Re: [PATCH v2 3/3] perf report: Implement visual marker for macro
fusion in annotate
On Mon, Jun 19, 2017 at 10:55:58AM +0800, Jin Yao wrote:
> To mark the fused instructions clearly, this patch adds a
> line before the first instruction of the pair and joins it with the
> arrow of the jump.
>
> For example, when je is selected in the annotate view, the line
> before cmpl is drawn and joined to the arrow of je.
>
>       │   ┌──cmpl $0x0,argp_program_version_hook
> 81.93 │   │──je 20
>       │   │  lock cmpxchg %esi,0x38a9a4(%rip)
>       │   │↓ jne 29
>       │   │↓ jmp 43
> 11.47 │20:└─→cmpxch %esi,0x38a999(%rip)

OK, thanks for making this per-arch! Some comments:

I think we should have this marked permanently, i.e. not just when we
go to the jump line, something like what follows (testing here on a
T450s, Broadwell, function hc_find_func in /usr/lib64/liblzma.so.5.2.2).

This is how it looks now, when we are not on the jne jump line:

 0.71 │       mov %r14d,%r10d ▒
      │       movzbl (%rdx,%r10,1),%ebp ▒
 1.06 │ 70:   mov (%r9,%rcx,4),%ecx ◆
77.98 │ 74:   cmp %bpl,(%rbx,%r10,1) ▒
      │     ↑ jne 70 ▒
 0.85 │       movzbl (%rdx),%r10d ▒
 0.99 │       cmp %r10b,(%rbx) ▒

I think it should be augmented to:

 0.71 │       mov %r14d,%r10d ▒
      │       movzbl (%rdx,%r10,1),%ebp ▒
 1.06 │ 70: ┌─mov (%r9,%rcx,4),%ecx ◆
77.98 │ 74: └─cmp %bpl,(%rbx,%r10,1) ▒
      │     ↑ jne 70 ▒
 0.85 │       movzbl (%rdx),%r10d ▒
 0.99 │       cmp %r10b,(%rbx) ▒

I.e. no arrow, just the two instructions that end up as one micro-op
connected to each other.
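To make the proposal concrete, here is a plain-text sketch of the permanent marker: every line gets a two-column connector slot that holds a top corner for the first instruction of a fused pair, a bottom corner for the second, and blanks otherwise, so unmarked lines stay aligned. The names fused_role, fused_connector and format_line are hypothetical, and it uses snprintf() rather than the slang calls in ui/browser.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: where a line sits within a macro-fused pair. */
enum fused_role {
	FUSED_NONE,
	FUSED_FIRST,
	FUSED_SECOND,
};

/*
 * Two-column connector drawn right before the instruction text.
 * No arrow head: nothing jumps between the two fused instructions,
 * they are simply joined.
 */
static const char *fused_connector(enum fused_role role)
{
	switch (role) {
	case FUSED_FIRST:
		return "┌─";
	case FUSED_SECOND:
		return "└─";
	default:
		return "  ";	/* keep unmarked lines aligned */
	}
}

/* Render "<label> <connector><insn>", e.g. " 74: └─cmp ..." */
static int format_line(char *buf, size_t len, const char *label,
		       enum fused_role role, const char *insn)
{
	return snprintf(buf, len, "%4s %s%s", label,
			fused_connector(role), insn);
}
```

For example, format_line(buf, sizeof(buf), "74:", FUSED_SECOND, "cmp %bpl,(%rbx,%r10,1)") produces " 74: └─cmp %bpl,(%rbx,%r10,1)", and a line with no label and FUSED_NONE is indented by the same seven columns.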
And then this:

      │   ┌──cmpl $0x0,argp_program_version_hook
81.93 │   │──je 20
      │   │  lock cmpxchg %esi,0x38a9a4(%rip)
      │   │↓ jne 29
      │   │↓ jmp 43
11.47 │20:└─→cmpxch %esi,0x38a999(%rip)

Would look better as:

      │   ┌──cmpl $0x0,argp_program_version_hook
81.93 │   ├──je 20
      │   │  lock cmpxchg %esi,0x38a9a4(%rip)
      │   │↓ jne 29
      │   │↓ jmp 43
11.47 │20:└─→cmpxch %esi,0x38a999(%rip)

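The only difference between the two drawings is the glyph on the jump's own row: a tee (├, SLSMG_LTEE_CHAR in slang) instead of a plain vertical bar, showing that the line coming down from the compare branches off at the jump. Here is a minimal sketch of that per-row choice for the vertical connector column only (the horizontal strokes and the arrow head are left out), assuming a forward jump, screen rows that grow downwards, and the compare sitting directly above the jump. connector_glyph and its parameters are hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper: glyph for the connector column on screen row
 * 'row' while the cursor is on a forward jump that is fused with the
 * instruction right above it.
 *   pair_row   - row of the first instruction of the pair (cmpl)
 *   jump_row   - row of the jump itself (je), pair_row + 1
 *   target_row - row of the jump target (20:), below jump_row
 */
static const char *connector_glyph(int row, int pair_row, int jump_row,
				   int target_row)
{
	if (row == pair_row)
		return "┌";
	if (row == jump_row)
		return "├";	/* a plain "│" here hides the branch-off */
	if (row == target_row)
		return "└";
	if (row > pair_row && row < target_row)
		return "│";
	return " ";
}
```

With cmpl on row 0, je on row 1 and the target on row 5, rows 0 to 5 get ┌, ├, │, │, │, └, matching the second drawing.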
Patch below, please test/ack :-)
This was the low-hanging fruit. Having the:

 1.06 │ 70: ┌─mov (%r9,%rcx,4),%ecx ◆
77.98 │ 74: └─cmp %bpl,(%rbx,%r10,1) ▒

marker always there, not just when the cursor is on top of one of
those lines, remains to be coded.
But you state:
------------
Macro fusion merges two instructions to a single micro-op. Intel core
platform performs this hardware optimization under limited
circumstances.
------------
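As a rough illustration of the pairs that sentence is about: on Intel cores, macro fusion joins a flag-setting instruction such as CMP or TEST with an immediately following conditional jump. The sketch below looks only at mnemonics (AT&T syntax, optional size suffix), so it is an approximation and not the rule perf or the hardware applies. It ignores operand forms, which the Intel optimization manual also restricts. It also ignores per-microarchitecture differences and the other ALU instructions (e.g. ADD, SUB, AND) that later cores also fuse with Jcc. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* True if mnem is base, optionally followed by one AT&T size suffix. */
static bool mnemonic_is(const char *mnem, const char *base)
{
	size_t n = strlen(base);

	if (strncmp(mnem, base, n) != 0)
		return false;
	/* "cmp", "cmpl", "cmpq" ... but not "cmpxchg" or "cmpsb" */
	return mnem[n] == '\0' ||
	       (strchr("bwlq", mnem[n]) != NULL && mnem[n + 1] == '\0');
}

/* Conditional jumps that read the flags: j<cc>, but not jmp/jrcxz. */
static bool is_cond_jump(const char *mnem)
{
	if (mnem[0] != 'j')
		return false;
	if (strncmp(mnem, "jmp", 3) == 0)	/* unconditional */
		return false;
	/* jcxz/jecxz/jrcxz test a register, not the flags */
	return strcmp(mnem, "jcxz") != 0 && strcmp(mnem, "jecxz") != 0 &&
	       strcmp(mnem, "jrcxz") != 0;
}

/* Hypothetical, mnemonic-only check for a CMP/TEST + Jcc candidate. */
static bool maybe_macro_fused(const char *first, const char *second)
{
	return (mnemonic_is(first, "cmp") || mnemonic_is(first, "test")) &&
	       is_cond_jump(second);
}
```

The explicit size-suffix check matters because a plain prefix match on "cmp" would also accept cmpxchg, which appears in the very listing above.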
"Intel core": what about older arches, etc.? Don't you have to look at:
# cpudesc : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
# cpuid : GenuineIntel,6,61,4
present in the perf.data header (or in the running system, for things
like 'perf top') to make sure that this is a machine where such "macro
fusion" takes place?
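One way to act on that, sketched under assumptions: split the cpuid string shown in the header above into vendor, family, model and stepping, and gate the marker on the result. struct cpuid_info, cpuid_parse() and cpu_may_macro_fuse() are hypothetical. The policy in the last function (any GenuineIntel family 6 CPU) is only a placeholder for a real per-model table and simply mirrors the changelog's "Intel core" wording. Other vendors have their own fusion rules, which it does not cover. For the Broadwell laptop above, it yields family 6, model 61 (0x3d), stepping 4.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fields of a "cpuid" header value such as "GenuineIntel,6,61,4". */
struct cpuid_info {
	char vendor[16];
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Parse "vendor,family,model,stepping"; true only if all 4 were read. */
static bool cpuid_parse(const char *s, struct cpuid_info *ci)
{
	return sscanf(s, "%15[^,],%u,%u,%u", ci->vendor, &ci->family,
		      &ci->model, &ci->stepping) == 4;
}

/*
 * Placeholder policy (an assumption, not Intel's list): treat every
 * GenuineIntel family 6 CPU as a macro-fusion candidate.  A real check
 * would need a per-model table.
 */
static bool cpu_may_macro_fuse(const struct cpuid_info *ci)
{
	return strcmp(ci->vendor, "GenuineIntel") == 0 && ci->family == 6;
}
```

The same parse would apply to a live system (e.g. for 'perf top') as long as the cpuid string has this comma-separated shape.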
- Arnaldo

diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
index acba636bd165..9ef7677ae14f 100644
--- a/tools/perf/ui/browser.c
+++ b/tools/perf/ui/browser.c
@@ -756,8 +756,10 @@ void ui_browser__mark_fused(struct ui_browser *browser, unsigned int column,
 		ui_browser__gotorc(browser, end_row, column);
 		SLsmg_draw_hline(2);
 		ui_browser__gotorc(browser, end_row + 1, column - 1);
-		SLsmg_draw_vline(1);
+		SLsmg_write_char(SLSMG_LTEE_CHAR);
 	} else {
+		ui_browser__gotorc(browser, end_row, column - 1);
+		SLsmg_write_char(SLSMG_LTEE_CHAR);
 		ui_browser__gotorc(browser, end_row, column);
 		SLsmg_draw_hline(2);
 	}