Message-ID: <20171031094053.iblfii2hzz7keujh@gmail.com>
Date: Tue, 31 Oct 2017 10:40:54 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Jiri Olsa <jolsa@...nel.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
lkml <linux-kernel@...r.kernel.org>,
Namhyung Kim <namhyung@...nel.org>,
David Ahern <dsahern@...il.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH 5/7] perf tools: Optimize sample parsing for ordered events
* Jiri Olsa <jolsa@...nel.org> wrote:
> Currently when using ordered events we parse the sample
> twice (the perf_evlist__parse_sample function). Once
> before we queue the sample for sorting:
>
>   perf_session__process_event
>     perf_evlist__parse_sample(sample)
>     perf_session__queue_event(sample.time)
>
> And then when we deliver the sorted sample:
>
>   ordered_events__deliver_event
>     perf_evlist__parse_sample
>     perf_session__deliver_event
>
> We can skip the initial full sample parsing by using
> perf_evlist__parse_sample_timestamp function, which
> got introduced earlier. The new path looks like:
>
>   perf_session__process_event
>     perf_evlist__parse_sample_timestamp
>     perf_session__queue_event
>
>   ordered_events__deliver_event
>     perf_session__deliver_event
>       perf_evlist__parse_sample
>
> It saves some instructions and is slightly faster:
>
> Before:
> Performance counter stats for './perf.old report --stdio' (5 runs):
>
> 64,396,007,225 cycles:u ( +- 0.97% )
> 105,882,112,735 instructions:u # 1.64 insn per cycle ( +- 0.00% )
>
> 21.618103465 seconds time elapsed ( +- 1.12% )
>
> After:
> Performance counter stats for './perf report --stdio' (5 runs):
>
> 60,567,807,182 cycles:u ( +- 0.40% )
> 104,853,333,514 instructions:u # 1.73 insn per cycle ( +- 0.00% )
>
> 20.168895243 seconds time elapsed ( +- 0.32% )
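
The reordering described above can be modelled in a few lines. This is only a conceptual sketch in Python, not perf's C code: the fixed record layout, parse_timestamp(), parse_sample() and process_events() are all hypothetical stand-ins (real perf samples have a layout that depends on the event's sample_type, and the real ordered-events queue is more involved than a heap). It shows the idea of decoding just the timestamp before queueing and deferring the full parse until delivery:

```python
import heapq
import struct

# Hypothetical fixed layout, for illustration only:
# u64 timestamp, u64 ip, u32 pid, u32 tid (little endian).
RECORD = struct.Struct("<QQII")


def parse_timestamp(raw):
    """Cheap path: decode only the leading timestamp field."""
    return struct.unpack_from("<Q", raw, 0)[0]


def parse_sample(raw):
    """Full path: decode every field of the record."""
    time, ip, pid, tid = RECORD.unpack(raw)
    return {"time": time, "ip": ip, "pid": pid, "tid": tid}


def process_events(raw_events):
    """Queue by timestamp only, then fully parse once at delivery."""
    queue = []
    for seq, raw in enumerate(raw_events):
        # seq keeps arrival order for events with equal timestamps.
        heapq.heappush(queue, (parse_timestamp(raw), seq, raw))

    delivered = []
    while queue:
        _, _, raw = heapq.heappop(queue)
        delivered.append(parse_sample(raw))  # the only full parse
    return delivered


events = [
    RECORD.pack(30, 0x1000, 1, 1),
    RECORD.pack(10, 0x2000, 1, 2),
    RECORD.pack(20, 0x3000, 2, 3),
]
ordered = process_events(events)
```

Each record is fully decoded exactly once, at delivery time, which is the saving the patch description is after.
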
That's a 7% speedup, not bad!
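
A quick way to check that figure against the quoted numbers. The helper names below are illustrative only and not part of perf; the inputs are copied from the 'perf stat' output above (means over 5 runs):

```python
# Means of 5 runs, copied from the quoted 'perf stat' output.
before = {
    "cycles": 64_396_007_225,
    "instructions": 105_882_112_735,
    "seconds": 21.618103465,
}
after = {
    "cycles": 60_567_807_182,
    "instructions": 104_853_333_514,
    "seconds": 20.168895243,
}


def reduction_pct(old, new):
    """Percentage by which `new` is smaller than `old`."""
    return (old - new) / old * 100.0


def speedup_pct(old, new):
    """Percentage speedup, i.e. how much faster `new` is than `old`."""
    return (old / new - 1.0) * 100.0


reductions = {key: reduction_pct(before[key], after[key]) for key in before}
time_speedup = speedup_pct(before["seconds"], after["seconds"])

for key, value in reductions.items():
    print(f"{key}: {value:.2f}% lower")
print(f"elapsed-time speedup: {time_speedup:.2f}%")
```

Measured as old/new elapsed time this comes out at a bit over 7%, and as a reduction in elapsed time at a bit under 7%. Cycles drop by roughly 6% while instructions drop by only about 1%, which is consistent with the IPC going from 1.64 to 1.73.
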
Thanks,
Ingo