[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5703C612.1080608@huawei.com>
Date: Tue, 5 Apr 2016 22:05:06 +0800
From: "Wangnan (F)" <wangnan0@...wei.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Alexei Starovoitov <ast@...nel.org>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
<linux-kernel@...r.kernel.org>,
Brendan Gregg <brendan.d.gregg@...il.com>,
He Kuang <hekuang@...wei.com>, Jiri Olsa <jolsa@...nel.org>,
Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
Namhyung Kim <namhyung@...nel.org>, <pi3orama@....com>,
Zefan Li <lizefan@...wei.com>
Subject: Re: [PATCH 4/4] perf core: Add backward attribute to perf event
On 2016/3/30 10:38, Wangnan (F) wrote:
>
>
> On 2016/3/30 10:28, Wangnan (F) wrote:
>>
>>
>> On 2016/3/29 22:04, Peter Zijlstra wrote:
>>> On Mon, Mar 28, 2016 at 06:41:32AM +0000, Wang Nan wrote:
>>>
>>> Could you maybe write a perf/tests thingy for this so that _some_
>>> userspace exists that exercises this new code?
>>>
>>>
>>>> int perf_output_begin(struct perf_output_handle *handle,
>>>> struct perf_event *event, unsigned int size)
>>>> {
>>>> + if (unlikely(is_write_backward(event)))
>>>> + return __perf_output_begin(handle, event, size, true);
>>>> return __perf_output_begin(handle, event, size, false);
>>>> }
>>> Would something like:
>>>
>>> int perf_output_begin(...)
>>> {
>>> if (unlikely(is_write_backward(event))
>>> return perf_output_begin_backward(...);
>>> return perf_output_begin_forward(...);
>>> }
>>>
>>> make sense; I'm not sure how much is still using this, but it seems
>>> somewhat excessive to inline two copies of that thing into a single
>>> function.
>>
>>
[SNIP]
>
> Sorry. Your second suggestion seems also good:
>
> My implementation makes a big perf_output_begin(), but introduces only
> one load and one branch.
>
> Your first suggestion introduces one load, one branch and one function
> call.
>
> Your second suggestion introduces one load, and at least one (and at
> most three) branches.
>
> I need some benchmarking result.
>
> Thank you.
No obviously performance divergence among all 3 implementations.
Here are some numbers:
I tested the cost of generating PERF_RECORD_COMM event using prctl with
following code:
...
gettimeofday(&tv1, NULL);
for (i = 0; i < 1000 * 1000 * 3; i++) {
char proc_name[10];
snprintf(proc_name, sizeof(proc_name), "p:%d\n", i);
prctl(PR_SET_NAME, proc_name);
}
gettimeofday(&tv2, NULL);
us1 = tv1.tv_sec * 1000000 + tv1.tv_usec;
us2 = tv2.tv_sec * 1000000 + tv2.tv_usec;
printf("%ld\n", us2 - us1);
...
Run this benchmark 100 time in each experiment. Bind benchmark to core 2
and perf to core 1 to ensure they are on a same CPU.
Result:
BASE : execute without perf
4.5 : pure v4.5
TIP : with only patch 1-3/4 in this patch set applied
BIGFUNC : the implementation in my original patch
FUNCCALL: the implememtation in Peter's first suggestion:
int perf_output_begin(...)
{
if (unlikely(is_write_backward(event))
return perf_output_begin_backward(...);
return perf_output_begin_forward(...);
}
BRANCH : the implememtation in Peter's second suggestion:
int perf_output_begin(...)
{
return __perf_output_begin(..., unlikely(event->attr.backwards));
}
'perf' is executed using:
# perf record -o /dev/null --no-buildid-cache -e
syscalls:sys_enter_read ...
Results:
MEAN STDVAR
BASE : 1122968.85 33492.52
4.5 : 2714200.70 26231.69
TIP : 2646260.46 32610.56
BIGFUNC : 2661308.46 52707.47
FUNCCALL: 2636061.10 52607.80
BRANCH : 2651335.74 34910.04
Considering the stdvar, the performance result is nearly identical.
I'd like to choose 'BRANCH' because its code looks better.
Thank you.
Powered by blists - more mailing lists