linux-kernel - Re: [PATCH 4/4] perf core: Add backward attribute to perf event


Message-ID: <5703C612.1080608@huawei.com>
Date:	Tue, 5 Apr 2016 22:05:06 +0800
From:	"Wangnan (F)" <wangnan0@...wei.com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	Alexei Starovoitov <ast@...nel.org>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	<linux-kernel@...r.kernel.org>,
	Brendan Gregg <brendan.d.gregg@...il.com>,
	He Kuang <hekuang@...wei.com>, Jiri Olsa <jolsa@...nel.org>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Namhyung Kim <namhyung@...nel.org>, <pi3orama@....com>,
	Zefan Li <lizefan@...wei.com>
Subject: Re: [PATCH 4/4] perf core: Add backward attribute to perf event



On 2016/3/30 10:38, Wangnan (F) wrote:
>
>
> On 2016/3/30 10:28, Wangnan (F) wrote:
>>
>>
>> On 2016/3/29 22:04, Peter Zijlstra wrote:
>>> On Mon, Mar 28, 2016 at 06:41:32AM +0000, Wang Nan wrote:
>>>
>>> Could you maybe write a perf/tests thingy for this so that _some_
>>> userspace exists that exercises this new code?
>>>
>>>
>>>>   int perf_output_begin(struct perf_output_handle *handle,
>>>>                 struct perf_event *event, unsigned int size)
>>>>   {
>>>> +    if (unlikely(is_write_backward(event)))
>>>> +        return __perf_output_begin(handle, event, size, true);
>>>>       return __perf_output_begin(handle, event, size, false);
>>>>   }
>>> Would something like:
>>>
>>> int perf_output_begin(...)
>>> {
>>>     if (unlikely(is_write_backward(event))
>>>         return perf_output_begin_backward(...);
>>>     return perf_output_begin_forward(...);
>>> }
>>>
>>> make sense; I'm not sure how much is still using this, but it seems
>>> somewhat excessive to inline two copies of that thing into a single
>>> function.
>>
>>

[SNIP]

>
> Sorry. Your second suggestion seems also good:
>
> My implementation makes a big perf_output_begin(), but introduces only 
> one load and one branch.
>
> Your first suggestion introduces one load, one branch and one function 
> call.
>
> Your second suggestion introduces one load, and at least one (and at 
> most three) branches.
>
> I need some benchmarking result.
>
> Thank you.

No obvious performance divergence among the 3 implementations.

Here are some numbers:

I tested the cost of generating PERF_RECORD_COMM events using prctl with
the following code:

         ...
         gettimeofday(&tv1, NULL);
         for (i = 0; i < 1000 * 1000 * 3; i++) {
                 char proc_name[10];

                 snprintf(proc_name, sizeof(proc_name), "p:%d\n", i);
                 prctl(PR_SET_NAME, proc_name);
         }
         gettimeofday(&tv2, NULL);
         us1 = tv1.tv_sec * 1000000 + tv1.tv_usec;
         us2 = tv2.tv_sec * 1000000 + tv2.tv_usec;
         printf("%ld\n", us2 - us1);
         ...

I ran this benchmark 100 times in each experiment, binding the benchmark
to core 2 and perf to core 1 to ensure they are on the same CPU.
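
For anyone who wants to reproduce this, the fragment above could be wrapped
into a self-contained helper roughly as follows. This is only a sketch: the
function name run_comm_benchmark() and its iteration parameter are made up
for illustration, the name buffer is 16 bytes (the size PR_SET_NAME accepts)
instead of 10, and the trailing "\n" in the format string is dropped. CPU
binding is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not part of the patch set): rename the calling
 * task `iterations` times and return the elapsed wall-clock time in
 * microseconds. Each rename should produce one PERF_RECORD_COMM when
 * the task is being traced by perf.
 */
long run_comm_benchmark(long iterations)
{
	struct timeval tv1, tv2;
	long i;

	assert(iterations > 0);

	gettimeofday(&tv1, NULL);
	for (i = 0; i < iterations; i++) {
		char proc_name[16];	/* PR_SET_NAME uses at most 16 bytes */

		snprintf(proc_name, sizeof(proc_name), "p:%ld", i);
		prctl(PR_SET_NAME, proc_name);
	}
	gettimeofday(&tv2, NULL);

	/* Subtract field by field to keep the intermediate values small. */
	return (tv2.tv_sec - tv1.tv_sec) * 1000000L +
	       (tv2.tv_usec - tv1.tv_usec);
}
```

A main() that calls run_comm_benchmark(1000 * 1000 * 3) and prints the
result with "%ld\n" would correspond to one run of the experiment.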

Configurations:

BASE    : execute without perf
4.5     : pure v4.5
TIP     : with only patch 1-3/4 in this patch set applied
BIGFUNC : the implementation in my original patch
FUNCCALL: the implementation in Peter's first suggestion:
    int perf_output_begin(...)
    {
        if (unlikely(is_write_backward(event)))
            return perf_output_begin_backward(...);
        return perf_output_begin_forward(...);
    }
BRANCH  : the implementation in Peter's second suggestion:
     int perf_output_begin(...)
     {
         return __perf_output_begin(..., unlikely(event->attr.backwards));
     }
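
To make the structural difference between the three variants concrete, here
is a small userspace analogue. It is not kernel code: struct toy_ring,
toy_reserve() and the toy_begin_*() wrappers are hypothetical names, the
reservation logic is reduced to moving a head offset, and the attributes
and __builtin_expect assume GCC or Clang. What it tries to show is only how
the direction flag reaches the shared inline body in each case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define toy_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical toy ring buffer; not the kernel's ring buffer. */
struct toy_ring {
	size_t head;	/* free-running write position */
	size_t size;	/* capacity in bytes, must be a power of two */
	bool backward;	/* stand-in for the event's backward attribute */
};

/* Shared body, forced inline so every caller gets its own copy. */
static inline __attribute__((always_inline))
size_t toy_reserve(struct toy_ring *rb, size_t len, bool backward)
{
	size_t offset;

	assert(rb->size != 0 && (rb->size & (rb->size - 1)) == 0);
	if (backward) {
		rb->head -= len;	/* grow toward lower offsets */
		offset = rb->head & (rb->size - 1);
	} else {
		offset = rb->head & (rb->size - 1);
		rb->head += len;	/* grow toward higher offsets */
	}
	return offset;
}

/* BIGFUNC: two constant-flag copies inlined into one function. */
size_t toy_begin_bigfunc(struct toy_ring *rb, size_t len)
{
	if (toy_unlikely(rb->backward))
		return toy_reserve(rb, len, true);
	return toy_reserve(rb, len, false);
}

/* FUNCCALL: one out-of-line function per direction. */
static __attribute__((noinline))
size_t toy_begin_backward(struct toy_ring *rb, size_t len)
{
	return toy_reserve(rb, len, true);
}

static __attribute__((noinline))
size_t toy_begin_forward(struct toy_ring *rb, size_t len)
{
	return toy_reserve(rb, len, false);
}

size_t toy_begin_funccall(struct toy_ring *rb, size_t len)
{
	if (toy_unlikely(rb->backward))
		return toy_begin_backward(rb, len);
	return toy_begin_forward(rb, len);
}

/* BRANCH: a single inlined copy that tests the runtime flag. */
size_t toy_begin_branch(struct toy_ring *rb, size_t len)
{
	return toy_reserve(rb, len, toy_unlikely(rb->backward));
}
```

All three wrappers return the same offsets for the same input; they differ
only in how many copies of the body the compiler emits and where the
branches and calls end up.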


'perf' is executed using:
  # perf record -o /dev/null --no-buildid-cache -e syscalls:sys_enter_read ...


Results:

              MEAN       STDVAR
BASE    : 1122968.85   33492.52
4.5     : 2714200.70   26231.69
TIP     : 2646260.46   32610.56
BIGFUNC : 2661308.46   52707.47
FUNCCALL: 2636061.10   52607.80
BRANCH  : 2651335.74   34910.04


Considering the stdvar, the performance results are nearly identical.
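
The comparison can be checked with a few lines of C. This sketch assumes
the STDVAR column is a spread in the same unit as MEAN (microseconds); the
struct and function names are made up for illustration. It looks only at
the four patched kernels (TIP, BIGFUNC, FUNCCALL, BRANCH) and compares the
gap between the largest and smallest mean with the smallest reported
spread.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the results table; `spread` is the STDVAR column. */
struct bench_row {
	const char *name;
	double mean;
	double spread;
};

/* Values copied from the table above. */
static const struct bench_row patched_rows[] = {
	{ "TIP",      2646260.46, 32610.56 },
	{ "BIGFUNC",  2661308.46, 52707.47 },
	{ "FUNCCALL", 2636061.10, 52607.80 },
	{ "BRANCH",   2651335.74, 34910.04 },
};

/* Difference between the largest and smallest mean. */
double mean_range(const struct bench_row *rows, size_t n)
{
	double lo, hi;
	size_t i;

	assert(n > 0);
	lo = hi = rows[0].mean;
	for (i = 1; i < n; i++) {
		if (rows[i].mean < lo)
			lo = rows[i].mean;
		if (rows[i].mean > hi)
			hi = rows[i].mean;
	}
	return hi - lo;
}

/* Smallest spread reported for any row. */
double min_spread(const struct bench_row *rows, size_t n)
{
	double m;
	size_t i;

	assert(n > 0);
	m = rows[0].spread;
	for (i = 1; i < n; i++)
		if (rows[i].spread < m)
			m = rows[i].spread;
	return m;
}
```

With these numbers mean_range() gives about 25247.36 (BIGFUNC minus
FUNCCALL) and min_spread() gives 32610.56 (TIP), so the whole gap between
the variants is smaller than the spread of any single one of them.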

I'd like to choose 'BRANCH' because its code looks better.

Thank you.