Message-ID: <56FB3C3D.3050400@huawei.com>
Date: Wed, 30 Mar 2016 10:38:53 +0800
From: "Wangnan (F)" <wangnan0@...wei.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Alexei Starovoitov <ast@...nel.org>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
<linux-kernel@...r.kernel.org>,
Brendan Gregg <brendan.d.gregg@...il.com>,
He Kuang <hekuang@...wei.com>, Jiri Olsa <jolsa@...nel.org>,
Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
Namhyung Kim <namhyung@...nel.org>, <pi3orama@....com>,
Zefan Li <lizefan@...wei.com>
Subject: Re: [PATCH 4/4] perf core: Add backward attribute to perf event
On 2016/3/30 10:28, Wangnan (F) wrote:
>
>
> On 2016/3/29 22:04, Peter Zijlstra wrote:
>> On Mon, Mar 28, 2016 at 06:41:32AM +0000, Wang Nan wrote:
>>
>> Could you maybe write a perf/tests thingy for this so that _some_
>> userspace exists that exercises this new code?
>>
>>
>>>  int perf_output_begin(struct perf_output_handle *handle,
>>>  			struct perf_event *event, unsigned int size)
>>>  {
>>> +	if (unlikely(is_write_backward(event)))
>>> +		return __perf_output_begin(handle, event, size, true);
>>>  	return __perf_output_begin(handle, event, size, false);
>>>  }
>> Would something like:
>>
>> int perf_output_begin(...)
>> {
>> 	if (unlikely(is_write_backward(event)))
>> 		return perf_output_begin_backward(...);
>> 	return perf_output_begin_forward(...);
>> }
>>
>> make sense; I'm not sure how much is still using this, but it seems
>> somewhat excessive to inline two copies of that thing into a single
>> function.
>
> perf_output_begin is useful:
>
> $ grep perf_output_begin ./kernel -r
> ./kernel/events/ring_buffer.c: * See perf_output_begin().
> ./kernel/events/ring_buffer.c:int perf_output_begin(struct perf_output_handle *handle,
> ./kernel/events/ring_buffer.c: * perf_output_begin() only checks rb->paused, therefore
> ./kernel/events/core.c: if (perf_output_begin(&handle, event, header.size))
> ./kernel/events/core.c: ret = perf_output_begin(&handle, event, read_event.header.size);
> ./kernel/events/core.c: ret = perf_output_begin(&handle, event,
> ./kernel/events/core.c: ret = perf_output_begin(&handle, event,
> ./kernel/events/core.c: ret = perf_output_begin(&handle, event, rec.header.size);
> ./kernel/events/core.c: ret = perf_output_begin(&handle, event,
> ./kernel/events/core.c: ret = perf_output_begin(&handle, event, se->event_id.header.size);
> ./kernel/events/core.c: ret = perf_output_begin(&handle, event,
> ./kernel/events/core.c: ret = perf_output_begin(&handle, event, rec.header.size);
>
> Events like PERF_RECORD_MMAP2 use this function, so we still need to
> consider its overhead.
>
> So I will use your first suggestion.
>
Sorry, your second suggestion also seems good:
My implementation makes perf_output_begin() big, but introduces only
one load and one branch.
Your first suggestion introduces one load, one branch and one function call.
Your second suggestion introduces one load and at least one (and at
most three) branches.
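For reference, the dispatch shape of your second suggestion could be
sketched roughly like below. This is a standalone illustration with
stand-in struct definitions and a trivial body (the real kernel types
and ring-buffer logic are elided), just to show how the `backward` flag
becomes a compile-time constant in each specialized copy:

```c
#include <stdbool.h>

/* Stand-ins for the kernel types, for illustration only. */
struct perf_output_handle { int dir; };
struct perf_event { bool write_backward; };

static inline bool is_write_backward(struct perf_event *event)
{
	return event->write_backward;
}

/*
 * Generic body. Because 'backward' is a compile-time constant at
 * every call site below, the compiler can drop the dead direction
 * in each specialized copy.
 */
static inline __attribute__((always_inline)) int
__perf_output_begin(struct perf_output_handle *handle,
		    struct perf_event *event, unsigned int size,
		    bool backward)
{
	(void)event;
	(void)size;
	handle->dir = backward ? -1 : 1;	/* placeholder for real setup */
	return 0;
}

static int perf_output_begin_forward(struct perf_output_handle *handle,
				     struct perf_event *event,
				     unsigned int size)
{
	return __perf_output_begin(handle, event, size, false);
}

static int perf_output_begin_backward(struct perf_output_handle *handle,
				      struct perf_event *event,
				      unsigned int size)
{
	return __perf_output_begin(handle, event, size, true);
}

/* Common entry point: one load, one (predicted) branch. */
int perf_output_begin(struct perf_output_handle *handle,
		      struct perf_event *event, unsigned int size)
{
	if (__builtin_expect(is_write_backward(event), 0))
		return perf_output_begin_backward(handle, event, size);
	return perf_output_begin_forward(handle, event, size);
}
```

So the common entry point pays only the load and branch, while the two
wrappers get fully specialized copies of the body.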
I need some benchmarking results.
Thank you.