Message-ID: <56F89DCD.1040202@huawei.com>
Date: Mon, 28 Mar 2016 10:58:21 +0800
From: "Wangnan (F)" <wangnan0@...wei.com>
To: pi3orama <pi3orama@....com>, Peter Zijlstra <peterz@...radead.org>
CC: <mingo@...hat.com>, <linux-kernel@...r.kernel.org>,
He Kuang <hekuang@...wei.com>,
Alexei Starovoitov <ast@...nel.org>,
"Arnaldo Carvalho de Melo" <acme@...hat.com>,
Brendan Gregg <brendan.d.gregg@...il.com>,
"Jiri Olsa" <jolsa@...nel.org>,
Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
Namhyung Kim <namhyung@...nel.org>,
Zefan Li <lizefan@...wei.com>
Subject: Re: [PATCH 3/5] perf core: Prepare writing into ring buffer from end
On 2016/3/28 9:58, Wangnan (F) wrote:
>
>
> On 2016/3/28 9:07, Wangnan (F) wrote:
>>
>>
>> On 2016/3/27 23:30, pi3orama wrote:
>>>
>>> Sent from my iPhone
>>>
>>>> On March 27, 2016, at 11:20 PM, Peter Zijlstra <peterz@...radead.org>
>>>> wrote:
>>>>
>>>> On Fri, Mar 25, 2016 at 10:14:36PM +0800, Wangnan (F) wrote:
>>>>>>> I think you enabled some unusual config options?
>>>> x86_64-defconfig
>>>>
>>>>>> You must have enabled CONFIG_OPTIMIZE_INLINING. Now I get a similar result:
>>>> It has that indeed.
>>>>
>>>>> After enabling CONFIG_OPTIMIZE_INLINING:
>>>>>
>>>>> Test its performance by calling 'close(-1)' 3000000 times and
>>>>> use 'perf record -o /dev/null -e raw_syscalls:* test-ring-buffer' to
>>>>> capture system calls:
>>>>>
>>>>>               MEAN       STDVAR
>>>>> BASE          800077.1   23448.13
>>>>> RAWPERF.PRE   2465858.0  603473.70
>>>>> RAWPERF.POST  2471925.0  609437.60
>>>>>
>>>>> Considering the high stdvar, the performance does not change
>>>>> after applying this patch.
>>>> Why is your variance so immense? And doesn't that render the
>>>> measurements pointless?
>>>>
>>> For some unknown reason, about 10% of these results are twice the
>>> normal value. Say, "normal results" are about 2200000, but those
>>> "outliers" are about 4400000 (I can't access the raw data now).
>>> The variance becomes much smaller if I remove those outliers.
>>
>
> Found the reason for these outliers.
>
> If perf and 'test-ring-buffer' are scheduled on different processors,
> the performance is bad. I think cache is the main reason.
>
> I will redo the test and bind them to cores on the same CPU.
>
> Thank you.
Test method improvements:
1. Set the CPU frequency governor to performance:

   # for f in /sys/devices/system/cpu/cpufreq/policy*/scaling_governor ; do echo performance > $f ; done
2. Bind core (test-ring-buffer):

   Add the following code at the start of test-ring-buffer:

       cpu_set_t mask;

       CPU_ZERO(&mask);
       CPU_SET(6, &mask);	/* run on CPU 6 */
       pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
       pthread_yield();
3. Bind core (perf):

   Use the following command to start perf:

   # taskset -c 7 ./perf record -o /dev/null --no-buildid-cache -e raw_syscalls:* test-ring-buffer
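
For reference, the core binding in step 2 could be wrapped in a small helper with error checking. This is only a sketch, not the code actually used in test-ring-buffer: the name pin_self_to_cpu() is hypothetical, and sched_yield() stands in for pthread_yield(), which newer glibc versions mark as deprecated.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* needed for cpu_set_t and pthread_setaffinity_np() */
#endif
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper (not part of the original test program):
 * pin the calling thread to a single CPU.
 * Returns 0 on success, or the error number from pthread_setaffinity_np().
 */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t mask;
	int err;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);

	err = pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
	if (err != 0)
		return err;

	/* Give the scheduler a chance to act on the new mask right away. */
	sched_yield();
	return 0;
}
```

Calling pin_self_to_cpu(6) at the top of main() would match the setup described above. Checking the return value makes it obvious when the chosen CPU is not available on the test machine.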
New results from 100 test runs in each case:

              MEAN         STDVAR
BASE          800214.950   2853.083
RAWPERF.PRE   2253846.700  9997.014
RAWPERF.POST  2257495.540  8516.293
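
A quick way to read the table is to compare the PRE/POST gap with the spread. The sketch below assumes the STDVAR column is a standard deviation in the same unit as MEAN (the table does not say so explicitly); the struct and function names are made up for illustration.

```c
/* One row of the result table. Field names are illustrative only. */
struct sample_stats {
	double mean;
	double stdvar;	/* assumed to be a standard deviation */
};

/* Change of the POST mean relative to the PRE mean, in percent. */
static double relative_change_pct(struct sample_stats pre,
				  struct sample_stats post)
{
	return (post.mean - pre.mean) / pre.mean * 100.0;
}

/*
 * Returns 1 if the absolute difference of the two means is smaller
 * than the smaller of the two spreads, 0 otherwise.
 */
static int diff_within_spread(struct sample_stats pre,
			      struct sample_stats post)
{
	double diff = post.mean - pre.mean;
	double spread = pre.stdvar < post.stdvar ? pre.stdvar : post.stdvar;

	if (diff < 0)
		diff = -diff;
	return diff < spread;
}
```

With the RAWPERF.PRE and RAWPERF.POST rows above, the means differ by 3648.84, which is about 0.16% of the PRE mean and below both STDVAR values.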
Thank you.