lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5ac85264-50a5-8e70-0c12-7cb0da433a42@linux.intel.com>
Date:   Mon, 24 Sep 2018 21:32:11 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Jiri Olsa <jolsa@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Andi Kleen <andi@...stfloor.org>
Subject: Re: [RFCv2 00/48] perf tools: Add threads to record command

Hi,

On 24.09.2018 17:29, Jiri Olsa wrote:
> On Mon, Sep 24, 2018 at 04:09:09PM +0300, Alexey Budankov wrote:
>> Hi,
>>
>> On 24.09.2018 10:02, Alexey Budankov wrote:
>>> Hi,
>>>
>>> On 23.09.2018 22:30, Jiri Olsa wrote:
>>>> On Fri, Sep 21, 2018 at 09:13:08AM +0300, Alexey Budankov wrote:
>>>>
>>>> SNIP
>>>>
>>>>> Events:
>>>>> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
>>>>> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
>>>>> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
>>>>> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
>>>>> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
>>>>> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
>>>>>
>>>>> =================================================
>>>>>
>>>>> Command:
>>>>> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
>>>>> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>>>>         -e cpu/period=P,event=0x3c/Duk,\
>>>>>            cpu/period=P,umask=0x3/Duk,\
>>>>>            cpu/period=P,event=0xc0/Duk,\
>>>>>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>>>>>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>>>>>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>>>>>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
>>>>
>>>> hum, so I guess the results suck because of the -a option,
>>>> getting extra samples for all the perf record threads
>>>>
>>>> could you try without the -a? you monitor only user events,
>>>> so you're interested only in ./matrix.* samples, right?
>>>
>>> Ok, trying without -a, in per-process mode. 
>>
>> Command:
>>
>> /usr/bin/time ./perf.thr record --threads=T \
>> 	-N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>> 	-e cpu/period=P,event=0x3c/Duk,\
>> 	   cpu/period=P,umask=0x3/Duk,\
>> 	   cpu/period=P,event=0xc0/Duk,\
>> 	   cpu/period=0xaae61,event=0xc2,umask=0x10/uk,\
>> 	   cpu/period=0x11171,event=0xc2,umask=0x20/uk,\
>> 	   cpu/period=0x11171,event=0xc2,umask=0x40/uk \
>> 	--clockid=monotonic_raw -- ./matrix.gcc
>>
>> Workload: matrix multiplication in 128 threads
>>
>> T : 272
>> 	P (period, ms)       : 0.35 
>> 	runtime overhead (%) : 13x ~ 87.73 / 6.81
> 
> how do you meassure this?

This is the ratio of elapsed times:
runtime overhead (%) : elapsed_time_under_profiling / elapsed_time
i.e.

/usr/bin/time ./matrix.gcc
...
767.03user 11.17system 0:06.81elapsed 11424%CPU (0avgtext+0avgdata 100756maxresident)k
88inputs+0outputs (0major+139898minor)pagefaults 0swaps

so elapsed_time = 6.81 sec

elapsed_time_uder_profiling is elapsed value from output of 

/usr/bin/time ./perf.thr record --threads=T ...

> 
>> 	data loss (%)        : 0
>> 	LOST events          : 36
>> 	SAMPLE events        : 8048542
>>         perf.data size (GiB) : 10
> 
> any idea why does it have some much more samples?

Presumably, this is because period is 350us and this is the smallest 
one that perf.thr manages to capture data without data loss (=0) when T=272.
However, during collection, I get message that max sampling frequency 
is lowered to 3KHz.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ