linux-kernel - Re: [RFCv2 00/48] perf tools: Add threads to record command

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3227b88a-5607-596c-3ade-74e0b21988e6@linux.intel.com>
Date:   Mon, 24 Sep 2018 22:12:15 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Jiri Olsa <jolsa@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Andi Kleen <andi@...stfloor.org>
Subject: Re: [RFCv2 00/48] perf tools: Add threads to record command

Hi,

On 24.09.2018 21:32, Alexey Budankov wrote:
> Hi,
> 
> On 24.09.2018 17:29, Jiri Olsa wrote:
>> On Mon, Sep 24, 2018 at 04:09:09PM +0300, Alexey Budankov wrote:
>>> Hi,
>>>
>>> On 24.09.2018 10:02, Alexey Budankov wrote:
>>>> Hi,
>>>>
>>>> On 23.09.2018 22:30, Jiri Olsa wrote:
>>>>> On Fri, Sep 21, 2018 at 09:13:08AM +0300, Alexey Budankov wrote:
>>>>>
>>>>> SNIP
>>>>>
>>>>>> Events:
>>>>>> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
>>>>>> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
>>>>>> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
>>>>>> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
>>>>>> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
>>>>>> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
>>>>>>
>>>>>> =================================================
>>>>>>
>>>>>> Command:
>>>>>> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
>>>>>> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>>>>>         -e cpu/period=P,event=0x3c/Duk,\
>>>>>>            cpu/period=P,umask=0x3/Duk,\
>>>>>>            cpu/period=P,event=0xc0/Duk,\
>>>>>>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>>>>>>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>>>>>>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>>>>>>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
>>>>>
>>>>> hum, so I guess the results suck because of the -a option,
>>>>> getting extra samples for all the perf record threads
>>>>>
>>>>> could you try without the -a? you monitor only user events,
>>>>> so you're interested only in ./matrix.* samples, right?
>>>>
>>>> Ok, trying without -a, in per-process mode. 
>>>
>>> Command:
>>>
>>> /usr/bin/time ./perf.thr record --threads=T \
>>> 	-N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>> 	-e cpu/period=P,event=0x3c/Duk,\
>>> 	   cpu/period=P,umask=0x3/Duk,\
>>> 	   cpu/period=P,event=0xc0/Duk,\
>>> 	   cpu/period=0xaae61,event=0xc2,umask=0x10/uk,\
>>> 	   cpu/period=0x11171,event=0xc2,umask=0x20/uk,\
>>> 	   cpu/period=0x11171,event=0xc2,umask=0x40/uk \
>>> 	--clockid=monotonic_raw -- ./matrix.gcc
>>>
>>> Workload: matrix multiplication in 128 threads
>>>
>>> T : 272
>>> 	P (period, ms)       : 0.35 
>>> 	runtime overhead (%) : 13x ~ 87.73 / 6.81
>>
>> how do you meassure this?
> 
> This is the ratio of elapsed times:
> runtime overhead (%) : elapsed_time_under_profiling / elapsed_time
> i.e.
> 
> /usr/bin/time ./matrix.gcc
> ...
> 767.03user 11.17system 0:06.81elapsed 11424%CPU (0avgtext+0avgdata 100756maxresident)k
> 88inputs+0outputs (0major+139898minor)pagefaults 0swaps
> 
> so elapsed_time = 6.81 sec
> 
> elapsed_time_uder_profiling is elapsed value from output of 
> 
> /usr/bin/time ./perf.thr record --threads=T ...
> 
>>
>>> 	data loss (%)        : 0
>>> 	LOST events          : 36
>>> 	SAMPLE events        : 8048542
>>>         perf.data size (GiB) : 10
>>
>> any idea why does it have some much more samples?
> 
> Presumably, this is because period is 350us and this is the smallest 
> one that perf.thr manages to capture data without data loss (=0) when T=272.
> However, during collection, I get message that max sampling frequency 
> is lowered to 3KHz.

Lowering default frequency rate to 3000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.

Thanks,
Alexey

> 
> Thanks,
> Alexey
> 
>>
>> thanks,
>> jirka
>>
>