linux-kernel - Re: [RFCv2 00/48] perf tools: Add threads to record command

Message-ID: <c37d1c7d-9f98-4aed-e621-6b3b76d3d3d9@linux.intel.com>
Date:   Mon, 24 Sep 2018 16:09:09 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Jiri Olsa <jolsa@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Andi Kleen <andi@...stfloor.org>
Subject: Re: [RFCv2 00/48] perf tools: Add threads to record command

Hi,

On 24.09.2018 10:02, Alexey Budankov wrote:
> Hi,
> 
> On 23.09.2018 22:30, Jiri Olsa wrote:
>> On Fri, Sep 21, 2018 at 09:13:08AM +0300, Alexey Budankov wrote:
>>
>> SNIP
>>
>>> Events:
>>> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
>>> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
>>> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
>>> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
>>> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
>>> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
>>>
>>> =================================================
>>>
>>> Command:
>>> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
>>> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>>         -e cpu/period=P,event=0x3c/Duk,\
>>>            cpu/period=P,umask=0x3/Duk,\
>>>            cpu/period=P,event=0xc0/Duk,\
>>>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>>>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>>>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>>>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
>>
>> hum, so I guess the results suck because of the -a option,
>> getting extra samples for all the perf record threads
>>
>> could you try without the -a? you monitor only user events,
>> so you're interested only in ./matrix.* samples, right?
> 
> Ok, trying without -a, in per-process mode. 

Command:

/usr/bin/time ./perf.thr record --threads=T \
	-N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
	-e cpu/period=P,event=0x3c/Duk,\
	   cpu/period=P,umask=0x3/Duk,\
	   cpu/period=P,event=0xc0/Duk,\
	   cpu/period=0xaae61,event=0xc2,umask=0x10/uk,\
	   cpu/period=0x11171,event=0xc2,umask=0x20/uk,\
	   cpu/period=0x11171,event=0xc2,umask=0x40/uk \
	--clockid=monotonic_raw -- ./matrix.gcc
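
For readers less used to the raw event syntax, the hexadecimal period values in the command above can be decoded with a few lines of Python. This is only an illustrative sketch: parse_event is a hypothetical helper, not part of perf, and it assumes the simple pmu/term=value,.../modifiers form used in this message.

```python
def parse_event(spec):
    """Split 'pmu/term=value,.../modifiers' into (pmu, terms, modifiers).

    Hypothetical helper for illustration only; it handles just the
    simple raw-event form that appears in the command above.
    """
    pmu, terms, modifiers = spec.split("/")
    parsed = {}
    for term in terms.split(","):
        key, value = term.split("=")
        try:
            parsed[key] = int(value, 0)  # base 0 accepts the 0x prefix
        except ValueError:
            parsed[key] = value          # e.g. the placeholder P
    return pmu, parsed, modifiers


events = [
    "cpu/period=P,event=0x3c/Duk",
    "cpu/period=0xaae61,event=0xc2,umask=0x10/uk",
    "cpu/period=0x11171,event=0xc2,umask=0x20/uk",
    "cpu/period=0x11171,event=0xc2,umask=0x40/uk",
]

for spec in events:
    pmu, terms, modifiers = parse_event(spec)
    print(f"{spec}: period={terms['period']} modifiers={modifiers}")
```

With these inputs, 0xaae61 decodes to 700001 and 0x11171 to 70001, while the P placeholder is left as-is.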

Workload: matrix multiplication in 128 threads

T : 272
	P (period, ms)       : 0.35 
	runtime overhead (%) : 13x ~ 87.73 / 6.81
	data loss (%)        : 0
	LOST events          : 36
	SAMPLE events        : 8048542
        perf.data size (GiB) : 10

T : 128
	P (period, ms)       : 0.35 
	runtime overhead (%) : 10x ~ 71.12 / 6.81
	data loss (%)        : 0
	LOST events          : 2
	SAMPLE events        : 6524363
        perf.data size (GiB) : 8

T : 64
	P (period, ms)       : 0.35 
	runtime overhead (%) : 10x ~ 71.89 / 6.81
	data loss (%)        : 0
	LOST events          : 2
	SAMPLE events        : 7160623
        perf.data size (GiB) : 9

=================================================

Command:

/usr/bin/time ./perf.aio record --aio=N \
	-N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
	-e cpu/period=P,event=0x3c/Duk,\
	   cpu/period=P,umask=0x3/Duk,\
           cpu/period=P,event=0xc0/Duk,\
           cpu/period=0xaae61,event=0xc2,umask=0x10/uk,\
           cpu/period=0x11171,event=0xc2,umask=0x20/uk,\
           cpu/period=0x11171,event=0xc2,umask=0x40/uk \
        --clockid=monotonic_raw ./matrix.gcc

Workload: matrix multiplication in 128 threads

N : 512
        P (period, ms)       : 1.5
 	runtime overhead (%) : 2.8x ~ 19.20 / 6.81
 	data loss (%)        : 0
 	LOST events          : 0
 	SAMPLE events        : 1094976
        perf.data size (GiB) : 1.3

N : 272
  	P (period, ms)       : 1.5
 	runtime overhead (%) : 3.3x ~ 22.34 / 6.81
 	data loss (%)        : 0
 	LOST events          : 0
 	SAMPLE events        : 1089252
        perf.data size (GiB) : 1.3
  
N : 128
 	P (period, ms)       : 1.5
 	runtime overhead (%) : 2.6x ~ 15.15 / 6.81
 	data loss (%)        : 1
 	LOST events          : 1
 	SAMPLE events        : 1094102
        perf.data size (GiB) : 1.3
 
N : 64
 	P (period, ms)       : 1.5
 	runtime overhead (%) : 2.4x ~ 16.23 / 6.81
 	data loss (%)        : 2
 	LOST events          : 18
 	SAMPLE events        : 1105986
        perf.data size (GiB) : 1.3
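
The figures above can be recomputed and compared with a short script. This is a rough sketch, assuming that each "runtime overhead" line gives the run time with profiling divided by the 6.81 baseline without it (the unit is not stated above), and that LOST relative to SAMPLE events is a reasonable proxy for loss. The function and variable names are made up for illustration.

```python
BASELINE = 6.81  # denominator of every "runtime overhead" line above

# mode -> {T or N: (run time with profiling, LOST events, SAMPLE events)}
# All numbers are copied from the results in this message.
results = {
    "threads": {
        272: (87.73, 36, 8048542),
        128: (71.12, 2, 6524363),
        64: (71.89, 2, 7160623),
    },
    "aio": {
        512: (19.20, 0, 1094976),
        272: (22.34, 0, 1089252),
        128: (15.15, 1, 1094102),
        64: (16.23, 18, 1105986),
    },
}


def slowdown(profiled, baseline=BASELINE):
    """Slowdown factor: profiled run time over baseline run time."""
    return profiled / baseline


def lost_per_million(lost, samples):
    """LOST events per million SAMPLE events."""
    return 1e6 * lost / samples


for mode, runs in results.items():
    for count, (profiled, lost, samples) in runs.items():
        print(f"{mode:8s} {count:4d}: "
              f"{slowdown(profiled):5.2f}x, "
              f"{lost_per_million(lost, samples):6.2f} LOST per 1M samples")
```

The factors are recomputed from the quoted quotients rather than copied from the rounded values, so the printout doubles as a cross-check. Note that the two modes were run with different P values (0.35 vs 1.5), so the numbers are not a like-for-like comparison.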

Thanks,
Alexey

> VTune collects both user- and kernel-mode samples, using the /uk modifiers.
> The modifier set can be extended to collect in the VM host and guests as well.
> 
> Thanks,
> Alexey
> 
>>
>> thanks,
>> jirka
>>
>