linux-kernel - Re: [PATCH v3 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <acf891b3-c055-6eb5-c553-e76a176a41d8@linux.intel.com>
Date:   Tue, 28 Aug 2018 17:08:15 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 0/2]: perf: reduce data loss when profiling highly
 parallel CPU bound workloads

Hi,

On 28.08.2018 11:59, Jiri Olsa wrote:
> On Mon, Aug 27, 2018 at 08:03:21PM +0300, Alexey Budankov wrote:
>>
>> Currently in record mode the tool implements trace writing serially. 
>> The algorithm loops over mapped per-cpu data buffers and stores ready 
>> data chunks into a trace file using write() system call.
>>
>> At some circumstances the kernel may lack free space in a buffer 
>> because the other buffer's half is not yet written to disk due to 
>> some other buffer's data writing by the tool at the moment.
>>
>> Thus serial trace writing implementation may cause the kernel 
>> to loose profiling data and that is what observed when profiling 
>> highly parallel CPU bound workloads on machines with big number 
>> of cores.
>>
>> Experiment with profiling matrix multiplication code executing 128 
>> threads on Intel Xeon Phi (KNM) with 272 cores, like below,
>> demonstrates data loss metrics value of 98%:
>>
>> /usr/bin/time perf record -o /tmp/perf-ser.data -a -N -B -T -R -g \
>>     --call-graph dwarf,1024 --user-regs=IP,SP,BP \
>>     --switch-events -e cycles,instructions,ref-cycles,software/period=1,name=cs,config=0x3/Duk -- \
>>     matrix.gcc
>>
>> Data loss metrics is the ratio lost_time/elapsed_time where 
>> lost_time is the sum of time intervals containing PERF_RECORD_LOST 
>> records and elapsed_time is the elapsed application run time 
>> under profiling.
> 
> I like the idea and I think it's good direction to go, but could
> you please share some from perf stat or whatever you used to meassure
> the new performance?

I am not sure it would be representative in perf stat data however 
that could be visible there as well. I could share screenshots of 
VTune GUI that demonstrate the advantage of Perf implementing AIO 
streaming in comparison with the serial streaming. Data loss metrics 
can be easily understood from that. Is it ok to post it here?

> 
> thanks,
> jirka
>