linux-kernel - Re: [PATCH v5 0/4] Reduce NUMA related overhead in perf record profiling on large server systems

Message-ID: <cbe9780b-e5fa-8abc-11da-4dbbc9593128@linux.intel.com>
Date:   Thu, 31 Jan 2019 12:52:54 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     Jiri Olsa <jolsa@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Andi Kleen <ak@...ux.intel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 0/4] Reduce NUMA related overhead in perf record
 profiling on large server systems

On 28.01.2019 14:27, Jiri Olsa wrote:
> On Tue, Jan 22, 2019 at 08:45:12PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> The patch set has been validated on the BT benchmark from the NAS Parallel
>> Benchmarks [2], running on a dual-socket Broadwell system with 44 cores
>> (88 hardware threads), on kernels v4.4.0-21-generic (Ubuntu 16.04) and
>> v4.20.0-rc5 (tip perf/core).
>>
>> The patch set is for Arnaldo's perf/core repository.
>>
>> OVERHEAD:
>>                                BENCH REPORT BASED   ELAPSED TIME BASED
>>
>>           v4.20.0-rc5 (tip perf/core):
>>
>> (current) SERIAL-SYS  / BASE : 1.27x (14.37/11.31), 1.29x (15.19/11.69)
>>           SERIAL-NODE / BASE : 1.15x (13.04/11.31), 1.17x (13.79/11.69)
>>           SERIAL-CPU  / BASE : 1.00x (11.32/11.31), 1.01x (11.89/11.69)
>>
>>           AIO1-SYS    / BASE : 1.29x (14.58/11.31), 1.29x (15.26/11.69)
>>           AIO1-NODE   / BASE : 1.08x (12.23/11.31), 1.11x (13.01/11.69)
>>           AIO1-CPU    / BASE : 1.07x (12.14/11.31), 1.08x (12.83/11.69)
>>
>>           v4.4.0-21-generic (Ubuntu 16.04 LTS):
>>
>> (current) SERIAL-SYS  / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
>>           SERIAL-NODE / BASE : 1.19x (13.02/10.87), 1.23x (14.03/11.32)
>>           SERIAL-CPU  / BASE : 1.03x (11.21/10.87), 1.07x (12.18/11.32)
>>
>>           AIO1-SYS    / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
>>           AIO1-NODE   / BASE : 1.10x (12.04/10.87), 1.15x (13.03/11.32)
>>           AIO1-CPU    / BASE : 1.12x (12.20/10.87), 1.15x (13.09/11.32)
>>
>> ---
>> Alexey Budankov (4):
>>   perf record: allocate affinity masks
>>   perf record: bind the AIO user space buffers to nodes
>>   perf record: apply affinity masks when reading mmap buffers
>>   perf record: implement --affinity=node|cpu option
>>
>>  tools/perf/Documentation/perf-record.txt |   5 ++
>>  tools/perf/builtin-record.c              |  45 +++++++++-
>>  tools/perf/perf.h                        |   8 ++
>>  tools/perf/util/cpumap.c                 |  10 +++
>>  tools/perf/util/cpumap.h                 |   1 +
>>  tools/perf/util/evlist.c                 |   6 +-
>>  tools/perf/util/evlist.h                 |   2 +-
>>  tools/perf/util/mmap.c                   | 105 ++++++++++++++++++++++-
>>  tools/perf/util/mmap.h                   |   3 +-
>>  9 files changed, 175 insertions(+), 10 deletions(-)
>>
>> ---
>> Changes in v5:
>> - avoided multiple allocations of the online CPU map by creating it only
>>   once, in cpu_map__online()
>> - reduced indentation in record__parse_affinity()
> 
> Reviewed-by: Jiri Olsa <jolsa@...nel.org>

Thanks! 
Alexey

> 
> thanks,
> jirka
>