linux-kernel - Re: [PATCH v5 0/4] Reduce NUMA related overhead in perf record profiling on large server systems

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190128112748.GC15461@krava>
Date:   Mon, 28 Jan 2019 12:27:48 +0100
From:   Jiri Olsa <jolsa@...hat.com>
To:     Alexey Budankov <alexey.budankov@...ux.intel.com>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Andi Kleen <ak@...ux.intel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 0/4] Reduce NUMA related overhead in perf record
 profiling on large server systems

On Tue, Jan 22, 2019 at 08:45:12PM +0300, Alexey Budankov wrote:

SNIP

> The patch set has been validated on BT benchmark from NAS Parallel 
> Benchmarks [2] running on dual socket, 44 cores, 88 hw threads Broadwell 
> system with kernels v4.4-21-generic (Ubuntu 16.04) and v4.20.0-rc5 
> (tip perf/core). 
> 
> The patch set is for Arnaldo's perf/core repository.
> 
> OVERHEAD:
> 			       BENCH REPORT BASED   ELAPSED TIME BASED
> 	  v4.20.0-rc5 
>           (tip perf/core):
> 				
> (current) SERIAL-SYS  / BASE : 1.27x (14.37/11.31), 1.29x (15.19/11.69)
> 	  SERIAL-NODE / BASE : 1.15x (13.04/11.31), 1.17x (13.79/11.69)
> 	  SERIAL-CPU  / BASE : 1.00x (11.32/11.31), 1.01x (11.89/11.69)
> 	
> 	  AIO1-SYS    / BASE : 1.29x (14.58/11.31), 1.29x (15.26/11.69)
> 	  AIO1-NODE   / BASE : 1.08x (12.23/11.31), 1,11x (13.01/11.69)
> 	  AIO1-CPU    / BASE : 1.07x (12.14/11.31), 1.08x (12.83/11.69)
> 
> 	  v4.4.0-21-generic
>           (Ubuntu 16.04 LTS):
> 
> (current) SERIAL-SYS  / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
> 	  SERIAL-NODE / BASE : 1.19x (13.02/10.87), 1.23x (14.03/11.32)
> 	  SERIAL-CPU  / BASE : 1.03x (11.21/10.87), 1.07x (12.18/11.32)
> 	
> 	  AIO1-SYS    / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
> 	  AIO1-NODE   / BASE : 1.10x (12.04/10.87), 1.15x (13.03/11.32)
> 	  AIO1-CPU    / BASE : 1.12x (12.20/10.87), 1.15x (13.09/11.32)
> 
> ---
> Alexey Budankov (4):
>   perf record: allocate affinity masks
>   perf record: bind the AIO user space buffers to nodes
>   perf record: apply affinity masks when reading mmap buffers
>   perf record: implement --affinity=node|cpu option
> 
>  tools/perf/Documentation/perf-record.txt |   5 ++
>  tools/perf/builtin-record.c              |  45 +++++++++-
>  tools/perf/perf.h                        |   8 ++
>  tools/perf/util/cpumap.c                 |  10 +++
>  tools/perf/util/cpumap.h                 |   1 +
>  tools/perf/util/evlist.c                 |   6 +-
>  tools/perf/util/evlist.h                 |   2 +-
>  tools/perf/util/mmap.c                   | 105 ++++++++++++++++++++++-
>  tools/perf/util/mmap.h                   |   3 +-
>  9 files changed, 175 insertions(+), 10 deletions(-)
> 
> ---
> Changes in v5:
> - avoided multiple allocations of online cpu maps by 
>   implementing it once in cpu_map__online()
> - reduced indentation at record__parse_affinity()

Reviewed-by: Jiri Olsa <jolsa@...nel.org>

thanks,
jirka