Message-ID: <20190128112748.GC15461@krava>
Date: Mon, 28 Jan 2019 12:27:48 +0100
From: Jiri Olsa <jolsa@...hat.com>
To: Alexey Budankov <alexey.budankov@...ux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Namhyung Kim <namhyung@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 0/4] Reduce NUMA related overhead in perf record
profiling on large server systems
On Tue, Jan 22, 2019 at 08:45:12PM +0300, Alexey Budankov wrote:
SNIP
> The patch set has been validated on the BT benchmark from the NAS Parallel
> Benchmarks [2] running on a dual-socket Broadwell system (44 cores,
> 88 hw threads) with kernels v4.4.0-21-generic (Ubuntu 16.04) and
> v4.20.0-rc5 (tip perf/core).
>
> The patch set is for Arnaldo's perf/core repository.
>
> OVERHEAD:
>                                  BENCH REPORT BASED   ELAPSED TIME BASED
>   v4.20.0-rc5
>   (tip perf/core):
>
>   (current) SERIAL-SYS  / BASE : 1.27x (14.37/11.31), 1.29x (15.19/11.69)
>             SERIAL-NODE / BASE : 1.15x (13.04/11.31), 1.17x (13.79/11.69)
>             SERIAL-CPU  / BASE : 1.00x (11.32/11.31), 1.01x (11.89/11.69)
>
>             AIO1-SYS    / BASE : 1.29x (14.58/11.31), 1.29x (15.26/11.69)
>             AIO1-NODE   / BASE : 1.08x (12.23/11.31), 1.11x (13.01/11.69)
>             AIO1-CPU    / BASE : 1.07x (12.14/11.31), 1.08x (12.83/11.69)
>
>   v4.4.0-21-generic
>   (Ubuntu 16.04 LTS):
>
>   (current) SERIAL-SYS  / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
>             SERIAL-NODE / BASE : 1.19x (13.02/10.87), 1.23x (14.03/11.32)
>             SERIAL-CPU  / BASE : 1.03x (11.21/10.87), 1.07x (12.18/11.32)
>
>             AIO1-SYS    / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
>             AIO1-NODE   / BASE : 1.10x (12.04/10.87), 1.15x (13.03/11.32)
>             AIO1-CPU    / BASE : 1.12x (12.20/10.87), 1.15x (13.09/11.32)
>
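Each factor in the table is the value measured for a configuration divided by the BASE value shown next to it in parentheses. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the message does not state the units of the measurements.

```c
/*
 * Hypothetical helper, not part of the patch set: overhead factor of a
 * profiled run relative to the unprofiled baseline. Both arguments must
 * be in the same unit, and base must be non-zero.
 */
static double overhead_ratio(double profiled, double base)
{
	return profiled / base;
}
```

For the first v4.20.0-rc5 row, `overhead_ratio(14.37, 11.31)` is about 1.27, the SERIAL-SYS factor listed in the table.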
> ---
> Alexey Budankov (4):
> perf record: allocate affinity masks
> perf record: bind the AIO user space buffers to nodes
> perf record: apply affinity masks when reading mmap buffers
> perf record: implement --affinity=node|cpu option
>
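The third patch title mentions applying affinity masks when reading mmap buffers. As a rough illustration of that general technique, and not the code from these patches, a thread on Linux could pin itself to a CPU before touching a buffer associated with it. This is only a sketch assuming glibc's `sched_setaffinity()`. The helper name `pin_self_to_cpu` is hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper, not taken from the patch set: restrict the
 * calling thread to a single CPU.
 * Returns 0 on success, -1 on failure.
 */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t mask;

	/* Reject CPU numbers that do not fit in a fixed-size cpu_set_t. */
	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);

	/* A pid of 0 applies the mask to the calling thread. */
	return sched_setaffinity(0, sizeof(mask), &mask);
}
```

A node-level variant would set every CPU of the buffer's NUMA node in the mask instead of a single bit. How the patches build their masks is not shown in this message.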
> tools/perf/Documentation/perf-record.txt | 5 ++
> tools/perf/builtin-record.c | 45 +++++++++-
> tools/perf/perf.h | 8 ++
> tools/perf/util/cpumap.c | 10 +++
> tools/perf/util/cpumap.h | 1 +
> tools/perf/util/evlist.c | 6 +-
> tools/perf/util/evlist.h | 2 +-
> tools/perf/util/mmap.c | 105 ++++++++++++++++++++++-
> tools/perf/util/mmap.h | 3 +-
> 9 files changed, 175 insertions(+), 10 deletions(-)
>
> ---
> Changes in v5:
> - avoided multiple allocations of online cpu maps by
> implementing it once in cpu_map__online()
> - reduced indentation at record__parse_affinity()

Reviewed-by: Jiri Olsa <jolsa@...nel.org>

thanks,
jirka