linux-kernel - Re: [PATCH v2 0/5] Benchmark and improve event synthesis performance

Message-ID: <20200403110137.GK2784502@krava>
Date:   Fri, 3 Apr 2020 13:01:37 +0200
From:   Jiri Olsa <jolsa@...hat.com>
To:     Ian Rogers <irogers@...gle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Petr Mladek <pmladek@...e.com>,
        Andrey Zhizhikin <andrey.z@...il.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Kan Liang <kan.liang@...ux.intel.com>,
        linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH v2 0/5] Benchmark and improve event synthesis performance

On Thu, Apr 02, 2020 at 08:43:52AM -0700, Ian Rogers wrote:
> Event synthesis is performance critical in common tasks using perf. For
> example, when perf record starts in system wide mode the /proc file
> system is scanned with events synthesized for each process and all
> executable mmaps. With large machines and lots of processes, we have seen
> O(seconds) of wall clock time while synthesis is occurring.
> 
> This patch set adds a benchmark for synthesis performance in a new
> benchmark collection called 'internals'. The benchmark uses the
> machine__synthesize_threads function, single threaded on the perf process
> with a 'tool' that just drops the events, to measure how long synthesis
> takes.
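
For readers unfamiliar with this style of benchmark, the measurement described
above can be sketched roughly as follows. This is a minimal illustration, not
the code from the patch: average_usec(), work_fn and drop_event() are
hypothetical names, and the workload is a stand-in for a call such as
machine__synthesize_threads with a tool that discards events.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() under strict -std= modes */
#endif
#include <stddef.h>
#include <time.h>

/* Hypothetical workload signature; stands in for one synthesis run. */
typedef int (*work_fn)(void *arg);

/*
 * Run fn() 'iterations' times and return the mean wall-clock time per
 * call in microseconds, or a negative value on failure.
 */
static double average_usec(work_fn fn, void *arg, unsigned int iterations)
{
	struct timespec start, end;
	double total_usec = 0.0;
	unsigned int i;

	if (!fn || iterations == 0)
		return -1.0;

	for (i = 0; i < iterations; i++) {
		if (clock_gettime(CLOCK_MONOTONIC, &start))
			return -1.0;
		if (fn(arg))
			return -1.0;
		if (clock_gettime(CLOCK_MONOTONIC, &end))
			return -1.0;
		/* Convert the elapsed seconds and nanoseconds to microseconds. */
		total_usec += (double)(end.tv_sec - start.tv_sec) * 1e6 +
			      (double)(end.tv_nsec - start.tv_nsec) / 1e3;
	}
	return total_usec / iterations;
}

/* Example workload that does nothing, like a tool that drops every event. */
static int drop_event(void *arg)
{
	(void)arg;
	return 0;
}
```

Calling average_usec(drop_event, NULL, 10) gives the per-call overhead of the
harness itself, which is a useful baseline before timing a real workload.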
> 
> By profiling this benchmark, two performance bottlenecks were identified:
> hugetlbfs_mountpoint and stdio. The impact of these changes is:
> 
> Before:
> Average synthesis took: 167.616800 usec
> Average data synthesis took: 208.655600 usec
> 
> After hugetlbfs_mountpoint scalability fix:
> Average synthesis took: 120.195100 usec
> Average data synthesis took: 156.582300 usec
> 
> After removal of stdio in /proc/pid/maps code:
> Average synthesis took: 67.189100 usec
> Average data synthesis took: 102.451600 usec
> 
> Time was measured on an Intel Xeon 6154 compiling with Debian gcc 9.2.1.
> 
> v2 of this patch set adds the new benchmark to the perf-bench man page
> and addresses review comments from Jiri Olsa, thanks!

Acked-by: Jiri Olsa <jolsa@...hat.com>

thanks,
jirka

> 
> Two patches in the set were sent to LKML previously but are included
> here for context around the benchmark performance impact:
> https://lore.kernel.org/lkml/20200327172914.28603-1-irogers@google.com/T/#u
> https://lore.kernel.org/lkml/20200328014221.168130-1-irogers@google.com/T/#u
> 
> A future area of improvement could be to add the perf top
> num-thread-synthesize option more widely to other perf commands, and
> also to benchmark its effectiveness.
> 
> Ian Rogers (4):
>   perf bench: add event synthesis benchmark
>   perf synthetic-events: save 4kb from 2 stack frames
>   tools api: add a lightweight buffered reading api
>   perf synthetic events: Remove use of sscanf from /proc reading
> 
> Stephane Eranian (1):
>   tools api fs: make xxx__mountpoint() more scalable
> 
>  tools/lib/api/fs/fs.c                   |  17 +++
>  tools/lib/api/fs/fs.h                   |  12 ++
>  tools/lib/api/io.h                      | 107 ++++++++++++++
>  tools/perf/Documentation/perf-bench.txt |   8 ++
>  tools/perf/bench/Build                  |   2 +-
>  tools/perf/bench/bench.h                |   2 +-
>  tools/perf/bench/synthesize.c           | 101 ++++++++++++++
>  tools/perf/builtin-bench.c              |   6 +
>  tools/perf/util/synthetic-events.c      | 177 +++++++++++++++---------
>  9 files changed, 367 insertions(+), 65 deletions(-)
>  create mode 100644 tools/lib/api/io.h
>  create mode 100644 tools/perf/bench/synthesize.c
> 
> -- 
> 2.26.0.rc2.310.g2932bb562d-goog
>
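
As an illustration of the "remove use of sscanf" idea named in the patch list,
here is a minimal sketch of parsing the leading "start-end" address range of a
/proc/pid/maps style line with a hand-rolled hex parser instead of stdio. It
is an assumption-laden example, not the API added in tools/lib/api/io.h:
hex_digit(), parse_hex() and parse_maps_range() are hypothetical names, and
the sketch works on an in-memory string rather than a file descriptor.

```c
#include <stdint.h>

/* Hypothetical helper: value of a hex digit, or -1 if c is not one. */
static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse an unsigned hex number at *p and advance *p past its digits.
 * Returns 0 on success, -1 if no hex digit was found. Overflow is not
 * checked in this sketch.
 */
static int parse_hex(const char **p, uint64_t *out)
{
	const char *s = *p;
	uint64_t val = 0;
	int d;

	while ((d = hex_digit(*s)) >= 0) {
		val = (val << 4) | (uint64_t)d;
		s++;
	}
	if (s == *p)
		return -1;
	*p = s;
	*out = val;
	return 0;
}

/* Parse the leading "start-end" address range of a maps-style line. */
static int parse_maps_range(const char *line, uint64_t *start, uint64_t *end)
{
	if (parse_hex(&line, start))
		return -1;
	if (*line++ != '-')
		return -1;
	return parse_hex(&line, end);
}
```

For a line beginning "00400000-0040c000 r-xp", parse_maps_range() stores
0x400000 in *start and 0x40c000 in *end and returns 0. A single-pass parser of
this kind avoids format-string interpretation, which is one plausible reason
such a change would show up in a profile.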