Message-ID: <20210316211837.910506-1-songliubraving@fb.com>
Date:   Tue, 16 Mar 2021 14:18:34 -0700
From:   Song Liu <songliubraving@...com>
To:     <linux-kernel@...r.kernel.org>
CC:     <kernel-team@...com>, <acme@...nel.org>, <acme@...hat.com>,
        <namhyung@...nel.org>, <jolsa@...nel.org>,
        Song Liu <songliubraving@...com>
Subject: [PATCH v2 0/3] perf-stat: share hardware PMCs with BPF

perf uses performance monitoring counters (PMCs) to monitor system
performance. The PMCs are limited hardware resources. For example,
Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.

Modern data center systems use these PMCs in many different ways:
system level monitoring, (maybe nested) container level monitoring, per
process monitoring, profiling (in sample mode), etc. In some cases,
there are more active perf_events than available hardware PMCs. To allow
all perf_events to have a chance to run, it is necessary to do expensive
time multiplexing of events.
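
To illustrate why multiplexing matters to the reader of a count: when an
event is on a PMC for only part of the time it is enabled, the raw count
has to be extrapolated from the ratio of enabled time to running time.
The sketch below shows that arithmetic only. The helper name
scale_count() is hypothetical and not part of this series; it assumes the
caller already has the enabled/running times that perf_event_open()
reports with PERF_FORMAT_TOTAL_TIME_ENABLED and
PERF_FORMAT_TOTAL_TIME_RUNNING.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate the count over the whole enabled
 * interval from a count collected while the event was multiplexed.
 *
 * raw:     count observed while the event was actually on a PMC
 * enabled: time the event was enabled
 * running: time the event was actually counting (<= enabled)
 */
static uint64_t scale_count(uint64_t raw, uint64_t enabled, uint64_t running)
{
	/* The event never got a PMC, so there is nothing to extrapolate. */
	if (running == 0)
		return 0;

	assert(running <= enabled);

	/* Extrapolate linearly; this is an estimate, not an exact count. */
	return (uint64_t)((double)raw * (double)enabled / (double)running);
}
```

An event that ran for half of its enabled time has its raw count
doubled, so the error of the estimate grows as more events compete for
the same PMCs.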

On the other hand, many monitoring tools count the common metrics (cycles,
instructions). It is a waste to have multiple tools create multiple
perf_events of "cycles" and occupy multiple PMCs.

bperf tries to reduce such waste by allowing multiple perf_events of
"cycles" or "instructions" (at different scopes) to share PMCs. Instead
of having each perf-stat session read its own perf_events, bperf uses
BPF programs to read the perf_events and aggregate the readings into BPF
maps. The perf-stat session(s) then read the values from these BPF maps.
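
The sharing idea can be modeled in plain user-space C. This is only a
rough sketch under stated assumptions, not the BPF code in this series:
the struct and function names are hypothetical, a single counter stands
in for one shared perf_event, an array stands in for the BPF maps, and
each follower filters by a single pid (-1 meaning system wide). The
"enabled" flag mirrors the follower flag mentioned in the changelog
below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_FOLLOWERS 4

/* One perf-stat session's view of the shared counter (hypothetical). */
struct follower {
	bool enabled;      /* only accumulate while the session is enabled */
	int pid_filter;    /* -1: system wide, otherwise a single pid */
	uint64_t accum;    /* stands in for the value kept in a BPF map */
};

/* The single reader of the hardware counter (hypothetical). */
struct leader {
	uint64_t prev;     /* previous raw reading of the counter */
	int nr_followers;
	struct follower followers[MAX_FOLLOWERS];
};

/*
 * Model of one context switch: the leader reads the counter once,
 * computes the delta since the last reading, and every enabled
 * follower whose filter matches the outgoing pid adds that delta.
 */
static void leader_on_switch(struct leader *l, uint64_t now, int prev_pid)
{
	uint64_t diff = now - l->prev;

	assert(l->nr_followers <= MAX_FOLLOWERS);
	l->prev = now;

	for (int i = 0; i < l->nr_followers; i++) {
		struct follower *f = &l->followers[i];

		if (!f->enabled)
			continue;
		if (f->pid_filter == -1 || f->pid_filter == prev_pid)
			f->accum += diff;
	}
}
```

In this model only the leader touches the counter, so any number of
sessions costs one PMC per event; each session reads its own accumulated
value instead of owning a perf_event.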

Changes v1 => v2:
  1. Add documentation.
  2. Add a shell test.
  3. Rename options, default path of the attr-map, and some variables.
  4. Add a separate patch that moves clock_gettime() in __run_perf_stat()
     to after enable_counters().
  5. Make perf_cpu_map for all cpus a global variable.
  6. Use sysfs__mountpoint() for default attr-map path.
  7. Use cpu__max_cpu() instead of libbpf_num_possible_cpus().
  8. Add flag "enabled" to the follower program. Then move follower attach
     to bperf__load() and simplify bperf__enable().

Song Liu (3):
  perf-stat: introduce bperf, share hardware PMCs with BPF
  perf-stat: measure t0 and ref_time after enable_counters()
  perf-test: add a test for perf-stat --bpf-counters option

 tools/perf/Documentation/perf-stat.txt        |  11 +
 tools/perf/Makefile.perf                      |   1 +
 tools/perf/builtin-stat.c                     |  20 +-
 tools/perf/tests/shell/stat_bpf_counters.sh   |  34 ++
 tools/perf/util/bpf_counter.c                 | 519 +++++++++++++++++-
 tools/perf/util/bpf_skel/bperf.h              |  14 +
 tools/perf/util/bpf_skel/bperf_follower.bpf.c |  69 +++
 tools/perf/util/bpf_skel/bperf_leader.bpf.c   |  46 ++
 tools/perf/util/bpf_skel/bperf_u.h            |  14 +
 tools/perf/util/evsel.h                       |  20 +-
 tools/perf/util/target.h                      |   4 +-
 11 files changed, 742 insertions(+), 10 deletions(-)
 create mode 100755 tools/perf/tests/shell/stat_bpf_counters.sh
 create mode 100644 tools/perf/util/bpf_skel/bperf.h
 create mode 100644 tools/perf/util/bpf_skel/bperf_follower.bpf.c
 create mode 100644 tools/perf/util/bpf_skel/bperf_leader.bpf.c
 create mode 100644 tools/perf/util/bpf_skel/bperf_u.h

--
2.30.2