lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 12 Mar 2021 17:36:08 +0900
From:   Namhyung Kim <namhyung@...nel.org>
To:     Song Liu <songliubraving@...com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>, kernel-team@...com,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        Jiri Olsa <jolsa@...nel.org>
Subject: Re: [PATCH] perf-stat: introduce bperf, share hardware PMCs with BPF

Hi,

On Fri, Mar 12, 2021 at 11:03 AM Song Liu <songliubraving@...com> wrote:
>
> perf uses performance monitoring counters (PMCs) to monitor system
> performance. The PMCs are limited hardware resources. For example,
> Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
>
> Modern data center systems use these PMCs in many different ways:
> system level monitoring, (maybe nested) container level monitoring, per
> process monitoring, profiling (in sample mode), etc. In some cases,
> there are more active perf_events than available hardware PMCs. To allow
> all perf_events to have a chance to run, it is necessary to do expensive
> time multiplexing of events.
>
> On the other hand, many monitoring tools count the common metrics (cycles,
> instructions). It is a waste to have multiple tools create multiple
> perf_events of "cycles" and occupy multiple PMCs.
>
> bperf tries to reduce such wastes by allowing multiple perf_events of
> "cycles" or "instructions" (at different scopes) to share PMUs. Instead
> of having each perf-stat session to read its own perf_events, bperf uses
> BPF programs to read the perf_events and aggregate readings to BPF maps.
> Then, the perf-stat session(s) reads the values from these BPF maps.
>
> Please refer to the comment before the definition of bperf_ops for the
> description of bperf architecture.

Interesting!  Actually I thought about something similar before,
but my BPF knowledge is outdated.  So I need to catch up but
failed to have some time for it so far. ;-)

>
> bperf is off by default. To enable it, pass --use-bpf option to perf-stat.
> bperf uses a BPF hashmap to share information about BPF programs and maps
> used by bperf. This map is pinned to bpffs. The default address is
> /sys/fs/bpf/bperf_attr_map. The user could change the address with option
> --attr-map.
>
> ---
> Known limitations:
> 1. Do not support per cgroup events;
> 2. Do not support monitoring of BPF program (perf-stat -b);
> 3. Do not support event groups.

In my case, per cgroup event counting is very important.
And I'd like to do that with lots of cpus and cgroups.
So I'm working on an in-kernel solution (without BPF),
I hope to share it soon.

And for event groups, it seems the current implementation
cannot handle more than one event (not even in a group).
That could be a serious limitation..

>
> The following commands have been tested:
>
>    perf stat --use-bpf -e cycles -a
>    perf stat --use-bpf -e cycles -C 1,3,4
>    perf stat --use-bpf -e cycles -p 123
>    perf stat --use-bpf -e cycles -t 100,101

Hmm... so it loads both leader and follower programs if needed, right?
Does it support multiple followers with different targets at the same time?

Thanks,
Namhyung

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ