Message-ID: <4B3CF1B3-5EED-4882-BC99-AD676D4E3429@fb.com>
Date:   Fri, 12 Mar 2021 15:38:43 +0000
From:   Song Liu <songliubraving@...com>
To:     Namhyung Kim <namhyung@...nel.org>
CC:     linux-kernel <linux-kernel@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        "Arnaldo Carvalho de Melo" <acme@...hat.com>,
        Jiri Olsa <jolsa@...nel.org>
Subject: Re: [PATCH] perf-stat: introduce bperf, share hardware PMCs with BPF



> On Mar 12, 2021, at 12:36 AM, Namhyung Kim <namhyung@...nel.org> wrote:
> 
> Hi,
> 
> On Fri, Mar 12, 2021 at 11:03 AM Song Liu <songliubraving@...com> wrote:
>> 
>> perf uses performance monitoring counters (PMCs) to monitor system
>> performance. The PMCs are limited hardware resources. For example,
>> Intel CPUs have 3 fixed PMCs and 4 programmable PMCs per CPU.
>> 
>> Modern data center systems use these PMCs in many different ways:
>> system-level monitoring, (maybe nested) container-level monitoring,
>> per-process monitoring, profiling (in sampling mode), etc. In some cases,
>> there are more active perf_events than available hardware PMCs. To allow
>> all perf_events to have a chance to run, it is necessary to do expensive
>> time multiplexing of events.
>> 
>> On the other hand, many monitoring tools count the common metrics (cycles,
>> instructions). It is wasteful to have multiple tools create multiple
>> perf_events of "cycles" and occupy multiple PMCs.
>> 
>> bperf tries to reduce such waste by allowing multiple perf_events of
>> "cycles" or "instructions" (at different scopes) to share PMUs. Instead
>> of having each perf-stat session read its own perf_events, bperf uses
>> BPF programs to read the perf_events and aggregate readings into BPF maps.
>> Then, the perf-stat session(s) read the values from these BPF maps.
>> 
>> Please refer to the comment before the definition of bperf_ops for the
>> description of bperf architecture.
> 
> Interesting!  Actually I thought about something similar before,
> but my BPF knowledge is outdated.  So I need to catch up, but I
> haven't found time for it so far. ;-)
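In case it helps with catching up: the gist of the leader program is
small. Here is a rough sketch of the idea (simplified, with
illustrative map and section names; see the patch for the real code):
a BPF program reads the shared hardware event with
bpf_perf_event_read_value() and publishes the reading in a per-cpu
map for the followers to consume.

  /* Simplified sketch; map and section names are illustrative. */
  #include <linux/bpf.h>
  #include <linux/perf_event.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  /* the shared hardware event, one entry per cpu */
  struct {
          __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
          __uint(key_size, sizeof(__u32));
          __uint(value_size, sizeof(int));
  } events SEC(".maps");

  /* latest reading of the event on each cpu */
  struct {
          __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
          __uint(key_size, sizeof(__u32));
          __uint(value_size, sizeof(struct bpf_perf_event_value));
          __uint(max_entries, 1);
  } readings SEC(".maps");

  SEC("raw_tp/sched_switch")      /* illustrative trigger point */
  int BPF_PROG(on_switch)
  {
          struct bpf_perf_event_value val, *out;
          __u32 zero = 0;

          /* read the hardware counter shared by all sessions */
          if (bpf_perf_event_read_value(&events, BPF_F_CURRENT_CPU,
                                        &val, sizeof(val)))
                  return 0;

          out = bpf_map_lookup_elem(&readings, &zero);
          if (out)
                  *out = val;     /* publish for followers */
          return 0;
  }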
> 
>> 
>> bperf is off by default. To enable it, pass --use-bpf option to perf-stat.
>> bperf uses a BPF hashmap to share information about BPF programs and maps
>> used by bperf. This map is pinned to bpffs. The default path is
>> /sys/fs/bpf/bperf_attr_map. The user can change the path with the
>> --attr-map option.
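(For what it's worth: assuming bpffs is mounted at the default
location, the pinned map should be visible with standard bpftool
while a bperf session is running, e.g.

  bpftool map show pinned /sys/fs/bpf/bperf_attr_map

)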
>> 
>> ---
>> Known limitations:
>> 1. Does not support per-cgroup events;
>> 2. Does not support monitoring of BPF programs (perf-stat -b);
>> 3. Does not support event groups.
> 
> In my case, per-cgroup event counting is very important,
> and I'd like to do it with lots of cpus and cgroups.

We can easily extend this approach to support cgroup events. I didn't
implement it here to keep the first version simple.

> So I'm working on an in-kernel solution (without BPF),
> which I hope to share soon.

This is interesting! I can't wait to see what it looks like. I spent
quite some time trying to enable in-kernel sharing (not just for cgroup
events), but finally decided to try the BPF approach.

> 
> And for event groups, it seems the current implementation
> cannot handle more than one event (not even in a group).
> That could be a serious limitation...

It supports multiple events, which are independent of each other:
"cycles" and "instructions" would use two independent leader programs.

> 
>> 
>> The following commands have been tested:
>> 
>>   perf stat --use-bpf -e cycles -a
>>   perf stat --use-bpf -e cycles -C 1,3,4
>>   perf stat --use-bpf -e cycles -p 123
>>   perf stat --use-bpf -e cycles -t 100,101
> 
> Hmm... so it loads both leader and follower programs if needed, right?
> Does it support multiple followers with different targets at the same time?

Yes, the whole idea is to have one leader program and multiple follower
programs. If we only run one of these commands at a time, it will load
one leader and one follower. If we run several of them in parallel,
they will share the same leader program and load multiple follower
programs.
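To make the reading side concrete: in the end perf-stat just has to
sum up the per-cpu values that get aggregated into a BPF map. A rough
userspace sketch (the map fd and names are illustrative, not the
patch's actual code):

  /* rough sketch: sum the per-cpu readings from an aggregation map */
  #include <linux/bpf.h>
  #include <bpf/bpf.h>

  static __u64 read_total(int map_fd, int ncpus)
  {
          struct bpf_perf_event_value vals[ncpus];
          __u32 zero = 0;
          __u64 total = 0;

          /* one lookup in a per-cpu map returns a value per cpu */
          if (bpf_map_lookup_elem(map_fd, &zero, vals))
                  return 0;
          for (int i = 0; i < ncpus; i++)
                  total += vals[i].counter;
          return total;
  }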

I actually tested more than the commands above. The list just means
we support -a, -C, -p, and -t.

Currently, this works for multiple events and for different perf-stat
sessions running in parallel. The two commands below work well in
parallel:
  
  perf stat --use-bpf -e ref-cycles,instructions -a
  perf stat --use-bpf -e ref-cycles,cycles -C 1,3,5

Note the use of ref-cycles, which can only run on one specific counter
on Intel CPUs. With this approach, the two commands above will not do
time multiplexing on ref-cycles.
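(A way to see the difference: without --use-bpf, running the two
commands together should make perf stat print a scaling percentage
below 100% next to ref-cycles, because the counter is multiplexed;
with --use-bpf, both sessions read the same underlying counter, so no
scaling should appear.)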

Thanks,
Song
