Message-ID: <45BCAC33-1626-42D1-A170-92DC8D7BAAF8@fb.com>
Date: Fri, 12 Mar 2021 15:45:13 +0000
From: Song Liu <songliubraving@...com>
To: Jiri Olsa <jolsa@...hat.com>
CC: linux-kernel <linux-kernel@...r.kernel.org>,
Kernel Team <Kernel-team@...com>,
"acme@...nel.org" <acme@...nel.org>,
"acme@...hat.com" <acme@...hat.com>,
"namhyung@...nel.org" <namhyung@...nel.org>,
"jolsa@...nel.org" <jolsa@...nel.org>
Subject: Re: [PATCH] perf-stat: introduce bperf, share hardware PMCs with BPF
> On Mar 12, 2021, at 4:12 AM, Jiri Olsa <jolsa@...hat.com> wrote:
>
> On Thu, Mar 11, 2021 at 06:02:57PM -0800, Song Liu wrote:
>> perf uses performance monitoring counters (PMCs) to monitor system
>> performance. The PMCs are limited hardware resources. For example,
>> Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
>>
>> Modern data center systems use these PMCs in many different ways:
>> system level monitoring, (maybe nested) container level monitoring, per
>> process monitoring, profiling (in sample mode), etc. In some cases,
>> there are more active perf_events than available hardware PMCs. To allow
>> all perf_events to have a chance to run, it is necessary to do expensive
>> time multiplexing of events.
>>
>> On the other hand, many monitoring tools count the common metrics (cycles,
>> instructions). It is a waste to have multiple tools create multiple
>> perf_events of "cycles" and occupy multiple PMCs.
>>
>> bperf tries to reduce such wastes by allowing multiple perf_events of
>> "cycles" or "instructions" (at different scopes) to share PMUs. Instead
>> of having each perf-stat session to read its own perf_events, bperf uses
>> BPF programs to read the perf_events and aggregate readings to BPF maps.
>> Then, the perf-stat session(s) reads the values from these BPF maps.
>>
>> Please refer to the comment before the definition of bperf_ops for the
>> description of bperf architecture.
>>
>> bperf is off by default. To enable it, pass --use-bpf option to perf-stat.
>> bperf uses a BPF hashmap to share information about BPF programs and maps
>> used by bperf. This map is pinned to bpffs. The default address is
>> /sys/fs/bpf/bperf_attr_map. The user could change the address with option
>> --attr-map.
>
> nice, I recall the presentation about that and was wondering
> when this will come up ;-)

Progress has been slower than I expected, but over the last year I finished
some of the dependencies for this:
1. BPF_PROG_TEST_RUN for raw_tp event;
2. perf-stat -b, which introduced skeleton and bpf_counter;
3. BPF task local storage. I didn't use it in this version, but it could
   help optimize bperf in the future.
>
>>
>> ---
>> Known limitations:
>> 1. Do not support per cgroup events;
>> 2. Do not support monitoring of BPF program (perf-stat -b);
>> 3. Do not support event groups.
>>
>> The following commands have been tested:
>>
>> perf stat --use-bpf -e cycles -a
>> perf stat --use-bpf -e cycles -C 1,3,4
>> perf stat --use-bpf -e cycles -p 123
>> perf stat --use-bpf -e cycles -t 100,101
>
> I assume the output is same as standard perf?

Yes, the output is identical to the output without the --use-bpf option.
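
To make the sharing idea from the quoted description a bit more concrete, here
is a toy model in Python. It is only a sketch of the concept, not the bperf
implementation: the class SharedCounter, its methods, and the session names are
all hypothetical, a plain dict stands in for the BPF maps, and a Python list
stands in for the per-CPU hardware counter. One "leader" counter is read once
per simulated context switch, and the delta is credited to every session whose
scope (system wide, per CPU, per task) matches.

```python
class SharedCounter:
    """Toy model: one shared counter per CPU, fanned out to many sessions."""

    def __init__(self, num_cpus):
        self.num_cpus = num_cpus
        self.hw = [0] * num_cpus     # stand-in for the per-CPU PMC value
        self.prev = [0] * num_cpus   # last value the leader observed
        self.accum = {}              # stand-in for BPF maps: session -> per-CPU sums
        self.filters = {}            # session -> predicate(cpu, tid)

    def attach(self, session, predicate):
        """Register a session with a scope filter (hypothetical API)."""
        self.filters[session] = predicate
        self.accum[session] = [0] * self.num_cpus

    def tick(self, cpu, tid, cycles):
        """Simulate a context switch after tid ran on cpu for `cycles`."""
        self.hw[cpu] += cycles
        delta = self.hw[cpu] - self.prev[cpu]   # single read of the shared counter
        self.prev[cpu] = self.hw[cpu]
        for session, matches in self.filters.items():
            if matches(cpu, tid):
                self.accum[session][cpu] += delta

    def read(self, session):
        """What a session would report: the sum over all CPUs."""
        return sum(self.accum[session])


sc = SharedCounter(num_cpus=2)
sc.attach("system", lambda cpu, tid: True)          # like -a
sc.attach("cpu1", lambda cpu, tid: cpu == 1)        # like -C 1
sc.attach("task100", lambda cpu, tid: tid == 100)   # like -t 100
sc.tick(cpu=0, tid=100, cycles=10)
sc.tick(cpu=1, tid=200, cycles=5)
sc.tick(cpu=1, tid=100, cycles=7)
print(sc.read("system"), sc.read("cpu1"), sc.read("task100"))
```

The point of the model is that three sessions are served by a single counter
per CPU, rather than by three separate "cycles" events competing for PMCs.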
Thanks,
Song