[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YE9qf8vcS8svnFG7@krava>
Date: Mon, 15 Mar 2021 15:09:03 +0100
From: Jiri Olsa <jolsa@...hat.com>
To: Song Liu <songliubraving@...com>
Cc: linux-kernel <linux-kernel@...r.kernel.org>,
Kernel Team <Kernel-team@...com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
"acme@...hat.com" <acme@...hat.com>,
"namhyung@...nel.org" <namhyung@...nel.org>,
"jolsa@...nel.org" <jolsa@...nel.org>
Subject: Re: [PATCH] perf-stat: introduce bperf, share hardware PMCs with BPF
On Mon, Mar 15, 2021 at 07:51:11AM +0000, Song Liu wrote:
SNIP
> >> Known limitations:
> >> 1. Do not support per cgroup events;
> >
> > isn't that another filter via bpf_get_current_cgroup_id ?
>
> This is tricky with nested cgroups. We also have bpf_get_current_ancestor_cgroup_id,
> but that's not the exact same we need either. We may need a new helper.
>
> Also, we are limited by 38 follower programs for the leader program, which
> might be too few for Namhyung's use case. We can use some logic in the
> follower program to count events for many cgroups in one fexit program.
ok, I see
>
> >
> >> 2. Do not support monitoring of BPF program (perf-stat -b);
> >
> > we'd need to call the leadr on fentry/fexit of the monitored bpf
> > program, right?
>
> My current plan is to let perf-stat -b share the same perf_event map with
> bperf, but not the leader program.
ok
>
> >
> >> 3. Do not support event groups.
> >
> > I guess for group support you'll need to load 'leaders' for each group member
>
> I am not sure how this would work. Say the user started the following in
> parallel:
>
> #1 perf stat --bpf-counters -e cycles -a &
> #2 perf stat --bpf-counters -e instructions -C 1,2,3 &
> #3 perf stat --bpf-counters -e {cycles,instructions} -p 123
>
> Event "cycles" is loaded by #1; while event "instruction" is loaded by #2.
> If #3 reuses these events, it is tricky (or impossible?) to make sure the
> event group works as expected, right?
the reason for group is to force kernel to all events if possible
or none.. so with your change I don't see a problem, you just
provide values for those counters.. but let's see when you get
there ;-)
>
> >
> >>
> >> The following commands have been tested:
> >>
> >> perf stat --use-bpf -e cycles -a
> >> perf stat --use-bpf -e cycles -C 1,3,4
> >> perf stat --use-bpf -e cycles -p 123
> >> perf stat --use-bpf -e cycles -t 100,101
> >
> > works good with that file you sent.. I'll check/test more,
> > so far some quick comments below
> >
> > thanks,
> > jirka
> >
> >
> >
> > SNIP
> >
> >> @@ -1146,6 +1156,10 @@ static struct option stat_options[] = {
> >> #ifdef HAVE_BPF_SKEL
> >> OPT_STRING('b', "bpf-prog", &target.bpf_str, "bpf-prog-id",
> >> "stat events on existing bpf program id"),
> >> + OPT_BOOLEAN(0, "use-bpf", &target.use_bpf,
> >> + "use bpf program to count events"),
> >> + OPT_STRING(0, "attr-map", &target.attr_map, "attr-map-path",
> >> + "path to perf_event_attr map"),
> >
> > what's the point of allowing another name? just debug purpose?
>
> It is mostly to cover corner cases, like something else used the same
> name.
about that.. we just take the object fd assuming it's map,
should we test it somehow?
map_fd = bpf_obj_get(path);
if it's not the map we expect, I think we should generate
another name without forcing user to run again with --attr-map
but still warn, so other perf session can use the new name
SNIP
> >> +static int bperf_sync_counters(struct evsel *evsel)
> >> +{
> >> + struct perf_cpu_map *all_cpus = perf_cpu_map__new(NULL);
> >> + int num_cpu, i, cpu;
> >> +
> >> + if (!all_cpus)
> >> + return -1;
> >> +
> >> + num_cpu = all_cpus->nr;
> >> + for (i = 0; i < num_cpu; i++) {
> >> + cpu = all_cpus->map[i];
> >> + bperf_trigger_reading(evsel->bperf_leader_prog_fd, cpu);
> >> + }
> >> + return 0;
> >> +}
> >> +
> >> +static int bperf__enable(struct evsel *evsel)
> >> +{
> >> + struct bperf_follower_bpf *skel = evsel->follower_skel;
> >> + __u32 num_cpu_bpf = libbpf_num_possible_cpus();
> >
> > we have cpu__max_cpu for that
>
> libbpf calls for percpu array use libbpf_num_possible_cpus. So I guess it
> is better to use the same here. The two are identical at the moment though.
then in the bperf__read you take that array and update
perf_counts, which is based on perf's cpus, so they mix
anyway
I'd like to keep perf code using perf's cpus api.. could
we just check at the begining that libbpf_num_possible_cpus
returns same number as cpu__max_cpu (if not, we have a
problem anyway) and use perf's cpu api
thanks,
jirka
Powered by blists - more mailing lists