[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZvbnePGVmbWF0fAF@google.com>
Date: Fri, 27 Sep 2024 10:12:24 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Tengda Wu <wutengda@...weicloud.com>
Cc: Peter Zijlstra <peterz@...radead.org>, song@...nel.org,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>, kan.liang@...ux.intel.com,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: [PATCH -next 1/2] perf stat: Increase perf_attr_map entries
On Fri, Sep 27, 2024 at 10:35:54AM +0800, Tengda Wu wrote:
>
>
> On 2024/9/26 12:16, Namhyung Kim wrote:
> > On Wed, Sep 25, 2024 at 01:55:22PM +0000, Tengda Wu wrote:
> >> bperf restricts the size of perf_attr_map's entries to 16, which
> >> cannot hold all events in many scenarios. A typical example is
> >> when the user specifies `-a -ddd` ([0]). And in other cases such as
> >> top-down analysis, which often requires a set of more than 16 PMUs
> >> to be collected simultaneously.
> >>
> >> Fix this by increase perf_attr_map entries to 100, and an event
> >> number check has been introduced when bperf__load() to ensure that
> >> users receive a more friendly prompt when the event limit is reached.
> >>
> >> [0] https://lore.kernel.org/all/20230104064402.1551516-3-namhyung@kernel.org/
> >
> > Apparently this patch was never applied. I don't know how much you need
> > but having too many events at the same time won't be very useful because
> > multiplexing could reduce the accuracy.
> >
>
> Could you please explain why patch [0] was not merged at that time? I couldn't
> find this information from the previous emails.
I guess it's just fell through the crack. :)
>
> In my scenario, we collect more than 40+ events to support necessary metric
> calculations, which multiplexing is inevitable. Although multiplexing may
> reduce accuracy, for the purpose of supporting metric calculations, these
> accuracy losses can be acceptable. Perf also has the same issue with multiplexing.
> Removing the event limit for bperf can provide users with additional options.
>
> In addition to accuracy, we also care about overhead. I compared the overhead
> of bperf and perf by testing ./lat_ctx in lmbench [1], and found that the
> overhead of bperf stat is about 4% less than perf. This is why we choose to
> use bperf in some extreme scenarios.
Ok, thanks for explanation. I think it's ok to increase the limit.
Thanks,
Namhyung
>
> [1] https://github.com/intel/lmbench
>
> Thanks,
> Tengda
>
> >
> >>
> >> Fixes: 7fac83aaf2ee ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF")
> >> Signed-off-by: Tengda Wu <wutengda@...weicloud.com>
> >> ---
> >> tools/perf/util/bpf_counter.c | 8 +++++++-
> >> 1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> >> index 7a8af60e0f51..3346129c20cf 100644
> >> --- a/tools/perf/util/bpf_counter.c
> >> +++ b/tools/perf/util/bpf_counter.c
> >> @@ -28,7 +28,7 @@
> >> #include "bpf_skel/bperf_leader.skel.h"
> >> #include "bpf_skel/bperf_follower.skel.h"
> >>
> >> -#define ATTR_MAP_SIZE 16
> >> +#define ATTR_MAP_SIZE 100
> >>
> >> static inline void *u64_to_ptr(__u64 ptr)
> >> {
> >> @@ -451,6 +451,12 @@ static int bperf__load(struct evsel *evsel, struct target *target)
> >> enum bperf_filter_type filter_type;
> >> __u32 filter_entry_cnt, i;
> >>
> >> + if (evsel->evlist->core.nr_entries > ATTR_MAP_SIZE) {
> >> + pr_err("Too many events, please limit to %d or less\n",
> >> + ATTR_MAP_SIZE);
> >> + return -1;
> >> + }
> >> +
> >> if (bperf_check_target(evsel, target, &filter_type, &filter_entry_cnt))
> >> return -1;
> >>
> >> --
> >> 2.34.1
> >>
>
Powered by blists - more mailing lists