[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAM9d7ci9MbW_HbGbYge2iqiiEr8VPk32-5cBMViwCFsV+EmfgQ@mail.gmail.com>
Date: Tue, 27 Dec 2022 15:19:24 -0800
From: Namhyung Kim <namhyung@...nel.org>
To: Jiri Olsa <olsajiri@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Song Liu <songliubraving@...com>,
Martin KaFai Lau <kafai@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Hao Luo <haoluo@...gle.com>,
Stanislav Fomichev <sdf@...gle.com>,
LKML <linux-kernel@...r.kernel.org>, bpf@...r.kernel.org,
Ingo Molnar <mingo@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>
Subject: Re: [PATCH bpf-next 1/2] bpf/perf: Call perf_prepare_sample() before bpf_prog_run()
On Thu, Dec 22, 2022 at 11:53 PM Jiri Olsa <olsajiri@...il.com> wrote:
>
> On Thu, Dec 22, 2022 at 02:25:49PM -0800, Namhyung Kim wrote:
> > On Thu, Dec 22, 2022 at 12:16 PM Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Thu, Dec 22, 2022 at 09:34:42AM -0800, Namhyung Kim wrote:
> > >
> > > > Sorry about that. Let me rephrase it like below:
> > > >
> > > > With bpf_cast_to_kern_ctx(), BPF programs attached to a perf event
> > > > can access perf sample data directly from the ctx.
> > >
> > > This is the bpf_prog_run() in bpf_overflow_handler(), right?
> >
> > Yes.
> >
> > >
> > > > But the perf sample
> > > > data is not fully prepared at this point, and some fields can have invalid
> > > > uninitialized values. So it needs to call perf_prepare_sample() before
> > > > calling the BPF overflow handler.
> > >
> > > It never was, why is it a problem now?
> >
> > BPF used to allow selected fields only like period and addr, and they
> > are initialized always by perf_sample_data_init(). This is relaxed
> > by the bpf_cast_to_kern_ctx() and it can easily access arbitrary
> > fields of perf_sample_data now.
> >
> > The background of this change is to use BPF as a filter for perf
> > event samples. The code is there already and returning 0 from
> > BPF can drop perf samples. With access to more sample data,
> > it'd make more educated decisions.
> >
> > For example, I got some requests to limit perf samples in a
> > selected region of address (code or data). Or it can collect
> > samples only if some hardware specific information is set in
> > the raw data like in AMD IBS. We can easily extend it to other
> > sample info based on users' needs.
> >
> > >
> > > > But just calling perf_prepare_sample() can be costly when the BPF
> > >
> > > So you potentially call it twice now, how's that useful?
> >
> > Right. I think we can check data->sample_flags in
> > perf_prepare_sample() to minimize the duplicate work.
> > It already does it for some fields, but misses others.
>
> we used to have __PERF_SAMPLE_CALLCHAIN_EARLY to avoid extra perf_callchain,
> could we add some flag like __PERF_SAMPLE_INIT_EARLY to avoid double call to
> perf_prepare_sample?
I think we can check if the filtered_sample_type is 0.
But it still needs to update the perf_event_header.
I think we need to save the calculated size separately.
Thanks,
Namhyung
Powered by blists - more mailing lists