Message-ID: <20200914152841.GC160517@kernel.org>
Date: Mon, 14 Sep 2020 12:28:41 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Jiri Olsa <jolsa@...nel.org>, lkml <linux-kernel@...r.kernel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Ingo Molnar <mingo@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Michael Petlan <mpetlan@...hat.com>,
Song Liu <songliubraving@...com>,
"Frank Ch. Eigler" <fche@...hat.com>,
Ian Rogers <irogers@...gle.com>,
Stephane Eranian <eranian@...gle.com>,
Alexey Budankov <alexey.budankov@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Yonghong Song <yhs@...com>
Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event
Em Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim escreveu:
> On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa <jolsa@...nel.org> wrote:
> > Add new version of mmap event. The MMAP3 record is an
> > augmented version of MMAP2, it adds build id value to
> > identify the exact binary object behind memory map:
> > struct {
> > struct perf_event_header header;
> > u32 pid, tid;
> > u64 addr;
> > u64 len;
> > u64 pgoff;
> > u32 maj;
> > u32 min;
> > u64 ino;
> > u64 ino_generation;
> > u32 prot, flags;
> > u32 reserved;
What is this reserved field for? It's all nicely aligned already: a u64
followed by two u32s (prot, flags).
> > u8 buildid[20];
> Do we need maj, min, ino, ino_generation for mmap3 event?
> I think they are to compare binaries, then we can do it with
> build-id (and I think it'd be better)..
Hmm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
superset of MMAP2.
If we want to ditch useless stuff, then throw away pid and tid too, as
we can select those via sample_type.
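To make the sample_type point concrete, here is a rough sketch of an
attr that asks for mmap records and leaves pid/tid to the sample_id
trailer. The helper name init_mmap_attr() is made up for illustration,
and nothing here is passed to perf_event_open(); it only shows which
existing bits would be involved:

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: request mmap side-band records and let
 * sample_type decide which identifiers are attached to each record.
 */
static void init_mmap_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_DUMMY; /* no counting, side-band only */
    attr->mmap = 1;
    attr->mmap2 = 1;
    attr->sample_id_all = 1;            /* sample_id on non-sample records */
    attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME;
}
```

With sample_id_all set and PERF_SAMPLE_TID in sample_type, the pid/tid
pair shows up in the sample_id trailer of each record, which is why
carrying it in the record body too is arguably redundant.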
Having said that, at this point I don't even know if adding new
PERF_RECORD_ types that are an update of a preexisting one is the right
way to proceed.
Perhaps we should attach a BPF program to the point where a mmap/munmap
is being done (perf_event_mmap()) and allow userspace to ask for
whatever it wants? With a kprobe there right now we can implement this
MMAP3 easily, no?
Start with a kprobe and all this would already be available in kernels
with BPF, with no need to reboot into a PERF_RECORD_MMAP3-enabled
kernel. When we get a tracepoint there, use that instead, as it's more
efficient.
The sample_id stuff would be done as with other records, etc.; just the
things that are MMAP3-specific would be in the payload, perf.data has
the struct layout description, etc.
Then use a BPF_TRACE_ITER to generate the MMAP records for preexisting
maps instead of going through /proc/ doing tons of syscalls, injecting
the MMAP3 (or MMAP2, or MMAP, or something else, according to the
tool's needs) directly into the perf ring buffer.
- Arnaldo
>
> > char filename[];
> > struct sample_id sample_id;
> > };
> >
> > Adding 4 bytes reserved field to align buildid data to 8 bytes,
> > so sample_id data is properly aligned.
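A quick way to check the alignment claim is to mirror the fixed part of
the record in userspace and look at where filename[] would start. This
is only a sketch: struct hdr stands in for struct perf_event_header
(u32 type, u16 misc, u16 size), the "noresv" variant is hypothetical,
and it assumes the filename is padded to a multiple of 8 bytes as the
existing MMAP/MMAP2 code does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct perf_event_header (8 bytes). */
struct hdr {
    uint32_t type;
    uint16_t misc;
    uint16_t size;
};

/* Fixed part of the proposed MMAP3 record, as quoted in the patch. */
struct mmap3_fixed {
    struct hdr header;
    uint32_t pid, tid;
    uint64_t addr;
    uint64_t len;
    uint64_t pgoff;
    uint32_t maj;
    uint32_t min;
    uint64_t ino;
    uint64_t ino_generation;
    uint32_t prot, flags;
    uint32_t reserved;
    uint8_t buildid[20];
};

/* Hypothetical variant without the reserved field, for comparison. */
struct mmap3_noresv {
    struct hdr header;
    uint32_t pid, tid;
    uint64_t addr;
    uint64_t len;
    uint64_t pgoff;
    uint32_t maj;
    uint32_t min;
    uint64_t ino;
    uint64_t ino_generation;
    uint32_t prot, flags;
    uint8_t buildid[20];
};

/* Offset at which filename[] would begin, right after buildid[]. */
static size_t filename_off_with_reserved(void)
{
    return offsetof(struct mmap3_fixed, buildid) + 20;
}

static size_t filename_off_without_reserved(void)
{
    return offsetof(struct mmap3_noresv, buildid) + 20;
}
```

So prot and flags are indeed aligned on their own, but buildid[] is 20
bytes long: without the extra u32 the filename starts at offset 92, and
since the filename length is a multiple of 8, sample_id would land on a
4-byte boundary. With the reserved field the filename starts at 96.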
> >
> > The mmap3 event is enabled by new mmap3 bit in perf_event_attr
> > struct. When set for an event, it enables the build id retrieval
> > and will use mmap3 format for the event.
> >
> > Keeping track of mmap3 events and calling build_id_parse
> > in perf_event_mmap_event only if we have any defined.
> >
> > Having build id attached directly to the mmap event will help
> > tool like perf to skip final search through perf data for
> > binaries that are needed in the report time. Also it prevents
> > possible race when the binary could be removed or replaced
> > during profiling.
> >
> > Signed-off-by: Jiri Olsa <jolsa@...nel.org>
> > ---
> > include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++-
> > kernel/events/core.c | 38 +++++++++++++++++++++++++++------
> > 2 files changed, 57 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index 077e7ee69e3d..facfc3c673ed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > aux_output : 1, /* generate AUX records instead of events */
> > cgroup : 1, /* include cgroup events */
> > text_poke : 1, /* include text poke events */
> > - __reserved_1 : 30;
> > + mmap3 : 1, /* include bpf events */
>
> ???
>
> > + __reserved_1 : 29;
> >
> > union {
> > __u32 wakeup_events; /* wakeup every n events */
> > @@ -1060,6 +1061,30 @@ enum perf_event_type {
> > */
> > PERF_RECORD_TEXT_POKE = 20,
> >
> > + /*
> > + * The MMAP3 records are an augmented version of MMAP2, they add
> > + * build id value to identify the exact binary behind map
> > + *
> > + * struct {
> > + * struct perf_event_header header;
> > + *
> > + * u32 pid, tid;
> > + * u64 addr;
> > + * u64 len;
> > + * u64 pgoff;
> > + * u32 maj;
> > + * u32 min;
> > + * u64 ino;
> > + * u64 ino_generation;
> > + * u32 prot, flags;
> > + * u32 reserved;
> > + * u8 buildid[20];
> > + * char filename[];
> > + * struct sample_id sample_id;
> > + * };
> > + */
> > + PERF_RECORD_MMAP3 = 21,
> > +
> > PERF_RECORD_MAX, /* non-ABI */
> > };
> >
> [SNIP]
> > @@ -8098,6 +8116,9 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
> > mmap_event->prot = prot;
> > mmap_event->flags = flags;
> >
> > + if (atomic_read(&nr_mmap3_events))
> > + build_id_parse(vma, mmap_event->buildid);
>
> What about if it failed? We should zero out the build-id..
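Agreed. Something along these lines, shown here as a userspace sketch
rather than a tested kernel change: fake_build_id_parse() is a made-up
stand-in for the series' build_id_parse() (which takes the vma), and I
am assuming it returns non-zero on failure:

```c
#include <assert.h>
#include <string.h>

#define BUILD_ID_SIZE 20 /* matches u8 buildid[20] in the record */

/*
 * Hypothetical stand-in for build_id_parse(): returns 0 and fills the
 * buffer when a build id is found, returns -1 and leaves the buffer
 * untouched otherwise.
 */
static int fake_build_id_parse(int have_note, unsigned char *build_id)
{
    if (!have_note)
        return -1;
    memset(build_id, 0xab, BUILD_ID_SIZE);
    return 0;
}

/* Never let stale bytes reach the record: zero the id on failure. */
static void fill_build_id(int have_note, unsigned char *build_id)
{
    if (fake_build_id_parse(have_note, build_id))
        memset(build_id, 0, BUILD_ID_SIZE);
}
```

An all-zero build id then gives the tool an unambiguous "not available"
value instead of whatever was left in the buffer.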
>
> Thanks
> Namhyung
>
> > +
> > if (!(vma->vm_flags & VM_EXEC))
> > mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA;
> >
> > @@ -8241,6 +8262,7 @@ void perf_event_mmap(struct vm_area_struct *vma)
> > /* .ino_generation (attr_mmap2 only) */
> > /* .prot (attr_mmap2 only) */
> > /* .flags (attr_mmap2 only) */
> > + /* .buildid (attr_mmap3 only) */
> > };
> >
> > perf_addr_filters_adjust(vma);
> > @@ -11040,6 +11062,8 @@ static void account_event(struct perf_event *event)
> > inc = true;
> > if (event->attr.mmap || event->attr.mmap_data)
> > atomic_inc(&nr_mmap_events);
> > + if (event->attr.mmap3)
> > + atomic_inc(&nr_mmap3_events);
> > if (event->attr.comm)
> > atomic_inc(&nr_comm_events);
> > if (event->attr.namespaces)
> > --
> > 2.26.2
> >
--
- Arnaldo