[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fUansxK_as61teHsLyRQu3Zu5UE-+SDqWVYGhSz36uCzQ@mail.gmail.com>
Date: Mon, 14 Sep 2020 10:08:01 -0700
From: Ian Rogers <irogers@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Jiri Olsa <jolsa@...nel.org>,
lkml <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Michael Petlan <mpetlan@...hat.com>,
Song Liu <songliubraving@...com>,
"Frank Ch. Eigler" <fche@...hat.com>,
Stephane Eranian <eranian@...gle.com>,
Alexey Budankov <alexey.budankov@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Yonghong Song <yhs@...com>
Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event
On Mon, Sep 14, 2020 at 9:35 AM <peterz@...radead.org> wrote:
>
> On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:
>
> > > > struct {
> > > > struct perf_event_header header;
> >
> > > > u32 pid, tid;
> > > > u64 addr;
> > > > u64 len;
> > > > u64 pgoff;
> > > > u32 maj;
> > > > u32 min;
> > > > u64 ino;
> > > > u64 ino_generation;
> > > > u32 prot, flags;
> > > > u32 reserved;
> >
> > What for this reserved? its all nicely aligned already, u64 followed by
> > two u32 (prot, flags).
>
> I suspect it is so that sizeof(reserve+buildid) is a multiple of 8. But
> yes, that's a wee bit daft, since the next field is a variable length
> character array.
>
> > > > u8 buildid[20];
> >
> > > Do we need maj, min, ino, ino_generation for mmap3 event?
> > > I think they are to compare binaries, then we can do it with
> > > build-id (and I think it'd be better)..
> >
> > Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
> > superset of MMAP2.
>
> Well, the 'funny' thing is that if you do use buildid, then
> {maj,min,ino,ino_generation} are indeed superfluous, but are combined
> also large enough to contain buildid.
>
> > If we want to ditch useless stuff, then trow away pid, tid too, as we
> > can select those via sample_type.
>
> Correct.
>
> So something like:
>
> struct {
> struct perf_event_header header;
>
> u64 addr;
> u64 len;
> u64 pgoff;
> union {
> struct {
> u32 maj;
> u32 min;
> u64 ino;
> u64 ino_generation;
> };
> u8 buildid[20];
> };
> u32 prot, flags;
> char filename[];
> struct sample_id sample_id;
> };
>
> Using one of the MISC bits to resolve the union. Might actually bring
> benefit to everyone. Us normal people get to have a smaller MMAP record,
> while the buildid folks can have it too.
>
> Even more extreme would be using 2 MISC bits and allowing the union to
> be 0 sized for anon.
>
> That said; I have the nagging feeling there were unresolved issues with
> mmap2, but I can't seem to find any relevant emails on it :/ My
> google-fu is weak today.
Firstly, thanks Jiri for this really useful patch set for our
use-cases! Some thoughts:
One issue with mmap2 events at the moment is that they happen "after"
the mmap is performed. This allows the mapped address to be known but
has the unfortunate problem of causing mmap events to get "extended"
due to adjacent vmas being merged. There are some workarounds like
commit c8f6ae1fb28d (perf inject jit: Remove //anon mmap events).
Perhaps these events can switch to reporting the length the mmap
requested rather than the length of the vma?
I could imagine that changes here could be interesting to folks doing
CHERI or hypervisors, for example, they may want more address bits.
BPF has stack traces with elements of buildid + offset. Using these in
perf samples would avoid the need for mmap events, synthesis, etc. but
at the cost of larger and possibly slower stack traces. Perhaps we
should just remove the idea of mmap events altogether?
Thanks,
Ian
Powered by blists - more mailing lists