Message-ID: <CAEf4Bzb8E7wzwBn+cx-XAW0ofEqemeuZoawHTFoTc-jK1azasA@mail.gmail.com>
Date: Mon, 6 May 2024 11:51:34 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>, Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Greg KH <gregkh@...uxfoundation.org>, Andrii Nakryiko <andrii@...nel.org>,
linux-fsdevel@...r.kernel.org, brauner@...nel.org, viro@...iv.linux.org.uk,
akpm@...ux-foundation.org, linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
linux-mm@...ck.org, Daniel Müller <deso@...teo.net>,
"linux-perf-use." <linux-perf-users@...r.kernel.org>
Subject: Re: [PATCH 2/5] fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps
On Mon, May 6, 2024 at 11:05 AM Namhyung Kim <namhyung@...nel.org> wrote:
>
> Hello,
>
> On Mon, May 6, 2024 at 6:58 AM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
> >
> > On Sat, May 04, 2024 at 02:50:31PM -0700, Andrii Nakryiko wrote:
> > > On Sat, May 4, 2024 at 8:28 AM Greg KH <gregkh@...uxfoundation.org> wrote:
> > > > On Fri, May 03, 2024 at 05:30:03PM -0700, Andrii Nakryiko wrote:
> > > > > > Note also, that fetching VMA name (e.g., backing file path, or special
> > > > > > hard-coded or user-provided names) is optional just like build ID. If
> > > > > > user sets vma_name_size to zero, kernel code won't attempt to retrieve
> > > > > > it, saving resources.
> >
> > > > > Signed-off-by: Andrii Nakryiko <andrii@...nel.org>
> >
> > > > Where is the userspace code that uses this new api you have created?
> >
> > > So I added a faithful comparison of existing /proc/<pid>/maps vs new
> > > ioctl() API to solve a common problem (as described above) in patch
> > > #5. The plan is to put it in mentioned blazesym library at the very
> > > least.
> > >
> > > I'm sure perf would benefit from this as well (cc'ed Arnaldo and
> > > linux-perf-user), as they need to do stack symbolization as well.
>
> I think the general use case in perf is different. This ioctl API is great
> for live tracing of a single (or a small number of) process(es). And
> yes, perf tools have those tracing use cases too. But I think the
> major use case of perf tools is system-wide profiling.
The intended use case is also system-wide profiling, but I haven't
heard that opening a file per process is a big bottleneck or a
limitation, tbh.
>
> For system-wide profiling, you need to process samples of many
> different processes at a high frequency. Now perf record doesn't
> process them and just saves them for offline processing (well, it does
> at the end to find out build-ID but it can be omitted).
>
> Doing it online is possible (like perf top) but it would add more
> overhead during the profiling. And we cannot move processing
> or symbolization to the end of profiling because some (short-
> lived) tasks can go away.
We do have some setups where we install a BPF program that monitors
process exit and mmap() events and proactively emits VMA
information. It's not applicable everywhere, and in some setups (like
the Oculus case) we just accept that short-lived processes will be
missed in exchange for less interruption and simpler, less privileged
"agents" doing the profiling and address resolution logic.
So the problem space, as can be seen, is pretty vast and varied, and
there is no single API that would serve all the needs perfectly.
>
> Also it should support perf report (offline) on data from a
> different kernel or even a different machine.
We fetch build ID (and resolve file offset) and offload actual
symbolization to a dedicated fleet of servers, whenever possible. We
don't yet do it for kernel stack traces, but we are moving in this
direction (and that has its own problems, with /proc/kallsyms being
text-based, listing everything, and being pretty big in itself; but
that's a separate topic).
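
To make the "resolve file offset" step concrete: once the VMA containing an address is known, the offset into the backing file is plain arithmetic. Below is a minimal sketch in Python, not our actual code; the function name and the sample VMA values are made up for illustration, and the inputs are assumed to come from a /proc/<pid>/maps line or an equivalent query.

```python
def addr_to_file_offset(addr, vma_start, vma_end, vma_file_offset):
    """Translate a virtual address into an offset within the backing file.

    vma_start/vma_end delimit the mapping, vma_file_offset is the offset
    column of the mapping (all hypothetical inputs for this sketch).
    """
    if not (vma_start <= addr < vma_end):
        raise ValueError("address is not inside this VMA")
    return addr - vma_start + vma_file_offset


# Hypothetical mapping: 7f0000001000-7f0000005000 r-xp 00002000 ...
offset = addr_to_file_offset(0x7F0000001234, 0x7F0000001000,
                             0x7F0000005000, 0x2000)
print(hex(offset))
```

The (build ID, file offset) pair is then enough for a remote symbolizer to locate the right binary and address without access to the profiled machine.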
>
> So it saves the memory map of processes and symbolizes
> the stack trace with it later. Of course it needs to be updated
> as the memory map changes and that's why it tracks mmap
> or similar syscalls with PERF_RECORD_MMAP[2] records.
>
> A problem with this approach is to get the initial state of all
> (or a target for non-system-wide mode) existing processes.
> We call it synthesizing, and read /proc/PID/maps to generate
> the mmap records.
>
> I think the below comment from Arnaldo talked about how
> we can improve the synthesizing (which is sequential access
> to proc maps) using BPF.
Yep. We can also benchmark using this new ioctl() to fetch a full set
of VMAs; it might still be good enough.
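
For reference, the text-based baseline that such a benchmark would be compared against looks roughly like the sketch below. It is only an illustration in Python of sequentially parsing /proc/<pid>/maps into VMA records; the `Vma` type and the helper names are assumptions for this example, not perf or blazesym code, and the sample line is made up.

```python
from typing import List, NamedTuple


class Vma(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    name: str


def parse_maps_line(line: str) -> Vma:
    """Parse one /proc/<pid>/maps line: start-end perms offset dev inode [name]."""
    # Split at most 5 times so a name containing spaces stays in one piece.
    parts = line.strip().split(None, 5)
    start_s, end_s = parts[0].split("-")
    name = parts[5] if len(parts) > 5 else ""
    return Vma(int(start_s, 16), int(end_s, 16), parts[1],
               int(parts[2], 16), parts[3], int(parts[4]), name)


def read_vmas(pid: str = "self") -> List[Vma]:
    """Read and parse every mapping of a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f]


# Made-up sample line, so the parser can be tried without /proc.
sample = "7f0000001000-7f0000005000 r-xp 00002000 08:01 1234    /usr/lib/libfoo.so"
print(parse_maps_line(sample))
```

Every mapping has to be read and parsed as text here, which is the cost the sequential synthesizing pass pays per process.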
>
> Thanks,
> Namhyung
>
[...]