lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzYAQwX0AQ_fbcB9kVBj3vpx0-5pPPZNYKL4VjnX_eYKpg@mail.gmail.com>
Date: Mon, 10 Jun 2024 09:17:43 +0100
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Andrei Vagin <avagin@...il.com>
Cc: Andrii Nakryiko <andrii@...nel.org>, linux-fsdevel@...r.kernel.org, brauner@...nel.org, 
	viro@...iv.linux.org.uk, akpm@...ux-foundation.org, 
	linux-kernel@...r.kernel.org, bpf@...r.kernel.org, gregkh@...uxfoundation.org, 
	linux-mm@...ck.org, liam.howlett@...cle.com, surenb@...gle.com, 
	rppt@...nel.org
Subject: Re: [PATCH v3 3/9] fs/procfs: implement efficient VMA querying API
 for /proc/<pid>/maps

On Fri, Jun 7, 2024 at 11:31 PM Andrei Vagin <avagin@...il.com> wrote:
>
> On Tue, Jun 04, 2024 at 05:24:48PM -0700, Andrii Nakryiko wrote:
> > /proc/<pid>/maps file is extremely useful in practice for various tasks
> > involving figuring out process memory layout, what files are backing any
> > given memory range, etc. One important class of applications that
> > absolutely rely on this are profilers/stack symbolizers (perf tool being one
> > of them). Patterns of use differ, but they generally would fall into two
> > categories.
> >
> > In on-demand pattern, a profiler/symbolizer would normally capture stack
> > trace containing absolute memory addresses of some functions, and would
> > then use /proc/<pid>/maps file to find corresponding backing ELF files
> > (normally, only executable VMAs are of interest), file offsets within
> > them, and then continue from there to get yet more information (ELF
> > symbols, DWARF information) to get human-readable symbolic information.
> > This pattern is used by Meta's fleet-wide profiler, as one example.
> >
> > In preprocessing pattern, application doesn't know the set of addresses
> > of interest, so it has to fetch all relevant VMAs (again, probably only
> > executable ones), store or cache them, then proceed with profiling and
> > stack trace capture. Once done, it would do symbolization based on
> > stored VMA information. This can happen at much later point in time.
> > This patterns is used by perf tool, as an example.
> >
> > In either case, there are both performance and correctness requirement
> > involved. This address to VMA information translation has to be done as
> > efficiently as possible, but also not miss any VMA (especially in the
> > case of loading/unloading shared libraries). In practice, correctness
> > can't be guaranteed (due to process dying before VMA data can be
> > captured, or shared library being unloaded, etc), but any effort to
> > maximize the chance of finding the VMA is appreciated.
> >
> > Unfortunately, for all the /proc/<pid>/maps file universality and
> > usefulness, it doesn't fit the above use cases 100%.
> >
> > First, it's main purpose is to emit all VMAs sequentially, but in
> > practice captured addresses would fall only into a smaller subset of all
> > process' VMAs, mainly containing executable text. Yet, library would
> > need to parse most or all of the contents to find needed VMAs, as there
> > is no way to skip VMAs that are of no use. Efficient library can do the
> > linear pass and it is still relatively efficient, but it's definitely an
> > overhead that can be avoided, if there was a way to do more targeted
> > querying of the relevant VMA information.
> >
> > Second, it's a text based interface, which makes its programmatic use from
> > applications and libraries more cumbersome and inefficient due to the
> > need to handle text parsing to get necessary pieces of information. The
> > overhead is actually payed both by kernel, formatting originally binary
> > VMA data into text, and then by user space application, parsing it back
> > into binary data for further use.
>
> I was trying to solve all these issues in a more generic way:
> https://lwn.net/Articles/683371/
>

Can you please provide a tl;dr summary of that effort?

> We definitely interested in this new interface to use it in CRIU.
>
> <snip>
>
> > +
> > +     if (karg.vma_name_size) {
> > +             size_t name_buf_sz = min_t(size_t, PATH_MAX, karg.vma_name_size);
> > +             const struct path *path;
> > +             const char *name_fmt;
> > +             size_t name_sz = 0;
> > +
> > +             get_vma_name(vma, &path, &name, &name_fmt);
> > +
> > +             if (path || name_fmt || name) {
> > +                     name_buf = kmalloc(name_buf_sz, GFP_KERNEL);
> > +                     if (!name_buf) {
> > +                             err = -ENOMEM;
> > +                             goto out;
> > +                     }
> > +             }
> > +             if (path) {
> > +                     name = d_path(path, name_buf, name_buf_sz);
> > +                     if (IS_ERR(name)) {
> > +                             err = PTR_ERR(name);
> > +                             goto out;
>
> It always fails if a file path name is longer than PATH_MAX.
>
> Can we add a flag to indicate whether file names are needed to be

It's already supported. Getting a VMA name is optional. See a big
comment next to the vma_name_size field in the UAPI header. If
vma_name_size is set to zero, VMA name is not retrieved at all,
avoiding the overhead and this issue with PATH_MAX.

> resolved? In criu, we use special names like "vvar", "vdso", but we dump
> files via /proc/pid/map_files.
>
> > +                     }
> > +                     name_sz = name_buf + name_buf_sz - name;
> > +             } else if (name || name_fmt) {
> > +                     name_sz = 1 + snprintf(name_buf, name_buf_sz, name_fmt ?: "%s", name);
> > +                     name = name_buf;
> > +             }
> > +             if (name_sz > name_buf_sz) {
> > +                     err = -ENAMETOOLONG;
> > +                     goto out;
> > +             }
> > +             karg.vma_name_size = name_sz;
> > +     }
>
> Thanks,
> Andrei

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ