Message-ID: <CAPhsuW4JJiMNqvzK+8SKM3=72xgsF+jxB3m-u-Jz9Fe7Z4i9fg@mail.gmail.com>
Date:   Tue, 25 Jan 2022 16:16:06 -0800
From:   Song Liu <song@...nel.org>
To:     Hao Luo <haoluo@...gle.com>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        Martin KaFai Lau <kafai@...com>,
        KP Singh <kpsingh@...nel.org>, bpf <bpf@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        Jiri Olsa <jolsa@...nel.org>,
        Blake Jones <blakejones@...gle.com>,
        Alexey Alexandrov <aalexand@...gle.com>,
        Namhyung Kim <namhyung@...gle.com>,
        Ian Rogers <irogers@...gle.com>,
        "pasha.tatashin@...een.com" <pasha.tatashin@...een.com>
Subject: Re: [Question] How to reliably get BuildIDs from bpf prog

On Tue, Jan 25, 2022 at 3:54 PM Hao Luo <haoluo@...gle.com> wrote:
>
> Thanks Song for your suggestion.
>
> On Mon, Jan 24, 2022 at 11:08 PM Song Liu <song@...nel.org> wrote:
> >
> > On Mon, Jan 24, 2022 at 2:43 PM Hao Luo <haoluo@...gle.com> wrote:
> > >
> > > Dear BPF experts,
> > >
> > > I'm working on collecting some kernel performance data using BPF
> > > tracing prog. Our performance profiling team wants to associate the
> > > data with user stack information. One of the requirements is to
> > > reliably get BuildIDs from bpf_get_stackid() and other similar helpers
> > > [1].
> > >
> > > As part of an early investigation, we found that there are a couple
> > > issues that make bpf_get_stackid() much less reliable than we'd like
> > > for our use:
> > >
> > > 1. The first page of many binaries (which contains the ELF headers and
> > > thus the BuildID that we need) is often not in memory. The failure of
> > > find_get_page() (called from build_id_parse()) is higher than we would
> > > want.
> >
> > Our top use case of bpf_get_stack() is called from NMI, so there isn't
> > much we can do. Maybe it is possible to improve it by changing the
> > layout of the binary and the libraries? Specifically, if the text is
> > also in the first page, it is likely to stay in memory?
> >
>
> We are seeing 30-40% of stack frames not able to get build ids due to
> this. This is a place where we could improve the reliability of build
> id.
>
> There were a few proposals coming up when we found this issue. One of
> them is to have userspace mlock the first page. This would be the
> easiest fix, if it works. Another proposal from Ian Rogers (cc'ed) is
> to embed build id in vma. This is an idea similar to [1], but it's
> unclear (at least to me) where to store the string. I'm wondering if
> we can introduce a sleepable version of bpf_get_stack() if it helps.
> When a page is not present, sleepable bpf_get_stack() can bring in the
> page.

I guess it is possible to have different flavors of bpf_get_stack().
However, I am not sure whether the actual use case could use sleepable
BPF programs. Our user of bpf_get_stack() is a profiler. The BPF program
is triggered by a perf_event from NMI, where we really cannot sleep.

If we have a target use case that could sleep, a sleepable bpf_get_stack()
sounds reasonable to me.
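
To make issue 1 concrete, here is a rough userspace sketch of why the
first page matters: the BuildID lives in a PT_NOTE segment that is found
through the ELF and program headers, so a parser limited to the first
page can only succeed if that page is readable and the note falls within
it. This is only an illustration, not the kernel's build_id_parse(). The
function name and the restriction to 64-bit little-endian ELF are my
assumptions.

```python
import struct

PAGE_SIZE = 4096          # assumed page size for this sketch
PT_NOTE = 4               # program header type for note segments
NT_GNU_BUILD_ID = 3       # note type used for the GNU build ID


def _align4(n):
    """Round n up to the next multiple of 4 (note fields are 4-byte aligned)."""
    return (n + 3) & ~3


def build_id_from_first_page(page):
    """Return the build ID as a hex string, or None if it is not in `page`.

    Hypothetical helper: handles only ELFCLASS64 + little-endian files and
    looks no further than the first PAGE_SIZE bytes.
    """
    page = page[:PAGE_SIZE]
    if len(page) < 64 or page[:4] != b"\x7fELF":
        return None
    if page[4] != 2 or page[5] != 1:      # ELFCLASS64, ELFDATA2LSB
        return None

    (e_phoff,) = struct.unpack_from("<Q", page, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", page, 54)

    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if off + 56 > len(page):          # program header outside the page
            break
        (p_type,) = struct.unpack_from("<I", page, off)
        (p_offset,) = struct.unpack_from("<Q", page, off + 8)
        (p_filesz,) = struct.unpack_from("<Q", page, off + 32)
        if p_type != PT_NOTE:
            continue
        end = p_offset + p_filesz
        if end > len(page):               # note segment outside the page
            continue

        pos = p_offset
        while pos + 12 <= end:
            namesz, descsz, ntype = struct.unpack_from("<III", page, pos)
            name_off = pos + 12
            desc_off = name_off + _align4(namesz)
            if desc_off + descsz > end:
                break
            name = page[name_off:name_off + namesz]
            if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00" and descsz > 0:
                return page[desc_off:desc_off + descsz].hex()
            pos = desc_off + _align4(descsz)
    return None
```

Something like build_id_from_first_page(open(path, "rb").read(4096)) can
be used to check whether a given binary keeps its BuildID note within the
first page at all.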

>
> [1] https://lwn.net/Articles/867818/
>
> > > 2. When anonymous huge pages are used to hold some regions of process
> > > text, build_id_parse() also fails to get a BuildID because
> > > vma->vm_file is NULL.
> >
> > How did the text get in anonymous memory? I guess it is NOT from JIT?
> > We had a hack to use transparent huge page for application text. The
> > hack looks like:
> >
> > "At run time, the application creates an 8MB temporary buffer and the
> > hot section of the executable memory is copied to it. The 8MB region in
> > the executable memory is then converted to a huge page (by way of an
> > mmap() to anonymous pages and an madvise() to create a huge page), the
> > data is copied back to it, and it is made executable again using
> > mprotect()."
> >
> > If your case is the same (or similar), it can probably be fixed with
> > CONFIG_READ_ONLY_THP_FOR_FS, and modified user space.
> >
>
> In our use cases, we have text mapped to huge pages that are not
> backed by files. vma->vm_file could be null or points some fake file.
> This causes challenges for us on getting build id for these code text.

So, what is the ideal output in these cases? If there isn't a backing file,
we don't really have a good build-id for it, right?
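
To see how much text is affected by issue 2, one option is to look for
executable mappings with no backing file in /proc/<pid>/maps. The sketch
below is a userspace approximation of "vma->vm_file is NULL", not what
the kernel checks. The function name is made up, and treating inode 0
with no bracketed pseudo-path as "unbacked" is my assumption. It will
not catch the "fake file" case you mentioned.

```python
def unbacked_exec_mappings(maps_text):
    """Return (start, end) for executable mappings that have no file.

    `maps_text` is the content of /proc/<pid>/maps. Each line looks like:
        address           perms offset   dev   inode  pathname
    Hypothetical helper for illustration only.
    """
    result = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        addr_range, perms, _offset, _dev, inode = fields[:5]
        path = fields[5].strip() if len(fields) > 5 else ""
        if "x" not in perms:
            continue
        # Assumption: inode 0 means no on-disk file. Pseudo-paths such as
        # [vdso] are skipped because they are not application text.
        if inode != "0" or path.startswith("["):
            continue
        start, end = (int(part, 16) for part in addr_range.split("-"))
        result.append((start, end))
    return result
```

Feeding it open("/proc/self/maps").read(), or the maps file of the
profiled process, lists the address ranges where a stack frame would
have no file to take a BuildID from.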

Thanks,
Song
