netdev - Re: [PATCH v2 bpf-next 2/2] btf: expose API to work with raw btf data

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Thu, 7 Feb 2019 12:13:38 -0800
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Andrii Nakryiko <andriin@...com>, Song Liu <songliubraving@...com>,
        Yonghong Song <yhs@...com>, Alexei Starovoitov <ast@...com>,
        Martin Lau <kafai@...com>, netdev@...r.kernel.org,
        Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH v2 bpf-next 2/2] btf: expose API to work with raw btf data

On Thu, Feb 7, 2019 at 11:21 AM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> On Tue, Feb 5, 2019 at 10:25 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Tue, Feb 05, 2019 at 09:46:14PM -0800, Andrii Nakryiko wrote:
> > > On Tue, Feb 5, 2019 at 7:07 PM Alexei Starovoitov
> > > <alexei.starovoitov@...il.com> wrote:
> > > >
> > > > On Tue, Feb 05, 2019 at 04:29:49PM -0800, Andrii Nakryiko wrote:
> > > > > This patch exposes two new APIs btf__get_raw_data_size() and
> > > > > btf__get_raw_data() that allows to get a copy of raw BTF data out of
> > > > > struct btf. This is useful for external programs that need to manipulate
> > > > > raw data, e.g., pahole using btf__dedup() to deduplicate BTF type info
> > > > > and then writing it back to file.
> > > > >
> > > > > Signed-off-by: Andrii Nakryiko <andriin@...com>
> > > > > Acked-by: Song Liu <songliubraving@...com>
> > > > > ---
> > > > >  tools/lib/bpf/btf.c      | 10 ++++++++++
> > > > >  tools/lib/bpf/btf.h      |  2 ++
> > > > >  tools/lib/bpf/libbpf.map |  2 ++
> > > > >  3 files changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > > > > index 1c2ba7182400..34bfb3641aac 100644
> > > > > --- a/tools/lib/bpf/btf.c
> > > > > +++ b/tools/lib/bpf/btf.c
> > > > > @@ -437,6 +437,16 @@ int btf__fd(const struct btf *btf)
> > > > >       return btf->fd;
> > > > >  }
> > > > >
> > > > > +__u32 btf__get_raw_data_size(const struct btf *btf)
> > > > > +{
> > > > > +     return btf->data_size;
> > > > > +}
> > > > > +
> > > > > +void btf__get_raw_data(const struct btf *btf, char *data)
> > > > > +{
> > > > > +     memcpy(data, btf->data, btf->data_size);
> > > > > +}
> > > >
> > > > I cannot think of any other way to use this api,
> > > > but to call btf__get_raw_data_size() first,
> > > > then malloc that much memory and then call btf__get_raw_data()
> > > > to store btf into it.
> > > >
> > > > If so, may be api should be single call that allocates, copies,
> > > > and returns pointer to new mem and its size?
> > > > Probably less error prone?
> > > >
> > >
> > > I don't have strong preference, but providing pointer to allocated memory
> > > seems more flexible and allows more clever/optimal use of memory from caller
> > > side. E.g., instead of doing two mallocs, you can imagine doing something
> > > like:
> > >
> > > int max_size = max(btf__get_raw_data_size(btf),
> > >                    btf_ext__get_raw_data_size(btf_ext));
> > > char *m = malloc(max_size);
> > > btf__get_raw_data(btf, m);
> > > dump_btf_section_to_file(m, some_file);
> > > btf_ext__get_raw_data(btf_ext, m);
> > > dump_btf_ext_section_to_file(m, some_file);
> > > free(m);
> > >
> > > Also, pointer to memory could be mmap()'ed file, for instance. In general,
> > > for a library it might be a good thing to not be prescriptive as to how one
> > > gets that piece of memory.
> >
> > Plausible, but I'd like to see pahole patches to be convinced ;)
> >
>
> Here's a summary of proposed ways to expose raw data through new api,
> with pros/cons.
>
> 1. Originally proposed two functions. `int btf__get_raw_data_size()`
> to get size, `void btf__get_raw_data(void* buf)` to write raw data to
> a provided buf.
>
> Pros:
>   - allows maximal flexibility for users of this API. They can manage
> memory as it's convenient for them (e.g., reusing same buffer for
> multiple btf and btf_ext raw data).
>   - allows using mmap()'ed memory, as allocation and memory ownership
> is delegated to user
>
> Cons:
>   - has potential of buffer overflows, if user doesn't provide big enough buffer
>
>
> 2. Alexei's proposal to combine getting size in single function that
> internally allocates new memory buffer, copies data and returns it to
> users to use and later free.
>
> Pros:
>   - one less API function
>   - more straightforward usage, it's hard to misuse it (except for
> memory leaking, if memory is not freed)
>
> Cons:
>   - always allocated for each call
>   - least flexible approach, doesn't allow caller to manage memory,
> prevents any kind of direct write to mmap()'ed file
>
> 3. Daniel proposed realloc-like approach, where caller optionally
> provides memory buffer, but we always call realloc() internally to
> ensure we have long enough buffer.
>
> Pros:
>   - allows callers to provide their memory buffer (similar to approach
> #1, but see cons below)
>   - prevents user error with providing too small buffer (similar to approach #2)
>
> Cons:
>   - realloc expects that memory was allocated by previous malloc()
> call, so caller can't allocate bigger chunk of memory and provide
> pointer inside that area (behavior is undefined in that case). This
> requirement is not immediately obvious, so this approach feels more
> error prone than either of approach #1 and #2
>   - still doesn't allow mmap()'ed usage, again due to realloc()'s requirements
>

There is actually approach #4 - just return const void* to an internal
memory buffer. This is trivial for struct btf, will require just
slight changes for struct btf_ext, but it puts all the control in
user's hands without imposing any unnecessary allocations. This
approach seems to provide best of all approaches with no downsides.

>
> Approach #3 looks most subtly-error-prone, as it's too easy to just
> provide pointer that's not at the beginning of malloc()'ed memory, but
> this might not be detected immediately, and could potentially lead to
> silent memory corruption.
>
> I'd still go with approach #1 as it provides least restrictive API,
> even though approach #2 will provide marginally better usability for
> common cases.
>
> Alexei, Daniel, which approach you'd prefer in the end after
> considering all pros and cons?