netdev - Re: [PATCH bpf-next v5 2/9] bpftool: Add gen min_core

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzYz6fFF-vUWPKaaZ=47-OM9mmr0fjBdTCPdwTth5pi-Mw@mail.gmail.com>
Date:   Thu, 3 Feb 2022 09:21:41 -0800
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Mauricio Vásquez Bernal <mauricio@...volk.io>
Cc:     Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Quentin Monnet <quentin@...valent.com>,
        Rafael David Tinoco <rafaeldtinoco@...il.com>,
        Lorenzo Fontana <lorenzo.fontana@...stic.co>,
        Leonardo Di Donato <leonardo.didonato@...stic.co>
Subject: Re: [PATCH bpf-next v5 2/9] bpftool: Add gen min_core_btf command

On Thu, Feb 3, 2022 at 8:07 AM Mauricio Vásquez Bernal
<mauricio@...volk.io> wrote:
>
> On Wed, Feb 2, 2022 at 12:58 PM Andrii Nakryiko
> <andrii.nakryiko@...il.com> wrote:
> >
> > On Fri, Jan 28, 2022 at 2:33 PM Mauricio Vásquez <mauricio@...volk.io> wrote:
> > >
> > > This command is implemented under the "gen" command in bpftool and the
> > > syntax is the following:
> > >
> > > $ bpftool gen min_core_btf INPUT OUTPUT OBJECT(S)
> > >
> > > INPUT can be either a single BTF file or a folder containing BTF files,
> > > when it's a folder, a BTF file is generated for each BTF file contained
> > > in this folder. OUTPUT is the file (or folder) where generated files are
> > > stored and OBJECT(S) is the list of bpf objects we want to generate the
> > > BTF file(s) for (each generated BTF file contains all the types needed
> > > by all the objects).
> > >
> > > Signed-off-by: Mauricio Vásquez <mauricio@...volk.io>
> > > Signed-off-by: Rafael David Tinoco <rafael.tinoco@...asec.com>
> > > Signed-off-by: Lorenzo Fontana <lorenzo.fontana@...stic.co>
> > > Signed-off-by: Leonardo Di Donato <leonardo.didonato@...stic.co>
> > > ---
> > >  tools/bpf/bpftool/bash-completion/bpftool |   6 +-
> > >  tools/bpf/bpftool/gen.c                   | 112 +++++++++++++++++++++-
> > >  2 files changed, 114 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
> > > index 493753a4962e..958e1fd71b5c 100644
> > > --- a/tools/bpf/bpftool/bash-completion/bpftool
> > > +++ b/tools/bpf/bpftool/bash-completion/bpftool
> > > @@ -1003,9 +1003,13 @@ _bpftool()
> > >                              ;;
> > >                      esac
> > >                      ;;
> > > +                min_core_btf)
> > > +                    _filedir
> > > +                    return 0
> > > +                    ;;
> > >                  *)
> > >                      [[ $prev == $object ]] && \
> > > -                        COMPREPLY=( $( compgen -W 'object skeleton help' -- "$cur" ) )
> > > +                        COMPREPLY=( $( compgen -W 'object skeleton help min_core_btf' -- "$cur" ) )
> > >                      ;;
> > >              esac
> > >              ;;
> > > diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c
> > > index 8f78c27d41f0..7db31b0f265f 100644
> > > --- a/tools/bpf/bpftool/gen.c
> > > +++ b/tools/bpf/bpftool/gen.c
> > > @@ -5,6 +5,7 @@
> > >  #define _GNU_SOURCE
> > >  #endif
> > >  #include <ctype.h>
> > > +#include <dirent.h>
> > >  #include <errno.h>
> > >  #include <fcntl.h>
> > >  #include <linux/err.h>
> > > @@ -1084,6 +1085,7 @@ static int do_help(int argc, char **argv)
> > >         fprintf(stderr,
> > >                 "Usage: %1$s %2$s object OUTPUT_FILE INPUT_FILE [INPUT_FILE...]\n"
> > >                 "       %1$s %2$s skeleton FILE [name OBJECT_NAME]\n"
> > > +               "       %1$s %2$s min_core_btf INPUT OUTPUT OBJECT(S)\n"
> >
> > OBJECTS(S) should be OBJECT... for this "CLI notation", no?
>
> Updated it to be "min_core_btf INPUT OUTPUT OBJECT [OBJECT...]" like
> the "bpftool object" command.
>
> >
> > >                 "       %1$s %2$s help\n"
> > >                 "\n"
> > >                 "       " HELP_SPEC_OPTIONS " |\n"
> > > @@ -1094,10 +1096,114 @@ static int do_help(int argc, char **argv)
> > >         return 0;
> > >  }
> > >
> > > +/* Create BTF file for a set of BPF objects */
> > > +static int btfgen(const char *src_btf, const char *dst_btf, const char *objspaths[])
> > > +{
> > > +       return -EOPNOTSUPP;
> > > +}
> > > +
> > > +static int do_min_core_btf(int argc, char **argv)
> > > +{
> > > +       char src_btf_path[PATH_MAX], dst_btf_path[PATH_MAX];
> > > +       bool input_is_file, output_is_file = true;
> > > +       const char *input, *output;
> > > +       const char **objs = NULL;
> > > +       struct dirent *dir;
> > > +       struct stat st;
> > > +       DIR *d = NULL;
> > > +       int i, err;
> > > +
> > > +       if (!REQ_ARGS(3)) {
> > > +               usage();
> > > +               return -1;
> > > +       }
> > > +
> > > +       input = GET_ARG();
> > > +       if (stat(input, &st) < 0) {
> > > +               p_err("failed to stat %s: %s", input, strerror(errno));
> > > +               return -errno;
> > > +       }
> > > +
> > > +       if ((st.st_mode & S_IFMT) != S_IFDIR && (st.st_mode & S_IFMT) != S_IFREG) {
> > > +               p_err("file type not valid: %s", input);
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       input_is_file = (st.st_mode & S_IFMT) == S_IFREG;
> >
> > move before if and use input_is_file in the if itself instead of
> > duplicating all the S_IFREG flags?
> >
> > > +
> > > +       output = GET_ARG();
> > > +       if (stat(output, &st) == 0 && (st.st_mode & S_IFMT) == S_IFDIR)
> > > +               output_is_file = false;
> >
> > if stat() succeeds but it's neither directory or file, should be an
> > error, right?
> >
> > > +
> > > +       objs = (const char **) malloc((argc + 1) * sizeof(*objs));
> >
> > calloc() seems to be better suited for this (and zero-intialization is
> > nice for safety and to avoid objs[argc] = NULL after the loop below)
> >
>
> You're right!
>
> > > +       if (!objs) {
> > > +               p_err("failed to allocate array for object names");
> > > +               return -ENOMEM;
> > > +       }
> > > +
> > > +       i = 0;
> > > +       while (argc > 0)
> > > +               objs[i++] = GET_ARG();
> >
> > for (i = 0; i < argc; i++) ?
>
> GET_ARG() does argc--. I see this loop is usually written as while
> (argc) in bpftool.

Ah, I missed GET_ARG()'s side effect.

>
> >
> > > +
> > > +       objs[i] = NULL;
> > > +
> > > +       /* single BTF file */
> > > +       if (input_is_file) {
> > > +               p_info("Processing source BTF file: %s", input);
> > > +
> > > +               if (output_is_file) {
> > > +                       err = btfgen(input, output, objs);
> > > +                       goto out;
> > > +               }
> > > +               snprintf(dst_btf_path, sizeof(dst_btf_path), "%s/%s", output,
> > > +                        basename(input));
> > > +               err = btfgen(input, dst_btf_path, objs);
> > > +               goto out;
> > > +       }
> > > +
> > > +       if (output_is_file) {
> > > +               p_err("can't have just one file as output");
> > > +               err = -EINVAL;
> > > +               goto out;
> > > +       }
> > > +
> > > +       /* directory with BTF files */
> > > +       d = opendir(input);
> > > +       if (!d) {
> > > +               p_err("error opening input dir: %s", strerror(errno));
> > > +               err = -errno;
> > > +               goto out;
> > > +       }
> > > +
> > > +       while ((dir = readdir(d)) != NULL) {
> > > +               if (dir->d_type != DT_REG)
> > > +                       continue;
> > > +
> > > +               if (strncmp(dir->d_name + strlen(dir->d_name) - 4, ".btf", 4))
> > > +                       continue;
> >
> > this whole handling of input directory feels a bit icky, tbh... maybe
> > we should require explicit listing of input files always. In CLI
> > invocation those could be separated by "keywords", something like
> > this:
> >
> > bpftool gen min_core_btf <output> inputs <file1> <file2> .... objects
> > <obj1> <obj2> ...
> >
> > a bit of a downside is that you can't have a file named "inputs" or
> > "objects", but that seems extremely unlikely? Quentin, any opinion as
> > well?
> >
> > I'm mainly off put by a bit random ".btf" naming convention, the
> > DT_REG skipping, etc.
> >
> > Another cleaner alternative from POV of bpftool (but might be less
> > convenient for users) is to use @file convention to specify a file
> > that contains a list of files. So
> >
> > bpftool gen min_core_btf <output> @btf_filelist.txt @obj_filelist.txt
> >
> > would take lists of inputs and outputs from respective files?
> >
> >
> > But actually, let's take a step back again. Why should there be
> > multiple inputs and outputs?
>
> We're thinking about the use case when there are multiple source BTF
> files in a folder and we want to generate a BTF for each one of them,
> by supporting input and output folders we're able to avoid executing
> bpftool multiple times. I agree that it complicates the implementation
> and that the same can be done by using a script to run bpftool
> multiple times, hence let's remove it. If we find out later on that
> this is really important we can implement it.

Yeah, I figured you have this use case, but I think it's better to
keep the interface simple. If that becomes a big performance problem,
we can always extend it later.

>
>
> > I can see why multiple objects are
> > mandatory (you have an application that has multiple BPF objects used
> > internally). But processing single vmlinux BTF at a time seems
> > absolutely fine. I don't buy that CO-RE relo processing is that slow
> > to require optimized batch processing.
> >
> > I might have asked this before, sorry, but the duration between each
> > iteration of btfgen is pretty long and I'm losing the context.
> >
> > > +
> > > +               snprintf(src_btf_path, sizeof(src_btf_path), "%s%s", input, dir->d_name);
> > > +               snprintf(dst_btf_path, sizeof(dst_btf_path), "%s%s", output, dir->d_name);
> > > +
> > > +               p_info("Processing source BTF file: %s", src_btf_path);
> > > +
> > > +               err = btfgen(src_btf_path, dst_btf_path, objs);
> > > +               if (err)
> > > +                       goto out;
> > > +       }
> > > +
> > > +out:
> > > +       free(objs);
> > > +       if (d)
> > > +               closedir(d);
> > > +       return err;
> > > +}
> > > +
> > >  static const struct cmd cmds[] = {
> > > -       { "object",     do_object },
> > > -       { "skeleton",   do_skeleton },
> > > -       { "help",       do_help },
> > > +       { "object",             do_object },
> > > +       { "skeleton",           do_skeleton },
> > > +       { "min_core_btf",       do_min_core_btf},
> > > +       { "help",               do_help },
> > >         { 0 }
> > >  };
> > >
> > > --
> > > 2.25.1
> > >