netdev - Re: explicit maps. Was: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4Bzbc0VAMjxt=K6nguLz0aP+YEt9Au+KWh-WxvZR19KCD4A@mail.gmail.com>
Date:   Mon, 10 Jun 2019 16:48:57 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Alexei Starovoitov <ast@...com>
Cc:     Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Stanislav Fomichev <sdf@...ichev.me>,
        Andrii Nakryiko <andriin@...com>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>, Yonghong Song <yhs@...com>
Subject: Re: explicit maps. Was: [RFC PATCH bpf-next 6/8] libbpf: allow
 specifying map definitions using BTF

On Sun, Jun 9, 2019 at 6:17 PM Alexei Starovoitov <ast@...com> wrote:
>
> On 6/6/19 6:02 PM, Jakub Kicinski wrote:
> > On Fri, 7 Jun 2019 00:27:52 +0000, Alexei Starovoitov wrote:
> >> the solution we're discussing should solve BPF_ANNOTATE_KV_PAIR too.
> >> That hack must go.
> >
> > I see.
> >
> >> If I understood your objections to Andrii's format is that
> >> you don't like pointer part of key/value while Andrii explained
> >> why we picked the pointer, right?
> >>
> >> So how about:
> >>
> >> struct {
> >>     int type;
> >>     int max_entries;
> >>     struct {
> >>       __u32 key;
> >>       struct my_value value;
> >>     } types[];
> >> } ...
> >
> > My objection is that k/v fields are never initialized, so they're
> > "metafields", mixed with real fields which hold parameters - like
> > type, max_entries etc.
>
> I don't share this meta fields vs real fields distinction.

100% agree.

> All of the fields are meta.
> Kernel implementation of the map doesn't need to hold type and
> max_entries as actual configuration fields.
> The map definition in c++ would have looked like:
> bpf::hash_map<int, struct my_value, 1000, NO_PREALLOC> foo;
> bpf::array_map<struct my_value, 2000> bar;
>
> Sometime key is not necessary. Sometimes flags have to be zero.
> bpf syscall api is a superset of all fiels for all maps.
> All of them are configuration and meta fields at the same time.
> In c++ example there is really no difference between
> 'struct my_value' and '1000' attributes.
>
> I'm pretty sure bpf will have C++ front-end in the future,
> but until then we have to deal with C and, I think, the map
> definition should be the most natural C syntax.
> In that sense what you're proposing with extern:
> > extern struct my_key my_key;
> > extern int type_int;
> >
> > struct map_def {
> >      int type;
> >      int max_entries;
> >      void *btf_key_ref;
> >      void *btf_val_ref;
> > } = {
> >      ...
> >      .btf_key_ref = &my_key,
> >      .btf_val_ref = &type_int,
> > };
>
> is worse than
>
> struct map_def {
>        int type;
>        int max_entries;
>        int btf_key;
>        struct my_key btf_value;
> };
>
> imo explicit key and value would be ideal,

also agree 100%, that's how I started, but then was quickly pointed to
a real cases where value is just way too big.

> but they take too much space. Hence pointers
> or zero sized array:
> struct {
>       int type;
>       int max_entries;
>       struct {
>         __u32 key;
>         struct my_value value;
>       } types[];
> };

This works, but I still prefer simpler

__u32 *key;
struct my_value *value;

It has less visual clutter and doesn't rely on somewhat obscure
flexible array feature (and it will have to be last in the struct,
unless you do zero-sized array w/ [0]).

>
> I think we should also consider explicit map creation.
>
> Something like:
>
> struct my_map {
>    __u32 key;
>    struct my_value value;
> } *my_hash_map, *my_pinned_hash_map;
>
> struct {
>     __u64 key;
>    struct my_map *value;
> } *my_hash_of_maps;
>
> struct {
>    struct my_map *value;
> } *my_array_of_maps;
>
> __init void create_my_maps(void)
> {
>    bpf_create_hash_map(&my_hash_map, 1000/*max_entries*/);
>    bpf_obj_get(&my_pinned_hash_map, "/sys/fs/bpf/my_map");
>    bpf_create_hash_of_maps(&my_hash_of_maps, 1000/*max_entries*/);
>    bpf_create_array_of_maps(&my_array_of_maps, 20);
> }
>
> SEC("cgroup/skb")
> int bpf_prog(struct __sk_buff *skb)
> {
>    struct my_value *val;
>    __u32 key;
>    __u64 key64;
>    struct my_map *map;
>
>    val = bpf_map_lookup(my_hash_map, &key);
>    map = bpf_map_lookup(my_hash_of_maps, &key64);
> }
>
> '__init' section will be compiled by llvm into bpf instructions
> that will be executed in users space by libbpf.
> The __init prog has to succeed otherwise prog load fails.
>
> May be all map pointers should be in a special section to avoid
> putting them into datasec, but libbpf should be able to figure that
> out without requiring user to specify the .map section.
> The rest of global vars would go into special datasec map.
>
> No llvm changes necessary and BTF is available for keys and values.
>
> libbpf can start with simple __init and eventually grow into
> complex init procedure where maps are initialized,
> prog_array is populated, etc.
>
> Thoughts?

I have few. :)

I think it would be great to have this feature as a sort of "escape
hatch" for really complicated initialization of maps, which can't be
done w/ declarative syntax (and doing it from user-land driving app is
not possible/desirable). But there is a lot of added complexity and
work to be done to make this happen:

1. We'll need to build BPF interpreter into libbpf (so partial
duplication of in-kernel BPF machinery);
2. We'll need to define some sort of user-space BPF API, so that these
init functions can call into libbpf API (at least). So now in addition
to in-kernel BPF helpers, we'll have another and different set of
helpers/APIs exposed to user-land BPF code. This will certainly add
confusion and raise learning curve.
3. Next we'll be adding not-just-libbpf APIs, for cases where the size
of map depends on some system parameter (e.g., number of CPUs, or
amount of free RAM, or something else). This probably can be done
through exposed libbpf APIs again, but now we'll need to decide what
gets exposed, in what format, etc.

It's all doable, but looks like a very large effort, while we yet
don't have a realistic use case for this. Today cases like that are
handled by driving user-land app. It seems like having prog_array and
map-in-map declarative initialization covers a lot of advanced use
cases (plus, of course, pinning), so for starters I'd concentrate
effort there to get declarative approach powerful enough to address a
lot of real-world needs.

The good thing, though, is that nothing prevents us from specifying
and adding this later, once we have good use cases and most needs
already covered w/ declarative syntax.

But, assuming we do explicit map creation, I'd also vote for per-map
"factory" functions, like this:

typedef int (*map_factory_fn)(struct bpf_map); /* can be provided by libbpf */

int init_my_map(struct bpf_map *map)
{
    /* something fancy here */
}

struct {
    __u64 *key;
    struct my_value *value;
    map_factory_fn factory;
} my_map SEC(".maps") = {
    .factory = &init_my_map,
};

/* we can still have per-BPF object init function: */
int init_my_app(struct bpf_object *obj) {
    /* some more initialization of BPF object */
}