[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190610141528.38c71524@cakuba.netronome.com>
Date: Mon, 10 Jun 2019 14:15:28 -0700
From: Jakub Kicinski <jakub.kicinski@...ronome.com>
To: Alexei Starovoitov <ast@...com>
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Stanislav Fomichev <sdf@...ichev.me>,
Andrii Nakryiko <andriin@...com>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Kernel Team <Kernel-team@...com>, Yonghong Song <yhs@...com>
Subject: Re: explicit maps. Was: [RFC PATCH bpf-next 6/8] libbpf: allow
specifying map definitions using BTF
On Mon, 10 Jun 2019 01:17:13 +0000, Alexei Starovoitov wrote:
> On 6/6/19 6:02 PM, Jakub Kicinski wrote:
> > On Fri, 7 Jun 2019 00:27:52 +0000, Alexei Starovoitov wrote:
> >> the solution we're discussing should solve BPF_ANNOTATE_KV_PAIR too.
> >> That hack must go.
> >
> > I see.
> >
> >> If I understood your objections to Andrii's format is that
> >> you don't like pointer part of key/value while Andrii explained
> >> why we picked the pointer, right?
> >>
> >> So how about:
> >>
> >> struct {
> >> int type;
> >> int max_entries;
> >> struct {
> >> __u32 key;
> >> struct my_value value;
> >> } types[];
> >> } ...
> >
> > My objection is that k/v fields are never initialized, so they're
> > "metafields", mixed with real fields which hold parameters - like
> > type, max_entries etc.
>
> I don't share this meta fields vs real fields distinction.
> All of the fields are meta.
> Kernel implementation of the map doesn't need to hold type and
> max_entries as actual configuration fields.
> The map definition in c++ would have looked like:
> bpf::hash_map<int, struct my_value, 1000, NO_PREALLOC> foo;
> bpf::array_map<struct my_value, 2000> bar;
>
> Sometime key is not necessary. Sometimes flags have to be zero.
> bpf syscall api is a superset of all fiels for all maps.
> All of them are configuration and meta fields at the same time.
> In c++ example there is really no difference between
> 'struct my_value' and '1000' attributes.
>
> I'm pretty sure bpf will have C++ front-end in the future,
> but until then we have to deal with C and, I think, the map
> definition should be the most natural C syntax.
> In that sense what you're proposing with extern:
> > extern struct my_key my_key;
> > extern int type_int;
> >
> > struct map_def {
> > int type;
> > int max_entries;
> > void *btf_key_ref;
> > void *btf_val_ref;
> > } = {
> > ...
> > .btf_key_ref = &my_key,
> > .btf_val_ref = &type_int,
> > };
>
> is worse than
>
> struct map_def {
> int type;
> int max_entries;
> int btf_key;
> struct my_key btf_value;
> };
>
> imo explicit key and value would be ideal,
> but they take too much space. Hence pointers
> or zero sized array:
> struct {
> int type;
> int max_entries;
> struct {
> __u32 key;
> struct my_value value;
> } types[];
> };
It is a C syntax problem, I do agree with you that it works well for
templates. The map_def structure holds parameters, and we can't take
a type as a value in C. Hence the types[] in your proposal - you could
as well call them ghost_fields[] :)
> I think we should also consider explicit map creation.
>
> Something like:
>
> struct my_map {
> __u32 key;
> struct my_value value;
> } *my_hash_map, *my_pinned_hash_map;
>
> struct {
> __u64 key;
> struct my_map *value;
> } *my_hash_of_maps;
>
> struct {
> struct my_map *value;
> } *my_array_of_maps;
>
> __init void create_my_maps(void)
> {
> bpf_create_hash_map(&my_hash_map, 1000/*max_entries*/);
> bpf_obj_get(&my_pinned_hash_map, "/sys/fs/bpf/my_map");
> bpf_create_hash_of_maps(&my_hash_of_maps, 1000/*max_entries*/);
> bpf_create_array_of_maps(&my_array_of_maps, 20);
> }
>
> SEC("cgroup/skb")
> int bpf_prog(struct __sk_buff *skb)
> {
> struct my_value *val;
> __u32 key;
> __u64 key64;
> struct my_map *map;
>
> val = bpf_map_lookup(my_hash_map, &key);
> map = bpf_map_lookup(my_hash_of_maps, &key64);
> }
>
> '__init' section will be compiled by llvm into bpf instructions
> that will be executed in users space by libbpf.
> The __init prog has to succeed otherwise prog load fails.
>
> May be all map pointers should be in a special section to avoid
> putting them into datasec, but libbpf should be able to figure that
> out without requiring user to specify the .map section.
> The rest of global vars would go into special datasec map.
>
> No llvm changes necessary and BTF is available for keys and values.
>
> libbpf can start with simple __init and eventually grow into
> complex init procedure where maps are initialized,
> prog_array is populated, etc.
>
> Thoughts?
I like it! :)
Powered by blists - more mailing lists