[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzY5wopSFj2B2_Q9VtNGoGtzyZ7MOUv1oDugCXma1kk3UA@mail.gmail.com>
Date: Mon, 23 Dec 2019 11:54:17 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Martin KaFai Lau <kafai@...com>
Cc: bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
David Miller <davem@...emloft.net>,
Kernel Team <kernel-team@...com>,
Networking <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2 10/11] bpf: libbpf: Add STRUCT_OPS support
On Fri, Dec 20, 2019 at 10:26 PM Martin KaFai Lau <kafai@...com> wrote:
>
> This patch adds BPF STRUCT_OPS support to libbpf.
>
> The only sec_name convention is SEC(".struct_ops") to identify the
> struct_ops implemented in BPF,
> e.g. To implement a tcp_congestion_ops:
>
> SEC(".struct_ops")
> struct tcp_congestion_ops dctcp = {
> .init = (void *)dctcp_init, /* <-- a bpf_prog */
> /* ... some more func prts ... */
> .name = "bpf_dctcp",
> };
>
> Each struct_ops is defined as a global variable under SEC(".struct_ops")
> as above. libbpf creates a map for each variable and the variable name
> is the map's name. Multiple struct_ops is supported under
> SEC(".struct_ops").
>
> In the bpf_object__open phase, libbpf will look for the SEC(".struct_ops")
> section and find out what is the btf-type the struct_ops is
> implementing. Note that the btf-type here is referring to
> a type in the bpf_prog.o's btf. A "struct bpf_map" is added
> by bpf_object__add_map() as other maps do. It will then
> collect (through SHT_REL) where are the bpf progs that the
> func ptrs are referring to. No btf_vmlinux is needed in
> the open phase.
>
> In the bpf_object__load phase, the map-fields, which depend
> on the btf_vmlinux, are initialized (in bpf_map__init_kern_struct_ops()).
> It will also set the prog->type, prog->attach_btf_id, and
> prog->expected_attach_type. Thus, the prog's properties do
> not rely on its section name.
> [ Currently, the bpf_prog's btf-type ==> btf_vmlinux's btf-type matching
> process is as simple as: member-name match + btf-kind match + size match.
> If these matching conditions fail, libbpf will reject.
> The current targeting support is "struct tcp_congestion_ops" which
> most of its members are function pointers.
> The member ordering of the bpf_prog's btf-type can be different from
> the btf_vmlinux's btf-type. ]
>
> Then, all obj->maps are created as usual (in bpf_object__create_maps()).
>
> Once the maps are created and prog's properties are all set,
> the libbpf will proceed to load all the progs.
>
> bpf_map__attach_struct_ops() is added to register a struct_ops
> map to a kernel subsystem.
>
> Signed-off-by: Martin KaFai Lau <kafai@...com>
> ---
This looks great, Martin! I just have few nits/suggestions, but
overall I think this approach is much better.
After this lands, please follow up with adding support for struct_ops
maps into BPF skeleton, so that it's attached automatically on
skeleton load.
> tools/lib/bpf/bpf.c | 10 +-
> tools/lib/bpf/bpf.h | 5 +-
> tools/lib/bpf/libbpf.c | 639 +++++++++++++++++++++++++++++++++-
> tools/lib/bpf/libbpf.h | 1 +
> tools/lib/bpf/libbpf.map | 1 +
> tools/lib/bpf/libbpf_probes.c | 2 +
> 6 files changed, 646 insertions(+), 12 deletions(-)
>
[...]
> + member = btf_members(type);
> + for (i = 0; i < btf_vlen(type); i++, member++) {
> + const struct btf_type *mtype, *kern_mtype;
> + __u32 mtype_id, kern_mtype_id;
> + void *mdata, *kern_mdata;
> + __s64 msize, kern_msize;
> + __u32 moff, kern_moff;
> + __u32 kern_member_idx;
> + const char *mname;
> +
> + mname = btf__name_by_offset(btf, member->name_off);
> + kern_member = find_member_by_name(kern_btf, kern_type, mname);
> + if (!kern_member) {
> + pr_warn("struct_ops map %s init_kern %s: Cannot find member %s in kernel BTF\n",
> + map->name, tname, mname);
> + return -ENOTSUP;
> + }
> +
> + kern_member_idx = kern_member - btf_members(kern_type);
> + if (btf_member_bitfield_size(type, i) ||
> + btf_member_bitfield_size(kern_type, kern_member_idx)) {
> + pr_warn("struct_ops map %s init_kern %s: bitfield %s is not supported\n",
> + map->name, tname, mname);
> + return -ENOTSUP;
> + }
> +
> + moff = member->offset / 8;
> + kern_moff = kern_member->offset / 8;
> +
> + mdata = data + moff;
> + kern_mdata = kern_data + kern_moff;
> +
> + mtype_id = member->type;
> + kern_mtype_id = kern_member->type;
> +
> + mtype = resolve_ptr(btf, mtype_id, NULL);
> + kern_mtype = resolve_ptr(kern_btf, kern_mtype_id, NULL);
> + if (mtype && kern_mtype) {
This check seems more logical after you resolve mtype_id and
kern_mtype_id below and check that they have same BTF_INFO_KIND. After
that you can check for a case of pointers and handle it, if not
pointers - proceed to determining size.
That way you might also get rid of resolve_ptr and resolve_func_ptr
functions, because here you already resolved pointer, so just normal
btf__resolve_type will give you what it's pointing to. Then the only
use of resolve_func_ptr_resolve_ptr will be in
bpf_object__collect_struct_ops_map_reloc, which you can just inline at
that point and not really loose any readability.
> + struct bpf_program *prog;
> +
> + if (!btf_is_func_proto(mtype) ||
> + !btf_is_func_proto(kern_mtype)) {
> + pr_warn("struct_ops map %s init_kern %s: non func ptr %s is not supported\n",
> + map->name, tname, mname);
> + return -ENOTSUP;
> + }
> +
> + prog = st_ops->progs[i];
> + if (!prog) {
> + pr_debug("struct_ops map %s init_kern %s: func ptr %s is not set\n",
> + map->name, tname, mname);
> + continue;
> + }
> +
> + if (prog->type != BPF_PROG_TYPE_UNSPEC &&
> + (prog->type != BPF_PROG_TYPE_STRUCT_OPS ||
> + prog->attach_btf_id != kern_type_id ||
> + prog->expected_attach_type != kern_member_idx)) {
> + pr_warn("struct_ops map %s init_kern %s: Cannot use prog %s in type %u attach_btf_id %u expected_attach_type %u for func ptr %s\n",
> + map->name, tname, prog->name,
> + prog->type, prog->attach_btf_id,
> + prog->expected_attach_type, mname);
> + return -ENOTSUP;
> + }
> +
> + prog->type = BPF_PROG_TYPE_STRUCT_OPS;
> + prog->attach_btf_id = kern_type_id;
> + prog->expected_attach_type = kern_member_idx;
> +
> + st_ops->kern_func_off[i] = kern_data_off + kern_moff;
> +
> + pr_debug("struct_ops map %s init_kern %s: func ptr %s is set to prog %s from data(+%u) to kern_data(+%u)\n",
> + map->name, tname, mname, prog->name, moff,
> + kern_moff);
> +
> + continue;
> + }
> +
> + mtype_id = btf__resolve_type(btf, mtype_id);
> + kern_mtype_id = btf__resolve_type(kern_btf, kern_mtype_id);
> + if (mtype_id < 0 || kern_mtype_id < 0) {
> + pr_warn("struct_ops map %s init_kern %s: Cannot resolve the type for %s\n",
> + map->name, tname, mname);
> + return -ENOTSUP;
> + }
> +
> + mtype = btf__type_by_id(btf, mtype_id);
> + kern_mtype = btf__type_by_id(kern_btf, kern_mtype_id);
> + if (BTF_INFO_KIND(mtype->info) !=
> + BTF_INFO_KIND(kern_mtype->info)) {
> + pr_warn("struct_ops map %s init_kern %s: Unmatched member type %s %u != %u(kernel)\n",
> + map->name, tname, mname,
> + BTF_INFO_KIND(mtype->info),
> + BTF_INFO_KIND(kern_mtype->info));
> + return -ENOTSUP;
> + }
> +
> + msize = btf__resolve_size(btf, mtype_id);
> + kern_msize = btf__resolve_size(kern_btf, kern_mtype_id);
> + if (msize < 0 || kern_msize < 0 || msize != kern_msize) {
> + pr_warn("struct_ops map %s init_kern %s: Error in size of member %s: %zd != %zd(kernel)\n",
> + map->name, tname, mname,
> + (ssize_t)msize, (ssize_t)kern_msize);
> + return -ENOTSUP;
> + }
> +
> + pr_debug("struct_ops map %s init_kern %s: copy %s %u bytes from data(+%u) to kern_data(+%u)\n",
> + map->name, tname, mname, (unsigned int)msize,
> + moff, kern_moff);
> + memcpy(kern_mdata, mdata, msize);
> + }
> +
> + return 0;
> +}
> +
[...]
> + symbols = obj->efile.symbols;
> + btf = obj->btf;
> + nrels = shdr->sh_size / shdr->sh_entsize;
> + for (i = 0; i < nrels; i++) {
> + if (!gelf_getrel(data, i, &rel)) {
> + pr_warn("struct_ops map reloc: failed to get %d reloc\n", i);
> + return -LIBBPF_ERRNO__FORMAT;
> + }
> +
> + if (!gelf_getsym(symbols, GELF_R_SYM(rel.r_info), &sym)) {
> + pr_warn("struct_ops map reloc: symbol %" PRIx64 " not found\n",
please use %zx and explicit cast to size_t instead of PRIx64
> + GELF_R_SYM(rel.r_info));
> + return -LIBBPF_ERRNO__FORMAT;
> + }
> +
> + name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
> + sym.st_name) ? : "<?>";
> + map = find_struct_ops_map_by_offset(obj, rel.r_offset);
> + if (!map) {
> + pr_warn("struct_ops map reloc: cannot find map at rel.r_offset %zu\n",
> + (size_t)rel.r_offset);
> + return -EINVAL;
> + }
> +
> + moff = rel.r_offset - map->sec_offset;
nit: double space
> + shdr_idx = sym.st_shndx;
> + st_ops = map->st_ops;
> + tname = st_ops->tname;
[...]
> + datasec = btf__type_by_id(btf, datasec_id);
> + vsi = btf_var_secinfos(datasec);
> + for (i = 0; i < btf_vlen(datasec); i++, vsi++) {
> + type = btf__type_by_id(obj->btf, vsi->type);
> + var_name = btf__name_by_offset(obj->btf, type->name_off);
> +
> + type_id = btf__resolve_type(obj->btf, vsi->type);
> + if (type_id < 0) {
> + pr_warn("struct_ops init: Cannot resolve var type_id %u in DATASEC %s\n",
> + vsi->type, STRUCT_OPS_SEC);
> + return -EINVAL;
> + }
> +
> + type = btf__type_by_id(obj->btf, type_id);
> + tname = btf__name_by_offset(obj->btf, type->name_off);
> + if (!btf_is_struct(type)) {
if tname is empty, it's also not of much use, so might be a good idea
to error out here with some context?
> + pr_warn("struct_ops init: %s is not a struct\n", tname);
> + return -EINVAL;
> + }
> +
> + map = bpf_object__add_map(obj);
> + if (IS_ERR(map))
> + return PTR_ERR(map);
> +
> + map->sec_idx = obj->efile.st_ops_shndx;
> + map->sec_offset = vsi->offset;
> + map->name = strdup(var_name);
> + if (!map->name)
> + return -ENOMEM;
> +
> + map->def.type = BPF_MAP_TYPE_STRUCT_OPS;
> + map->def.key_size = sizeof(int);
> + map->def.value_size = type->size;
> + map->def.max_entries = 1;
> +
> + map->st_ops = calloc(1, sizeof(*map->st_ops));
> + if (!map->st_ops)
> + return -ENOMEM;
> + st_ops = map->st_ops;
> + st_ops->data = malloc(type->size);
> + st_ops->progs = calloc(btf_vlen(type), sizeof(*st_ops->progs));
> + st_ops->kern_func_off = malloc(btf_vlen(type) *
> + sizeof(*st_ops->kern_func_off));
> + if (!st_ops->data || !st_ops->progs || !st_ops->kern_func_off)
> + return -ENOMEM;
> +
> + memcpy(st_ops->data,
> + obj->efile.st_ops_data->d_buf + vsi->offset,
> + type->size);
maybe also check that d_size is big enough to read data from?
> + st_ops->tname = tname;
> + st_ops->type = type;
> + st_ops->type_id = type_id;
> +
> + pr_debug("struct_ops init: %s found. type_id:%u var_name:%s offset:%u\n",
> + tname, type_id, var_name, vsi->offset);
> + }
> +
> + return 0;
> +}
> +
[...]
>
> - for (i = 0; i < obj->nr_maps; i++)
> + for (i = 0; i < obj->nr_maps; i++) {
> zclose(obj->maps[i].fd);
> + if (obj->maps[i].st_ops)
> + zfree(&obj->maps[i].st_ops->kern_vdata);
any specific reason to deallocate only kern_vdata? maybe just
consolidate all the clean up in bpf_object__close instead?
> + }
>
> for (i = 0; i < obj->nr_programs; i++)
> bpf_program__unload(&obj->programs[i]);
> @@ -4866,6 +5427,7 @@ int bpf_object__load_xattr(struct bpf_object_load_attr *attr)
> err = err ? : bpf_object__resolve_externs(obj, obj->kconfig);
> err = err ? : bpf_object__sanitize_and_load_btf(obj);
> err = err ? : bpf_object__sanitize_maps(obj);
> + err = err ? : bpf_object__init_kern_struct_ops_maps(obj);
> err = err ? : bpf_object__create_maps(obj);
> err = err ? : bpf_object__relocate(obj, attr->target_btf_path);
> err = err ? : bpf_object__load_progs(obj, attr->log_level);
> @@ -5453,6 +6015,13 @@ void bpf_object__close(struct bpf_object *obj)
> map->mmaped = NULL;
> }
>
> + if (map->st_ops) {
> + zfree(&map->st_ops->data);
> + zfree(&map->st_ops->progs);
> + zfree(&map->st_ops->kern_func_off);
> + zfree(&map->st_ops);
> + }
> +
> zfree(&map->name);
> zfree(&map->pin_path);
> }
> @@ -5954,7 +6523,7 @@ int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
> int libbpf_find_vmlinux_btf_id(const char *name,
> enum bpf_attach_type attach_type)
> {
> - struct btf *btf = bpf_core_find_kernel_btf();
> + struct btf *btf = bpf_find_kernel_btf();
> char raw_tp_btf[128] = BTF_PREFIX;
> char *dst = raw_tp_btf + sizeof(BTF_PREFIX) - 1;
> const char *btf_name;
> @@ -6780,6 +7349,58 @@ struct bpf_link *bpf_program__attach(struct bpf_program *prog)
> return sec_def->attach_fn(sec_def, prog);
> }
>
> +static int bpf_link__detach_struct_ops(struct bpf_link *link)
> +{
> + struct bpf_link_fd *l = (void *)link;
> + __u32 zero = 0;
> +
> + if (bpf_map_delete_elem(l->fd, &zero))
> + return -errno;
> +
> + return 0;
> +}
> +
> +struct bpf_link *bpf_map__attach_struct_ops(struct bpf_map *map)
> +{
This looks great!
[...]
> enum bpf_perf_event_ret
> bpf_perf_event_read_simple(void *mmap_mem, size_t mmap_size, size_t page_size,
> void **copy_mem, size_t *copy_size,
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index fe592ef48f1b..9c54f252f90f 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -355,6 +355,7 @@ struct bpf_map_def {
> * so no need to worry about a name clash.
> */
> struct bpf_map;
> +LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(struct bpf_map *map);
nit: can you please move it into the section with all the bpf_link and
attach APIs?
> LIBBPF_API struct bpf_map *
> bpf_object__find_map_by_name(const struct bpf_object *obj, const char *name);
>
[...]
Powered by blists - more mailing lists