Message-ID: <CAEf4BzZkkA2DouqZXH=oG9NJ7Gq7YXK5+OL=RAYbVthxgt-zcQ@mail.gmail.com>
Date: Thu, 18 Apr 2019 18:18:53 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: bpf@...r.kernel.org, Networking <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Joe Stringer <joe@...d.net.nz>, Yonghong Song <yhs@...com>,
Martin Lau <kafai@...com>, jannh@...gle.com,
Andrey Ignatov <rdna@...com>
Subject: Re: [PATCH bpf-next v6 11/16] bpf, libbpf: support global
data/bss/rodata sections
On Tue, Apr 9, 2019 at 2:20 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> This work adds BPF loader support for global data sections
> to libbpf. This allows writing BPF programs in a more natural
> C-like way by being able to define global variables and const
> data.
>
> Back at LPC 2018 [0] we presented a first prototype which
> implemented support for global data sections by extending the
> BPF syscall: union bpf_attr would get an additional memory/size
> pair for each section passed during prog load, so that this
> base address could later be added into the ldimm64 instruction
> along with the user-provided offset when accessing a variable.
> Consensus from LPC was that for proper upstream support, it
> would be more desirable to use maps instead of a bpf_attr
> extension, as this allows introspection of these sections as
> well as potential live updates of their content. This work
> follows that path by taking the following steps on the loader
> side:
>
> 1) In the bpf_object__elf_collect() step we pick up ".data",
> ".rodata", and ".bss" section information.
>
> 2) If present, in bpf_object__init_internal_map() we add
> a map to the obj's map array for each of the present
> sections. Given section size and access properties can
> differ, a single-entry array map is created whose value
> size corresponds to the ELF section size of .data, .bss
> or .rodata. These internal maps are integrated into the
> normal map handling of libbpf such that when a user
> traverses all obj maps, they can be differentiated from
> user-created ones via bpf_map__is_internal(). In later
> steps, when we actually create these maps in the kernel
> via bpf_object__create_maps(), the content of the .data
> and .rodata sections is copied into the respective map
> through bpf_map_update_elem(). For .bss this is not
> necessary since the array map is already zero-initialized
> by default. Additionally, for .rodata the map is frozen
> as read-only after setup, such that writes are possible
> neither from the program nor from the syscall side.
>
> 3) In the bpf_program__collect_reloc() step, we record the
> corresponding map, insn index, and relocation type for
> the global data.
>
> 4) And last but not least, in the actual relocation step in
> bpf_program__relocate(), we mark the ldimm64 instruction
> with src_reg = BPF_PSEUDO_MAP_VALUE, where the map's file
> descriptor is stored in the first imm field, similar to
> BPF_PSEUDO_MAP_FD, and the access offset into the section
> is stored in the second imm field (as ldimm64 is 2-insn
> wide). Given these maps have only a single element, the
> ldimm64's off remains zero in both parts.
>
> 5) On the kernel side, this specially marked BPF_PSEUDO_MAP_VALUE
> load will then store the actual target address in order
> to have 'map-lookup'-free access, that is, the actual
> map value base address + offset. The destination register
> in the verifier will then be marked as PTR_TO_MAP_VALUE,
> containing the fixed offset as reg->off and the backing BPF
> map as reg->map_ptr. Meaning, it is treated like any other
> normal map value from the verification side, only with
> efficient, direct value access instead of an actual call
> to the map lookup helper as in the typical case.
>
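
To make the 1)-5) flow above concrete, here is roughly the kind of C
program this enables (my own minimal sketch, not taken from the patch
or its selftests; the program name, SEC() name and includes are just
assumptions in the style of the selftests):

  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include "bpf_helpers.h"

  static __u32 pkt_seen;                  /* ends up in .bss    */
  static __u32 sample_every = 64;         /* ends up in .data   */
  static const __u32 drop_mark = 0xface;  /* ends up in .rodata */

  SEC("classifier")
  int sample(struct __sk_buff *skb)
  {
          pkt_seen++;                     /* write to .bss map value */
          if (skb->mark == drop_mark)     /* read from frozen .rodata */
                  return TC_ACT_SHOT;
          return pkt_seen % sample_every ? TC_ACT_OK : TC_ACT_UNSPEC;
  }

  char _license[] SEC("license") = "GPL";

Each access to pkt_seen, sample_every and drop_mark becomes an ldimm64
relocation against the respective .bss/.data/.rodata map, rewritten as
described in steps 4) and 5).
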
> Currently, only support for static global variables has been
> added, and libbpf rejects non-static global variables from
> loading. This restriction can be lifted once we have proper
> semantics for how BPF will treat multi-object BPF loads. On
> the BTF side, libbpf will set the value type id to the types
> corresponding to the ".bss", ".data" and ".rodata" names,
> which LLVM emits without the object name prefix. The key
> type will be left as zero, thus making use of the key-less
> BTF option in array maps.
>
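
As an illustration of that static-only restriction (again my own
made-up two-liner, not from the patch):

  static int cfg_scale = 4;  /* fine: static global, goes via the .data map */
  int cfg_mtu = 1500;        /* rejected by libbpf for now: non-static global */
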
> Simple example dump of a program using global vars in each
> section:
>
> # bpftool prog
> [...]
> 6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
> loaded_at 2019-03-11T15:39:34+0000 uid 0
> xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
>
> # bpftool map show id 2237
> 2237: array name test_glo.bss flags 0x0
> key 4B value 64B max_entries 1 memlock 4096B
> # bpftool map show id 2235
> 2235: array name test_glo.data flags 0x0
> key 4B value 64B max_entries 1 memlock 4096B
> # bpftool map show id 2236
> 2236: array name test_glo.rodata flags 0x80
> key 4B value 96B max_entries 1 memlock 4096B
>
> # bpftool prog dump xlated id 6784
> int load_static_data(struct __sk_buff * skb):
> ; int load_static_data(struct __sk_buff *skb)
> 0: (b7) r6 = 0
> ; test_reloc(number, 0, &num0);
> 1: (63) *(u32 *)(r10 -4) = r6
> 2: (bf) r2 = r10
> ; int load_static_data(struct __sk_buff *skb)
> 3: (07) r2 += -4
> ; test_reloc(number, 0, &num0);
> 4: (18) r1 = map[id:2238]
> 6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
> 8: (b7) r4 = 0
> 9: (85) call array_map_update_elem#100464
> 10: (b7) r1 = 1
> ; test_reloc(number, 1, &num1);
> [...]
> ; test_reloc(string, 2, str2);
> 120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
> 122: (18) r1 = map[id:2239]
> 124: (18) r3 = map[id:2237][0]+16
> 126: (b7) r4 = 0
> 127: (85) call array_map_update_elem#100464
> 128: (b7) r1 = 120
> ; str1[5] = 'x';
> 129: (73) *(u8 *)(r9 +5) = r1
> ; test_reloc(string, 3, str1);
> 130: (b7) r1 = 3
> 131: (63) *(u32 *)(r10 -4) = r1
> 132: (b7) r9 = 3
> 133: (bf) r2 = r10
> ; int load_static_data(struct __sk_buff *skb)
> 134: (07) r2 += -4
> ; test_reloc(string, 3, str1);
> 135: (18) r1 = map[id:2239]
> 137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
> 139: (b7) r4 = 0
> 140: (85) call array_map_update_elem#100464
> 141: (b7) r1 = 111
> ; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
> 142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
> 143: (b7) r1 = 108
> 144: (73) *(u8 *)(r8 +5) = r1
> [...]
>
> For the Cilium use case in particular, this enables migrating
> configuration constants from the Cilium daemon's generated header
> defines into global data sections such that expensive runtime
> recompilations with LLVM can be avoided altogether. Instead, the
> ELF file effectively becomes a "template", meaning it is compiled
> only once (!) and the Cilium daemon will then rewrite relevant
> configuration data in the ELF's .data or .rodata sections directly
> instead of recompiling the program. The updated ELF is then loaded
> into the kernel and atomically replaces the existing program in
> the networking datapath. More info in [0].
>
> Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
> for static variables").
>
> [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
> http://vger.kernel.org/lpc-bpf2018.html#session-3
>
> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> Acked-by: Andrii Nakryiko <andriin@...com>
> Acked-by: Martin KaFai Lau <kafai@...com>
> ---
> tools/lib/bpf/Makefile | 2 +-
> tools/lib/bpf/bpf.c | 10 ++
> tools/lib/bpf/bpf.h | 1 +
> tools/lib/bpf/libbpf.c | 342 +++++++++++++++++++++++++++++++++------
> tools/lib/bpf/libbpf.h | 1 +
> tools/lib/bpf/libbpf.map | 6 +
> 6 files changed, 314 insertions(+), 48 deletions(-)
>
> diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
> index 2a578bfc0bca..008344507700 100644
> --- a/tools/lib/bpf/Makefile
> +++ b/tools/lib/bpf/Makefile
> @@ -3,7 +3,7 @@
>
> BPF_VERSION = 0
> BPF_PATCHLEVEL = 0
> -BPF_EXTRAVERSION = 2
> +BPF_EXTRAVERSION = 3
>
> MAKEFLAGS += --no-print-directory
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index a1db869a6fda..c039094ad3aa 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -429,6 +429,16 @@ int bpf_map_get_next_key(int fd, const void *key, void *next_key)
> return sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
> }
>
> +int bpf_map_freeze(int fd)
> +{
> + union bpf_attr attr;
> +
> + memset(&attr, 0, sizeof(attr));
> + attr.map_fd = fd;
> +
> + return sys_bpf(BPF_MAP_FREEZE, &attr, sizeof(attr));
> +}
> +
> int bpf_obj_pin(int fd, const char *pathname)
> {
> union bpf_attr attr;
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index e2c0df7b831f..c9d218d21453 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -117,6 +117,7 @@ LIBBPF_API int bpf_map_lookup_and_delete_elem(int fd, const void *key,
> void *value);
> LIBBPF_API int bpf_map_delete_elem(int fd, const void *key);
> LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
> +LIBBPF_API int bpf_map_freeze(int fd);
> LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
> LIBBPF_API int bpf_obj_get(const char *pathname);
> LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 6dba0f01673b..f7b245fbb960 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -7,6 +7,7 @@
> * Copyright (C) 2015 Wang Nan <wangnan0@...wei.com>
> * Copyright (C) 2015 Huawei Inc.
> * Copyright (C) 2017 Nicira, Inc.
> + * Copyright (C) 2019 Isovalent, Inc.
> */
>
> #ifndef _GNU_SOURCE
> @@ -149,6 +150,7 @@ struct bpf_program {
> enum {
> RELO_LD64,
> RELO_CALL,
> + RELO_DATA,
> } type;
> int insn_idx;
> union {
> @@ -182,6 +184,19 @@ struct bpf_program {
> __u32 line_info_cnt;
> };
>
> +enum libbpf_map_type {
> + LIBBPF_MAP_UNSPEC,
> + LIBBPF_MAP_DATA,
> + LIBBPF_MAP_BSS,
> + LIBBPF_MAP_RODATA,
> +};
> +
> +static const char * const libbpf_type_to_btf_name[] = {
> + [LIBBPF_MAP_DATA] = ".data",
> + [LIBBPF_MAP_BSS] = ".bss",
> + [LIBBPF_MAP_RODATA] = ".rodata",
> +};
> +
> struct bpf_map {
> int fd;
> char *name;
> @@ -193,11 +208,18 @@ struct bpf_map {
> __u32 btf_value_type_id;
> void *priv;
> bpf_map_clear_priv_t clear_priv;
> + enum libbpf_map_type libbpf_type;
> +};
> +
> +struct bpf_secdata {
> + void *rodata;
> + void *data;
> };
>
> static LIST_HEAD(bpf_objects_list);
>
> struct bpf_object {
> + char name[BPF_OBJ_NAME_LEN];
> char license[64];
> __u32 kern_version;
>
> @@ -205,6 +227,7 @@ struct bpf_object {
> size_t nr_programs;
> struct bpf_map *maps;
> size_t nr_maps;
> + struct bpf_secdata sections;
>
> bool loaded;
> bool has_pseudo_calls;
> @@ -220,6 +243,9 @@ struct bpf_object {
> Elf *elf;
> GElf_Ehdr ehdr;
> Elf_Data *symbols;
> + Elf_Data *data;
> + Elf_Data *rodata;
> + Elf_Data *bss;
> size_t strtabidx;
> struct {
> GElf_Shdr shdr;
> @@ -228,6 +254,9 @@ struct bpf_object {
> int nr_reloc;
> int maps_shndx;
> int text_shndx;
> + int data_shndx;
> + int rodata_shndx;
> + int bss_shndx;
> } efile;
> /*
> * All loaded bpf_object is linked in a list, which is
> @@ -449,6 +478,7 @@ static struct bpf_object *bpf_object__new(const char *path,
> size_t obj_buf_sz)
> {
> struct bpf_object *obj;
> + char *end;
>
> obj = calloc(1, sizeof(struct bpf_object) + strlen(path) + 1);
> if (!obj) {
> @@ -457,8 +487,14 @@ static struct bpf_object *bpf_object__new(const char *path,
> }
>
> strcpy(obj->path, path);
> - obj->efile.fd = -1;
> + /* Using basename() GNU version which doesn't modify arg. */
> + strncpy(obj->name, basename((void *)path),
> + sizeof(obj->name) - 1);
> + end = strchr(obj->name, '.');
> + if (end)
> + *end = 0;
>
> + obj->efile.fd = -1;
> /*
> * Caller of this function should also calls
> * bpf_object__elf_finish() after data collection to return
> @@ -468,6 +504,9 @@ static struct bpf_object *bpf_object__new(const char *path,
> obj->efile.obj_buf = obj_buf;
> obj->efile.obj_buf_sz = obj_buf_sz;
> obj->efile.maps_shndx = -1;
> + obj->efile.data_shndx = -1;
> + obj->efile.rodata_shndx = -1;
> + obj->efile.bss_shndx = -1;
>
> obj->loaded = false;
>
> @@ -486,6 +525,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
> obj->efile.elf = NULL;
> }
> obj->efile.symbols = NULL;
> + obj->efile.data = NULL;
> + obj->efile.rodata = NULL;
> + obj->efile.bss = NULL;
>
> zfree(&obj->efile.reloc);
> obj->efile.nr_reloc = 0;
> @@ -627,27 +669,76 @@ static bool bpf_map_type__is_map_in_map(enum bpf_map_type type)
> return false;
> }
>
> +static bool bpf_object__has_maps(const struct bpf_object *obj)
> +{
> + return obj->efile.maps_shndx >= 0 ||
> + obj->efile.data_shndx >= 0 ||
> + obj->efile.rodata_shndx >= 0 ||
> + obj->efile.bss_shndx >= 0;
> +}
> +
> +static int
> +bpf_object__init_internal_map(struct bpf_object *obj, struct bpf_map *map,
> + enum libbpf_map_type type, Elf_Data *data,
> + void **data_buff)
> +{
> + struct bpf_map_def *def = &map->def;
> + char map_name[BPF_OBJ_NAME_LEN];
> +
> + map->libbpf_type = type;
> + map->offset = ~(typeof(map->offset))0;
> + snprintf(map_name, sizeof(map_name), "%.8s%.7s", obj->name,
> + libbpf_type_to_btf_name[type]);
> + map->name = strdup(map_name);
> + if (!map->name) {
> + pr_warning("failed to alloc map name\n");
> + return -ENOMEM;
> + }
> +
> + def->type = BPF_MAP_TYPE_ARRAY;
> + def->key_size = sizeof(int);
> + def->value_size = data->d_size;
> + def->max_entries = 1;
> + def->map_flags = type == LIBBPF_MAP_RODATA ?
> + BPF_F_RDONLY_PROG : 0;
This is breaking BPF programs (even those that don't use global data,
as they still have a .rodata section, though I haven't investigated its
contents) on kernels that don't yet support the BPF_F_RDONLY_PROG flag.
We probably need to probe support for that flag first, before using it.
Just giving a heads-up, as I just discovered this while trying to sync
libbpf on GitHub.
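
Something along these lines could work (a rough, completely untested
sketch; probe_rdonly_prog() is a made-up name, and where to cache the
result in bpf_object is open for discussion -- it assumes the usual
libbpf.c includes for bpf_create_map() and close()):

  static bool probe_rdonly_prog(void)
  {
          int fd;

          /* Try to create a tiny array map with BPF_F_RDONLY_PROG;
           * older kernels reject unknown map_flags with -EINVAL.
           */
          fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(int), sizeof(int),
                              1, BPF_F_RDONLY_PROG);
          if (fd < 0)
                  return false;
          close(fd);
          return true;
  }

Then only set BPF_F_RDONLY_PROG for the .rodata map when the probe
succeeds, falling back to map_flags 0 (and skipping bpf_map_freeze())
otherwise.
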
> + if (data_buff) {
> + *data_buff = malloc(data->d_size);
> + if (!*data_buff) {
> + zfree(&map->name);
> + pr_warning("failed to alloc map content buffer\n");
> + return -ENOMEM;
> + }
> + memcpy(*data_buff, data->d_buf, data->d_size);
> + }
> +
> + pr_debug("map %ld is \"%s\"\n", map - obj->maps, map->name);
> + return 0;
> +}
> +
> static int
> bpf_object__init_maps(struct bpf_object *obj, int flags)
> {
> + int i, map_idx, map_def_sz, nr_syms, nr_maps = 0, nr_maps_glob = 0;
> bool strict = !(flags & MAPS_RELAX_COMPAT);
> - int i, map_idx, map_def_sz, nr_maps = 0;
> - Elf_Scn *scn;
> - Elf_Data *data = NULL;
> Elf_Data *symbols = obj->efile.symbols;
> + Elf_Data *data = NULL;
> + int ret = 0;
>
> - if (obj->efile.maps_shndx < 0)
> - return -EINVAL;
> if (!symbols)
> return -EINVAL;
> + nr_syms = symbols->d_size / sizeof(GElf_Sym);
>
> - scn = elf_getscn(obj->efile.elf, obj->efile.maps_shndx);
> - if (scn)
> - data = elf_getdata(scn, NULL);
> - if (!scn || !data) {
> - pr_warning("failed to get Elf_Data from map section %d\n",
> - obj->efile.maps_shndx);
> - return -EINVAL;
> + if (obj->efile.maps_shndx >= 0) {
> + Elf_Scn *scn = elf_getscn(obj->efile.elf,
> + obj->efile.maps_shndx);
> +
> + if (scn)
> + data = elf_getdata(scn, NULL);
> + if (!scn || !data) {
> + pr_warning("failed to get Elf_Data from map section %d\n",
> + obj->efile.maps_shndx);
> + return -EINVAL;
> + }
> }
>
> /*
> @@ -657,7 +748,13 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
> *
> * TODO: Detect array of map and report error.
> */
> - for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
> + if (obj->efile.data_shndx >= 0)
> + nr_maps_glob++;
> + if (obj->efile.rodata_shndx >= 0)
> + nr_maps_glob++;
> + if (obj->efile.bss_shndx >= 0)
> + nr_maps_glob++;
> + for (i = 0; data && i < nr_syms; i++) {
> GElf_Sym sym;
>
> if (!gelf_getsym(symbols, i, &sym))
> @@ -670,19 +767,21 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
> /* Alloc obj->maps and fill nr_maps. */
> pr_debug("maps in %s: %d maps in %zd bytes\n", obj->path,
> nr_maps, data->d_size);
> -
> - if (!nr_maps)
> + if (!nr_maps && !nr_maps_glob)
> return 0;
>
> /* Assume equally sized map definitions */
> - map_def_sz = data->d_size / nr_maps;
> - if (!data->d_size || (data->d_size % nr_maps) != 0) {
> - pr_warning("unable to determine map definition size "
> - "section %s, %d maps in %zd bytes\n",
> - obj->path, nr_maps, data->d_size);
> - return -EINVAL;
> + if (data) {
> + map_def_sz = data->d_size / nr_maps;
> + if (!data->d_size || (data->d_size % nr_maps) != 0) {
> + pr_warning("unable to determine map definition size "
> + "section %s, %d maps in %zd bytes\n",
> + obj->path, nr_maps, data->d_size);
> + return -EINVAL;
> + }
> }
>
> + nr_maps += nr_maps_glob;
> obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
> if (!obj->maps) {
> pr_warning("alloc maps for object failed\n");
> @@ -703,7 +802,7 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
> /*
> * Fill obj->maps using data in "maps" section.
> */
> - for (i = 0, map_idx = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
> + for (i = 0, map_idx = 0; data && i < nr_syms; i++) {
> GElf_Sym sym;
> const char *map_name;
> struct bpf_map_def *def;
> @@ -716,6 +815,8 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
> map_name = elf_strptr(obj->efile.elf,
> obj->efile.strtabidx,
> sym.st_name);
> +
> + obj->maps[map_idx].libbpf_type = LIBBPF_MAP_UNSPEC;
> obj->maps[map_idx].offset = sym.st_value;
> if (sym.st_value + map_def_sz > data->d_size) {
> pr_warning("corrupted maps section in %s: last map \"%s\" too small\n",
> @@ -764,8 +865,27 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
> map_idx++;
> }
>
> - qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]), compare_bpf_map);
> - return 0;
> + /*
> + * Populate rest of obj->maps with libbpf internal maps.
> + */
> + if (obj->efile.data_shndx >= 0)
> + ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
> + LIBBPF_MAP_DATA,
> + obj->efile.data,
> + &obj->sections.data);
> + if (!ret && obj->efile.rodata_shndx >= 0)
> + ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
> + LIBBPF_MAP_RODATA,
> + obj->efile.rodata,
> + &obj->sections.rodata);
> + if (!ret && obj->efile.bss_shndx >= 0)
> + ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
> + LIBBPF_MAP_BSS,
> + obj->efile.bss, NULL);
> + if (!ret)
> + qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]),
> + compare_bpf_map);
> + return ret;
> }
>
> static bool section_have_execinstr(struct bpf_object *obj, int idx)
> @@ -885,6 +1005,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
> pr_warning("failed to alloc program %s (%s): %s",
> name, obj->path, cp);
> }
> + } else if (strcmp(name, ".data") == 0) {
> + obj->efile.data = data;
> + obj->efile.data_shndx = idx;
> + } else if (strcmp(name, ".rodata") == 0) {
> + obj->efile.rodata = data;
> + obj->efile.rodata_shndx = idx;
> + } else {
> + pr_debug("skip section(%d) %s\n", idx, name);
> }
> } else if (sh.sh_type == SHT_REL) {
> void *reloc = obj->efile.reloc;
> @@ -912,6 +1040,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
> obj->efile.reloc[n].shdr = sh;
> obj->efile.reloc[n].data = data;
> }
> + } else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
> + obj->efile.bss = data;
> + obj->efile.bss_shndx = idx;
> } else {
> pr_debug("skip section(%d) %s\n", idx, name);
> }
> @@ -938,7 +1069,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
> }
> }
> }
> - if (obj->efile.maps_shndx >= 0) {
> + if (bpf_object__has_maps(obj)) {
> err = bpf_object__init_maps(obj, flags);
> if (err)
> goto out;
> @@ -974,13 +1105,46 @@ bpf_object__find_program_by_title(struct bpf_object *obj, const char *title)
> return NULL;
> }
>
> +static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
> + int shndx)
> +{
> + return shndx == obj->efile.data_shndx ||
> + shndx == obj->efile.bss_shndx ||
> + shndx == obj->efile.rodata_shndx;
> +}
> +
> +static bool bpf_object__shndx_is_maps(const struct bpf_object *obj,
> + int shndx)
> +{
> + return shndx == obj->efile.maps_shndx;
> +}
> +
> +static bool bpf_object__relo_in_known_section(const struct bpf_object *obj,
> + int shndx)
> +{
> + return shndx == obj->efile.text_shndx ||
> + bpf_object__shndx_is_maps(obj, shndx) ||
> + bpf_object__shndx_is_data(obj, shndx);
> +}
> +
> +static enum libbpf_map_type
> +bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
> +{
> + if (shndx == obj->efile.data_shndx)
> + return LIBBPF_MAP_DATA;
> + else if (shndx == obj->efile.bss_shndx)
> + return LIBBPF_MAP_BSS;
> + else if (shndx == obj->efile.rodata_shndx)
> + return LIBBPF_MAP_RODATA;
> + else
> + return LIBBPF_MAP_UNSPEC;
> +}
> +
> static int
> bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> Elf_Data *data, struct bpf_object *obj)
> {
> Elf_Data *symbols = obj->efile.symbols;
> - int text_shndx = obj->efile.text_shndx;
> - int maps_shndx = obj->efile.maps_shndx;
> struct bpf_map *maps = obj->maps;
> size_t nr_maps = obj->nr_maps;
> int i, nrels;
> @@ -1000,7 +1164,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> GElf_Sym sym;
> GElf_Rel rel;
> unsigned int insn_idx;
> + unsigned int shdr_idx;
> struct bpf_insn *insns = prog->insns;
> + enum libbpf_map_type type;
> + const char *name;
> size_t map_idx;
>
> if (!gelf_getrel(data, i, &rel)) {
> @@ -1015,13 +1182,18 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> GELF_R_SYM(rel.r_info));
> return -LIBBPF_ERRNO__FORMAT;
> }
> - pr_debug("relo for %lld value %lld name %d\n",
> +
> + name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
> + sym.st_name) ? : "<?>";
> +
> + pr_debug("relo for %lld value %lld name %d (\'%s\')\n",
> (long long) (rel.r_info >> 32),
> - (long long) sym.st_value, sym.st_name);
> + (long long) sym.st_value, sym.st_name, name);
>
> - if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> - pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> - prog->section_name, sym.st_shndx);
> + shdr_idx = sym.st_shndx;
> + if (!bpf_object__relo_in_known_section(obj, shdr_idx)) {
> + pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
> + prog->section_name, shdr_idx);
> return -LIBBPF_ERRNO__RELOC;
> }
>
> @@ -1046,10 +1218,22 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> return -LIBBPF_ERRNO__RELOC;
> }
>
> - if (sym.st_shndx == maps_shndx) {
> - /* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
> + if (bpf_object__shndx_is_maps(obj, shdr_idx) ||
> + bpf_object__shndx_is_data(obj, shdr_idx)) {
> + type = bpf_object__section_to_libbpf_map_type(obj, shdr_idx);
> + if (type != LIBBPF_MAP_UNSPEC &&
> + GELF_ST_BIND(sym.st_info) == STB_GLOBAL) {
> + pr_warning("bpf: relocation: not yet supported relo for non-static global \'%s\' variable found in insns[%d].code 0x%x\n",
> + name, insn_idx, insns[insn_idx].code);
> + return -LIBBPF_ERRNO__RELOC;
> + }
> +
> for (map_idx = 0; map_idx < nr_maps; map_idx++) {
> - if (maps[map_idx].offset == sym.st_value) {
> + if (maps[map_idx].libbpf_type != type)
> + continue;
> + if (type != LIBBPF_MAP_UNSPEC ||
> + (type == LIBBPF_MAP_UNSPEC &&
> + maps[map_idx].offset == sym.st_value)) {
> pr_debug("relocation: find map %zd (%s) for insn %u\n",
> map_idx, maps[map_idx].name, insn_idx);
> break;
> @@ -1062,7 +1246,8 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> return -LIBBPF_ERRNO__RELOC;
> }
>
> - prog->reloc_desc[i].type = RELO_LD64;
> + prog->reloc_desc[i].type = type != LIBBPF_MAP_UNSPEC ?
> + RELO_DATA : RELO_LD64;
> prog->reloc_desc[i].insn_idx = insn_idx;
> prog->reloc_desc[i].map_idx = map_idx;
> }
> @@ -1073,18 +1258,27 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
> {
> struct bpf_map_def *def = &map->def;
> - __u32 key_type_id, value_type_id;
> + __u32 key_type_id = 0, value_type_id = 0;
> int ret;
>
> - ret = btf__get_map_kv_tids(btf, map->name, def->key_size,
> - def->value_size, &key_type_id,
> - &value_type_id);
> - if (ret)
> + if (!bpf_map__is_internal(map)) {
> + ret = btf__get_map_kv_tids(btf, map->name, def->key_size,
> + def->value_size, &key_type_id,
> + &value_type_id);
> + } else {
> + /*
> + * LLVM annotates global data differently in BTF, that is,
> + * only as '.data', '.bss' or '.rodata'.
> + */
> + ret = btf__find_by_name(btf,
> + libbpf_type_to_btf_name[map->libbpf_type]);
> + }
> + if (ret < 0)
> return ret;
>
> map->btf_key_type_id = key_type_id;
> - map->btf_value_type_id = value_type_id;
> -
> + map->btf_value_type_id = bpf_map__is_internal(map) ?
> + ret : value_type_id;
> return 0;
> }
>
> @@ -1195,6 +1389,34 @@ bpf_object__probe_caps(struct bpf_object *obj)
> return bpf_object__probe_name(obj);
> }
>
> +static int
> +bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
> +{
> + char *cp, errmsg[STRERR_BUFSIZE];
> + int err, zero = 0;
> + __u8 *data;
> +
> + /* Nothing to do here since kernel already zero-initializes .bss map. */
> + if (map->libbpf_type == LIBBPF_MAP_BSS)
> + return 0;
> +
> + data = map->libbpf_type == LIBBPF_MAP_DATA ?
> + obj->sections.data : obj->sections.rodata;
> +
> + err = bpf_map_update_elem(map->fd, &zero, data, 0);
> + /* Freeze .rodata map as read-only from syscall side. */
> + if (!err && map->libbpf_type == LIBBPF_MAP_RODATA) {
> + err = bpf_map_freeze(map->fd);
> + if (err) {
> + cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> + pr_warning("Error freezing map(%s) as read-only: %s\n",
> + map->name, cp);
> + err = 0;
> + }
> + }
> + return err;
> +}
> +
> static int
> bpf_object__create_maps(struct bpf_object *obj)
> {
> @@ -1252,6 +1474,7 @@ bpf_object__create_maps(struct bpf_object *obj)
> size_t j;
>
> err = *pfd;
> +err_out:
> cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> pr_warning("failed to create map (name: '%s'): %s\n",
> map->name, cp);
> @@ -1259,6 +1482,15 @@ bpf_object__create_maps(struct bpf_object *obj)
> zclose(obj->maps[j].fd);
> return err;
> }
> +
> + if (bpf_map__is_internal(map)) {
> + err = bpf_object__populate_internal_map(obj, map);
> + if (err < 0) {
> + zclose(*pfd);
> + goto err_out;
> + }
> + }
> +
> pr_debug("create map %s: fd=%d\n", map->name, *pfd);
> }
>
> @@ -1413,19 +1645,27 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
> return 0;
>
> for (i = 0; i < prog->nr_reloc; i++) {
> - if (prog->reloc_desc[i].type == RELO_LD64) {
> + if (prog->reloc_desc[i].type == RELO_LD64 ||
> + prog->reloc_desc[i].type == RELO_DATA) {
> + bool relo_data = prog->reloc_desc[i].type == RELO_DATA;
> struct bpf_insn *insns = prog->insns;
> int insn_idx, map_idx;
>
> insn_idx = prog->reloc_desc[i].insn_idx;
> map_idx = prog->reloc_desc[i].map_idx;
>
> - if (insn_idx >= (int)prog->insns_cnt) {
> + if (insn_idx + 1 >= (int)prog->insns_cnt) {
> pr_warning("relocation out of range: '%s'\n",
> prog->section_name);
> return -LIBBPF_ERRNO__RELOC;
> }
> - insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
> +
> + if (!relo_data) {
> + insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
> + } else {
> + insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
> + insns[insn_idx + 1].imm = insns[insn_idx].imm;
> + }
> insns[insn_idx].imm = obj->maps[map_idx].fd;
> } else if (prog->reloc_desc[i].type == RELO_CALL) {
> err = bpf_program__reloc_text(prog, obj,
> @@ -2321,6 +2561,9 @@ void bpf_object__close(struct bpf_object *obj)
> obj->maps[i].priv = NULL;
> obj->maps[i].clear_priv = NULL;
> }
> +
> + zfree(&obj->sections.rodata);
> + zfree(&obj->sections.data);
> zfree(&obj->maps);
> obj->nr_maps = 0;
>
> @@ -2798,6 +3041,11 @@ bool bpf_map__is_offload_neutral(struct bpf_map *map)
> return map->def.type == BPF_MAP_TYPE_PERF_EVENT_ARRAY;
> }
>
> +bool bpf_map__is_internal(struct bpf_map *map)
> +{
> + return map->libbpf_type != LIBBPF_MAP_UNSPEC;
> +}
> +
> void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex)
> {
> map->map_ifindex = ifindex;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 531323391d07..12db2822c8e7 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -301,6 +301,7 @@ LIBBPF_API void *bpf_map__priv(struct bpf_map *map);
> LIBBPF_API int bpf_map__reuse_fd(struct bpf_map *map, int fd);
> LIBBPF_API int bpf_map__resize(struct bpf_map *map, __u32 max_entries);
> LIBBPF_API bool bpf_map__is_offload_neutral(struct bpf_map *map);
> +LIBBPF_API bool bpf_map__is_internal(struct bpf_map *map);
> LIBBPF_API void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
> LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
> LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index f3ce50500cf2..be42bdffc8de 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -157,3 +157,9 @@ LIBBPF_0.0.2 {
> bpf_program__bpil_addr_to_offs;
> bpf_program__bpil_offs_to_addr;
> } LIBBPF_0.0.1;
> +
> +LIBBPF_0.0.3 {
> + global:
> + bpf_map__is_internal;
> + bpf_map_freeze;
> +} LIBBPF_0.0.2;
> --
> 2.17.1
>