Message-ID: <CAEf4BzZWZmTFj0Xqf+_1QoT5E6tt7ijmzk39khgZ2zYfe0QFUw@mail.gmail.com>
Date: Fri, 1 Mar 2019 11:10:28 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Yonghong Song <yhs@...com>
Cc: Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...com>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"joe@...d.net.nz" <joe@...d.net.nz>,
"john.fastabend@...il.com" <john.fastabend@...il.com>,
"tgraf@...g.ch" <tgraf@...g.ch>, Andrii Nakryiko <andriin@...com>,
"jakub.kicinski@...ronome.com" <jakub.kicinski@...ronome.com>,
"lmb@...udflare.com" <lmb@...udflare.com>
Subject: Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global
data/bss/rodata sections
On Fri, Mar 1, 2019 at 10:58 AM Yonghong Song <yhs@...com> wrote:
>
>
>
> On 3/1/19 10:48 AM, Andrii Nakryiko wrote:
> > On Fri, Mar 1, 2019 at 10:31 AM Yonghong Song <yhs@...com> wrote:
> >>
> >>
> >>
> >> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
> >>> This work adds BPF loader support for global data sections
> >>> to libbpf. This allows writing BPF programs in a more natural,
> >>> C-like way by defining global variables and const data.
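> >>>
> >>> As a rough illustration (not taken from the patch; section names,
> >>> the SEC() macro and the program type are assumptions), this lets
> >>> one write something like:
> >>>
> >>>   #include <linux/bpf.h>
> >>>
> >>>   #define SEC(name) __attribute__((section(name), used))
> >>>
> >>>   static volatile __u64 cnt;                /* lands in .bss    */
> >>>   static volatile __u64 step = 1;           /* lands in .data   */
> >>>   static const volatile __u32 limit = 100;  /* lands in .rodata */
> >>>
> >>>   SEC("classifier")
> >>>   int count_pkts(struct __sk_buff *skb)
> >>>   {
> >>>           if (cnt < limit)
> >>>                   cnt += step;  /* direct access, no map lookup helper */
> >>>           return 0;
> >>>   }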
> >>>
> >>> Back at LPC 2018 [0] we presented a first prototype which
> >>> implemented support for global data sections by extending the
> >>> BPF syscall: union bpf_attr would get an additional memory/size
> >>> pair for each section passed during prog load, so that this base
> >>> address could later be added into the ldimm64 instruction along
> >>> with the user-provided offset when accessing a variable. The
> >>> consensus from LPC was that for proper upstream support it would
> >>> be more desirable to use maps instead of a bpf_attr extension,
> >>> as this would allow for introspection of these sections as well
> >>> as potential live updates of their content. This work follows
> >>> that path by taking the following steps on the loader side:
> >>>
> >>> 1) In bpf_object__elf_collect() step we pick up ".data",
> >>> ".rodata", and ".bss" section information.
> >>>
> >>> 2) If present, in bpf_object__init_global_maps() we create
> >>> a map that corresponds to each of the present sections.
> >>> Given that section size and access properties can differ,
> >>> a single-entry array map is created with a value size
> >>> corresponding to the ELF section size of .data, .bss or
> >>> .rodata. In the latter case, the map is created as
> >>> read-only from the program side such that the verifier
> >>> rejects any write attempts into .rodata. In a subsequent
> >>> step, for the .data and .rodata sections, the section
> >>> content is copied into the map through
> >>> bpf_map_update_elem(). For .bss this is not necessary,
> >>> since the array map is already zero-initialized by
> >>> default. (See the sketch after this step list.)
> >>>
> >>> 3) In bpf_program__collect_reloc() step, we record the
> >>> corresponding map, insn index, and relocation type for
> >>> the global data.
> >>>
> >>> 4) And last but not least, in the actual relocation step in
> >>> bpf_program__relocate(), we mark the ldimm64 instruction
> >>> with src_reg = BPF_PSEUDO_MAP_VALUE, where the first imm
> >>> field stores the map's file descriptor, similarly to
> >>> BPF_PSEUDO_MAP_FD, and the second imm field (as ldimm64 is
> >>> 2-insn wide) stores the access offset into the section.
> >>> (See the sketch after this step list.)
> >>>
> >>> 5) On the kernel side, this specially marked BPF_PSEUDO_MAP_VALUE
> >>> load will then be rewritten to store the actual target
> >>> address in order to have 'map-lookup'-free access, that is,
> >>> the actual map value base address + offset. The destination
> >>> register in the verifier will then be marked as
> >>> PTR_TO_MAP_VALUE, containing the fixed offset as reg->off
> >>> and the backing BPF map as reg->map_ptr. Meaning, it is
> >>> treated like any other normal map value from the
> >>> verification side, only with efficient, direct value access
> >>> instead of an actual call to the map lookup helper as in
> >>> the typical case.
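> >>>
> >>> Just to illustrate steps 2) and 4) in pseudo-code (a sketch only,
> >>> not the actual patch; sec_sz, sec_data and var_off stand in for
> >>> values libbpf derives from the ELF file):
> >>>
> >>>   /* step 2: one single-entry array map per section */
> >>>   struct bpf_create_map_attr attr = {
> >>>           .name        = ".rodata",
> >>>           .map_type    = BPF_MAP_TYPE_ARRAY,
> >>>           .key_size    = sizeof(int),
> >>>           .value_size  = sec_sz,            /* ELF section size */
> >>>           .max_entries = 1,
> >>>           .map_flags   = BPF_F_RDONLY_PROG, /* .rodata only */
> >>>   };
> >>>   int zero = 0, map_fd = bpf_create_map_xattr(&attr);
> >>>
> >>>   /* .data/.rodata: copy section content in; .bss is already
> >>>    * zero-initialized by the array map itself.
> >>>    */
> >>>   if (map_fd >= 0 && sec_data)
> >>>           bpf_map_update_elem(map_fd, &zero, sec_data, 0);
> >>>
> >>>   /* step 4: patch the 2-insn ldimm64 referencing the global */
> >>>   insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
> >>>   insn[0].imm     = map_fd;   /* fd of the section's map */
> >>>   insn[1].imm     = var_off;  /* variable's offset in the section */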
> >>>
> >>> Simple example dump of program using globals vars in each
> >>> section:
> >>>
> >>> # readelf -a test_global_data.o
> >>> [...]
> >>> [ 6] .bss NOBITS 0000000000000000 00000328
> >>> 0000000000000010 0000000000000000 WA 0 0 8
> >>> [ 7] .data PROGBITS 0000000000000000 00000328
> >>> 0000000000000010 0000000000000000 WA 0 0 8
> >>> [ 8] .rodata PROGBITS 0000000000000000 00000338
> >>> 0000000000000018 0000000000000000 A 0 0 8
> >>> [...]
> >>> 95: 0000000000000000 8 OBJECT LOCAL DEFAULT 6 static_bss
> >>> 96: 0000000000000008 8 OBJECT LOCAL DEFAULT 6 static_bss2
> >>> 97: 0000000000000000 8 OBJECT LOCAL DEFAULT 7 static_data
> >>> 98: 0000000000000008 8 OBJECT LOCAL DEFAULT 7 static_data2
> >>> 99: 0000000000000000 8 OBJECT LOCAL DEFAULT 8 static_rodata
> >>> 100: 0000000000000008 8 OBJECT LOCAL DEFAULT 8 static_rodata2
> >>> 101: 0000000000000010 8 OBJECT LOCAL DEFAULT 8 static_rodata3
> >>> [...]
> >>>
> >>> # bpftool prog
> >>> 103: sched_cls name load_static_dat tag 37a8b6822fc39a29 gpl
> >>> loaded_at 2019-02-28T02:02:35+0000 uid 0
> >>> xlated 712B jited 426B memlock 4096B map_ids 63,64,65,66
> >>> # bpftool map show id 63
> >>> 63: array name .bss flags 0x0 <-- .bss area, rw
> >>> key 4B value 16B max_entries 1 memlock 4096B
> >>> # bpftool map show id 64
> >>> 64: array name .data flags 0x0 <-- .data area, rw
> >>> key 4B value 16B max_entries 1 memlock 4096B
> >>> # bpftool map show id 65
> >>> 65: array name .rodata flags 0x80 <-- .rodata area, ro
> >>> key 4B value 24B max_entries 1 memlock 4096B
> >>>
> >>> # bpftool prog dump xlated id 103
> >>> int load_static_data(struct __sk_buff * skb):
> >>> ; int load_static_data(struct __sk_buff *skb)
> >>> 0: (b7) r1 = 0
> >>> ; key = 0;
> >>> 1: (63) *(u32 *)(r10 -4) = r1
> >>> 2: (bf) r6 = r10
> >>> ; int load_static_data(struct __sk_buff *skb)
> >>> 3: (07) r6 += -4
> >>> ; bpf_map_update_elem(&result, &key, &static_bss, 0);
> >>> 4: (18) r1 = map[id:66]
> >>> 6: (bf) r2 = r6
> >>> 7: (18) r3 = map[id:63][0]+0 <-- direct static_bss addr in .bss area
> >>> 9: (b7) r4 = 0
> >>> 10: (85) call array_map_update_elem#99888
> >>> 11: (b7) r1 = 1
> >>> ; key = 1;
> >>> 12: (63) *(u32 *)(r10 -4) = r1
> >>> ; bpf_map_update_elem(&result, &key, &static_data, 0);
> >>> 13: (18) r1 = map[id:66]
> >>> 15: (bf) r2 = r6
> >>> 16: (18) r3 = map[id:64][0]+0 <-- direct static_data addr in .data area
> >>> 18: (b7) r4 = 0
> >>> 19: (85) call array_map_update_elem#99888
> >>> 20: (b7) r1 = 2
> >>> ; key = 2;
> >>> 21: (63) *(u32 *)(r10 -4) = r1
> >>> ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
> >>> 22: (18) r1 = map[id:66]
> >>> 24: (bf) r2 = r6
> >>> 25: (18) r3 = map[id:65][0]+0 <-- direct static_rodata addr in .rodata area
> >>> 27: (b7) r4 = 0
> >>> 28: (85) call array_map_update_elem#99888
> >>> 29: (b7) r1 = 3
> >>> ; key = 3;
> >>> 30: (63) *(u32 *)(r10 -4) = r1
> >>> ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
> >>> 31: (18) r7 = map[id:63][0]+8 <--.
> >>> 33: (18) r1 = map[id:66] |
> >>> 35: (bf) r2 = r6 |
> >>> 36: (18) r3 = map[id:63][0]+8 <-- direct static_bss2 addr in .bss area
> >>> 38: (b7) r4 = 0
> >>> 39: (85) call array_map_update_elem#99888
> >>> [...]
> >>>
> >>> For now .data/.rodata/.bss maps are not exposed via API to the
> >>> user, but this could be done in a subsequent step.
> >>>
> >>> Based upon a recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
> >>> fail for static variables").
> >>>
> >>> Joint work with Joe Stringer.
> >>>
> >>> [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
> >>> http://vger.kernel.org/lpc-bpf2018.html#session-3
> >>>
> >>> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> >>> Signed-off-by: Joe Stringer <joe@...d.net.nz>
> >>> ---
> >>> tools/include/uapi/linux/bpf.h | 10 +-
> >>> tools/lib/bpf/libbpf.c | 259 +++++++++++++++++++++++++++------
> >>> 2 files changed, 226 insertions(+), 43 deletions(-)
> >>>
> >>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >>> index 8884072e1a46..04b26f59b413 100644
> >>> --- a/tools/include/uapi/linux/bpf.h
> >>> +++ b/tools/include/uapi/linux/bpf.h
> >>> @@ -287,7 +287,7 @@ enum bpf_attach_type {
> >>> [...]
> >>> @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >>> (long long) (rel.r_info >> 32),
> >>> (long long) sym.st_value, sym.st_name);
> >>>
> >>> - if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> >>> - pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> >>> + if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
> >>> + sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
> >>> + sym.st_shndx != bss_shndx) {
> >>> + pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
> >>> prog->section_name, sym.st_shndx);
> >>> return -LIBBPF_ERRNO__RELOC;
> >>> }
> >>> @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >>> prog->reloc_desc[i].type = RELO_LD64;
> >>> prog->reloc_desc[i].insn_idx = insn_idx;
> >>> prog->reloc_desc[i].map_idx = map_idx;
> >>> + } else if (sym.st_shndx == data_shndx ||
> >>> + sym.st_shndx == rodata_shndx ||
> >>> + sym.st_shndx == bss_shndx) {
> >>> + int type = (sym.st_shndx == data_shndx) ? RELO_DATA :
> >>> + (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
> >>> + RELO_BSS;
> >>> +
> >>> + for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
> >>> + if (maps_global[map_idx].global_type == type) {
> >>> + pr_debug("relocation: find map %zd (%s) for insn %u\n",
> >>> + map_idx, maps_global[map_idx].name, insn_idx);
> >>> + break;
> >>> + }
> >>> + }
> >>> +
> >>> + if (map_idx >= nr_maps_global) {
> >>> + pr_warning("bpf relocation: map_idx %d large than %d\n",
> >>> + (int)map_idx, (int)nr_maps_global - 1);
> >>> + return -LIBBPF_ERRNO__RELOC;
> >>> + }
> >>> +
> >>> + prog->reloc_desc[i].type = type;
> >>> + prog->reloc_desc[i].insn_idx = insn_idx;
> >>> + prog->reloc_desc[i].map_idx = map_idx;
> >>> }
> >>> }
> >>> return 0;
> >>> @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
> >>> }
> >>>
> >>> static int
> >> [...]
> >>> +
> >>> +static int
> >>> +bpf_object__create_maps(struct bpf_object *obj)
> >>> +{
> >>> unsigned int i;
> >>> int err;
> >>>
> >>> for (i = 0; i < obj->nr_maps; i++) {
> >>> struct bpf_map *map = &obj->maps[i];
> >>> - struct bpf_map_def *def = &map->def;
> >>> char *cp, errmsg[STRERR_BUFSIZE];
> >>> int *pfd = &map->fd;
> >>>
> >>> @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
> >>> map->name, map->fd);
> >>> continue;
> >>> }
> >>> -
> >>> - if (obj->caps.name)
> >>> - create_attr.name = map->name;
> >>> - create_attr.map_ifindex = map->map_ifindex;
> >>> - create_attr.map_type = def->type;
> >>> - create_attr.map_flags = def->map_flags;
> >>> - create_attr.key_size = def->key_size;
> >>> - create_attr.value_size = def->value_size;
> >>> - create_attr.max_entries = def->max_entries;
> >>> - create_attr.btf_fd = 0;
> >>> - create_attr.btf_key_type_id = 0;
> >>> - create_attr.btf_value_type_id = 0;
> >>> - if (bpf_map_type__is_map_in_map(def->type) &&
> >>> - map->inner_map_fd >= 0)
> >>> - create_attr.inner_map_fd = map->inner_map_fd;
> >>> -
> >>> - if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> >>> - create_attr.btf_fd = btf__fd(obj->btf);
> >>> - create_attr.btf_key_type_id = map->btf_key_type_id;
> >>> - create_attr.btf_value_type_id = map->btf_value_type_id;
> >>> - }
> >>> -
> >>> - *pfd = bpf_create_map_xattr(&create_attr);
> >>> - if (*pfd < 0 && create_attr.btf_key_type_id) {
> >>> - cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> >>> - pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> >>> - map->name, cp, errno);
> >>> - create_attr.btf_fd = 0;
> >>> - create_attr.btf_key_type_id = 0;
> >>> - create_attr.btf_value_type_id = 0;
> >>> - map->btf_key_type_id = 0;
> >>> - map->btf_value_type_id = 0;
> >>> - *pfd = bpf_create_map_xattr(&create_attr);
> >>> - }
> >>> -
> >>> + *pfd = bpf_object__create_map(obj, map);
> >>> if (*pfd < 0) {
> >>> size_t j;
> >>>
> >>> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
> >>> &prog->reloc_desc[i]);
> >>> if (err)
> >>> return err;
> >>> + } else if (prog->reloc_desc[i].type == RELO_DATA ||
> >>> + prog->reloc_desc[i].type == RELO_RODATA ||
> >>> + prog->reloc_desc[i].type == RELO_BSS) {
> >>> + struct bpf_insn *insns = prog->insns;
> >>> + int insn_idx, map_idx, data_off;
> >>> +
> >>> + insn_idx = prog->reloc_desc[i].insn_idx;
> >>> + map_idx = prog->reloc_desc[i].map_idx;
> >>> + data_off = insns[insn_idx].imm;
> >>
> >> I want to point out a subtle difference here between handling pure
> >> global variables and static global variables: the "imm" value is only
> >> available for static variables. For example,
> >>
> >> -bash-4.4$ cat g.c
> >> static volatile long sg = 2;
> >> static volatile int si = 3;
> >> long g = 4;
> >> int i = 5;
> >> int test() { return sg + si + g + i; }
> >> -bash-4.4$
> >> -bash-4.4$ clang -target bpf -O2 -c g.c
> >>
> >> -bash-4.4$ readelf -s g.o
> >>
> >>
> >> Symbol table '.symtab' contains 8 entries:
> >> Num: Value Size Type Bind Vis Ndx Name
> >> 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
> >> 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS g.c
> >> 2: 0000000000000010 8 OBJECT LOCAL DEFAULT 4 sg
> >> 3: 0000000000000018 4 OBJECT LOCAL DEFAULT 4 si
> >> 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
> >> 5: 0000000000000000 8 OBJECT GLOBAL DEFAULT 4 g
> >> 6: 0000000000000008 4 OBJECT GLOBAL DEFAULT 4 i
> >> 7: 0000000000000000 128 FUNC GLOBAL DEFAULT 2 test
> >> -bash-4.4$
> >> -bash-4.4$ llvm-readelf -r g.o
> >>
> >> Relocation section '.rel.text' at offset 0x1d8 contains 4 entries:
> >>     Offset             Info            Type             Symbol's Value   Symbol's Name
> >> 0000000000000000  0000000400000001  R_BPF_64_64      0000000000000000  .data
> >> 0000000000000018  0000000400000001  R_BPF_64_64      0000000000000000  .data
> >> 0000000000000038  0000000500000001  R_BPF_64_64      0000000000000000  g
> >> 0000000000000058  0000000600000001  R_BPF_64_64      0000000000000008  i
> >> -bash-4.4$ llvm-objdump -d g.o
> >>
> >> g.o: file format ELF64-BPF
> >>
> >> Disassembly of section .text:
> >> 0000000000000000 test:
> >> 0: 18 01 00 00 10 00 00 00 00 00 00 00 00 00 00 00  r1 = 16 ll
> >> 2: 79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0)
> >> 3: 18 02 00 00 18 00 00 00 00 00 00 00 00 00 00 00  r2 = 24 ll
> >> 5: 61 22 00 00 00 00 00 00 r2 = *(u32 *)(r2 + 0)
> >> 6: 0f 21 00 00 00 00 00 00 r1 += r2
> >> 7: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r2 = 0 ll
> >> 9: 79 22 00 00 00 00 00 00 r2 = *(u64 *)(r2 + 0)
> >> 10: 0f 21 00 00 00 00 00 00 r1 += r2
> >> 11: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r2 = 0 ll
> >> 13: 61 20 00 00 00 00 00 00 r0 = *(u32 *)(r2 + 0)
> >> 14: 0f 10 00 00 00 00 00 00 r0 += r1
> >> 15: 95 00 00 00 00 00 00 00 exit
> >> -bash-4.4$
> >>
> >> You can see from the above that the non-static global accesses do not
> >> have their in-section offset encoded in the insn itself. The difference
> >> is due to llvm treating static globals and non-static globals
> >> differently.
> >>
> >> To support both cases, during the relocation recording stage you can
> >> also record:
> >> . the symbol binding (GELF_ST_BIND(sym.st_info)):
> >> a non-static global has binding STB_GLOBAL and a static
> >> global has binding STB_LOCAL
> >> . the symbol value (sym.st_value)
> >>
> >> During the above relocation resolution, if the symbol binding is local,
> >> do what you already did here. If the symbol binding is global, assign
> >> the symbol value to data_off.
> >>
> >> This applies to both .data and .rodata sections.
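> >>
> >> In pseudo-code the idea is roughly (just a sketch; the extra
> >> reloc_desc fields are hypothetical):
> >>
> >>   /* recording stage */
> >>   prog->reloc_desc[i].sym_bind  = GELF_ST_BIND(sym.st_info);
> >>   prog->reloc_desc[i].sym_value = sym.st_value;
> >>
> >>   /* resolution stage */
> >>   if (prog->reloc_desc[i].sym_bind == STB_GLOBAL)
> >>           data_off = prog->reloc_desc[i].sym_value;  /* non-static global */
> >>   else
> >>           data_off = insns[insn_idx].imm;  /* static: offset is in imm */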
> >>
> >> A non-initialized
> >> global variable will not be in any allocated section in the ELF file;
> >> it is a COMMON symbol whose storage is to be allocated by the loader.
> >> So if the user defines something like
> >> int g;
> >> and later on uses it, right now it will not work. The workaround
> >> is "int g = 4" or "static int g". I guess this should be
> >> okay; we should encourage users to use "static" variables instead.
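> >>
> >> For example (a quick sketch, assuming the compiler's default
> >> -fcommon behavior):
> >>
> >>   int g;          /* COMMON symbol, no allocated section -> not handled */
> >>   int g2 = 4;     /* .data -> handled by this series */
> >>   static int g3;  /* .bss  -> handled by this series */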
> >
> > Would it be reasonable to just plain disable usage of uninitialized
> > global variables, as it kind of goes against BPF's philosophy that
> > everything should be written to before it can be read? So while we
> > could just implicitly zero out everything beforehand, it might be a
> > good idea to remind users of that and enforce it explicitly?
>
> There will be a verifier error, so the program with "int g" will not
> run, the same as today.
Yeah, I understand, but with a pretty obscure error about unsupported
relocations, right?
>
> We could improve this by flagging the error at compile time or in libbpf.
So that's my point: having the compiler emit a nicer error for
target=bpf would be a nice touch for the user experience :)
> But it is not required. I am mentioning it just for completeness.
>
> >
> >>
> >>> +
> >>> + if (insn_idx + 1 >= (int)prog->insns_cnt) {
> >>> + pr_warning("relocation out of range: '%s'\n",
> >>> + prog->section_name);
> >>> + return -LIBBPF_ERRNO__RELOC;
> >>> + }
> >>> + insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
> >>> + insns[insn_idx].imm = obj->maps_global[map_idx].fd;
> >>> + insns[insn_idx + 1].imm = data_off;
> >>> }
> >>> }
> >>>
> >>> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
> >>>
> >>> CHECK_ERR(bpf_object__elf_init(obj), err, out);
> >>> CHECK_ERR(bpf_object__check_endianness(obj), err, out);
> >>> + CHECK_ERR(bpf_object__probe_caps(obj), err, out);
> >>> CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
> >>> CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
> >>> CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
> >>> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
> >>>
> >>> for (i = 0; i < obj->nr_maps; i++)
> >>> zclose(obj->maps[i].fd);
> >>> -
> >>> + for (i = 0; i < obj->nr_maps_global; i++)
> >>> + zclose(obj->maps_global[i].fd);
> >>> for (i = 0; i < obj->nr_programs; i++)
> >>> bpf_program__unload(&obj->programs[i]);
> >>>
> >>> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
> >>>
> >>> obj->loaded = true;
> >>>
> >>> - CHECK_ERR(bpf_object__probe_caps(obj), err, out);
> >>> CHECK_ERR(bpf_object__create_maps(obj), err, out);
> >>> CHECK_ERR(bpf_object__relocate(obj), err, out);
> >>> CHECK_ERR(bpf_object__load_progs(obj), err, out);
> >>>