[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <f6760fc2-9cf9-eadb-eba9-c7af953a703f@iogearbox.net>
Date: Tue, 5 Mar 2019 10:31:24 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Yonghong Song <yhs@...com>,
Andrii Nakryiko <andrii.nakryiko@...il.com>,
Alexei Starovoitov <ast@...com>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"joe@...d.net.nz" <joe@...d.net.nz>,
"john.fastabend@...il.com" <john.fastabend@...il.com>,
"tgraf@...g.ch" <tgraf@...g.ch>, Andrii Nakryiko <andriin@...com>,
"jakub.kicinski@...ronome.com" <jakub.kicinski@...ronome.com>,
"lmb@...udflare.com" <lmb@...udflare.com>
Subject: Re: static bpf vars. Was: [PATCH bpf-next v2 5/7] bpf, libbpf:
support global data/bss/rodata sections
On 03/05/2019 03:28 AM, Alexei Starovoitov wrote:
> On Fri, Mar 01, 2019 at 09:06:35PM +0100, Daniel Borkmann wrote:
>
> Overall I think the patches and direction is great.
> Thanks a lot for working on it.
> More thoughts below:
Okay, thanks!
>> By the way, from LLVM side, do you think it makes sense for local vars
>> where you encode the offset into insn->imm to already encode it into
>> (insn+1)->imm of the ldimm64, so that loaders can just pass this offset
>> through instead of fixing it up like I did? I'm fine either way though,
>> just thought might be worth pointing out while we're at it. :)
>
> 1.
> typical ldimm64 carries 64-bit value. upper 32-bits are in insn[1]->imm.
> I think it's better to stick to this meaning of imm for encoding
> the section offset and keep it in insn[0]->imm.
> Especially since from disassembler pov it looks like normal ldimm64 with relocation.
Hadn't thought about it from this angle, but yes, from debugging /disasm
point of view it makes perfect sense to keep it as-is.
> Later libbpf does PSEUDO trick and can move [0]->imm to [1]->imm.
> Or it can use [1]->imm to store FD to pass to kernel?
My preference would rather be the former as it's consistent with how
we do it for normal map fds as well, so I'd keep loader side having it
moved as it currently is; I think it's fine.
> Another alternative is to use 16-bit of 'off' to encode the offset,
> but the limit of 32kbyte of static variables is probably too small.
Yes agree, and I've also reused both off fields now for optionally
allowing to select an index into the array based on Andrii's feedback.
> 2.
> I think it's better to only allow 'static int foo' in libbpf for now.
> In C 'global' variables suppose to be shareable across multiple elf files.
> libbpf doesn't have support for it yet. We need to start discussing this multi elf
> support which will lead into discussion about bpf libraries.
> I think we need a bit of freedom to decide the meaning of 'global' modifier.
> So for now I would support only 'static int foo' in libbpf.
> llvm supports both and that's ok, since that's a standard elf markings.
Makes sense, we first need to have a better understanding on how semantics
should look there from BPF pov. I'll restrict libbpf for static vars for
the time being, and throw an error message otherwise.
There is one more semantical question: a BPF ELF file can have multiple
program entry points in it. Both programs from there could refer to static
global variables inside that object file. My thinking which this patch set
also implemented is to have both of them refer to the same .bss/.data/.rodata
map (and _not_ separate ones) as I think this would also naturally be what
the user is expecting when referring to the same global vars in the given
object file.
> Also were these patches tested with 'static struct {int a; int b;} foo;' and
> 'static char foo[]="my string";' ?
> I suspect it should 'just work' with proposed design, but would be good to have a test.
+1, I'll add more tests. Iirc, I tested it back with the original prototype
from LPC which worked same way from high level point of view, but good to
have it explicitly in BPF selftests, agree, will add.
> 3.
> let's come up with concrete plan for BTF for these static variables.
> With normal maps the BTF was retrofitted via BPF_ANNOTATE_KV_PAIR macro
> and no one is proud of this interface choice. We had 'struct bpf_map_def'
> and the macro was the least evil.
> For static variables let's figure out how to integrate BTF into it cleanly.
> LLVM already knows about the types. It needs another kind to describe
> the variable names and offsets. I think what we did with BTF_KIND_FUNC
> should be applicable here. Last time we discussed that we will add BTF_KIND_VAR
> for this purpose. Shouldn't be too much work to hack it into llvm now?
> I would even go a step further and make BTF to be mandatory for static vars.
> Introspection of BPF programs will go long way.
We can make that mandatory, nice thing would be that such BPF_ANNOTATE_KV_PAIR
macro hack wouldn't be needed at all. From BTF pov, that would be some sort of
BTF_KIND_SEC as a section 'container' for all BTF_KIND_VAR variables belonging
to it, so map could use its id to describe its map value as such?
> 4.
> if we make BTF mandatory then the issue of map names will be solved too.
> The <obj>.bss/data/rodata might not fit into map_name[16],
> but with BTF we can give it full name.
> "<obj>.data" looks great to me especially from introspection angle.
+1
> 5.
> I think it makes sense to keep .bss vs .data vs .rodata separate.
> I can imagine a program using large zero-inited static array.
> It's important to save space in elf file, save time to do map create
> and map populate. With combined .data and .bss it would be much harder.
I'm fine either way, but yeah, this approach as also taken here means less
fixing up from loader pov, and it would probably also make BTF handling
easier. So, sticking with separate .bss/.data/.rodata then.
> 6.
> as the next step would be good to add two new syscall command to read/write
> individual static variables. Something like:
> bpf_map_read/write_bytes(map, key, offset_in_value, buf, size_of_buf)
> would allow user space to tweak certain variables.
> With mandatory BTF we can have an api to read/write variable by name
> (and not only by offset).
> Such bpf_map_read_bytes() would be useful for regular hash maps too.
> It's convenient to be able to update few bytes in the hash value instead
> of overwriting the whole thing.
Definitely useful, there was also the suggestion to have an mmap interface
which would work at least for array maps, though haven't looked yet into
what would be needed from buffer mgmt side to remap for processes that just
got the fd by id or via bpf fs for retrieving existing content.
> 7.
> since these 3 maps are created as normal arrays from kernel pov
> the 'bpftool map show' should not exclude them from the output.
> Similarly libbpf api bpf_object__for_each_map() and bpf_map__next()
> should probably include them too. There can be bpf_map__is_pseudo() flag
> to tell them apart, but since it's a normal map in the kernel
> other libbpf accessors bpf_map__name(), bpf_map__pin() should work too.
Yeah, I'll integrate it there in libbpf, as long as we have something to
tell them apart from a user pov it should be good.
Thanks,
Daniel
Powered by blists - more mailing lists