[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181109043507.dyemsy6va4wto67l@ast-mbp>
Date: Thu, 8 Nov 2018 20:35:10 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Edward Cree <ecree@...arflare.com>
Cc: Martin KaFai Lau <kafai@...com>, Yonghong Song <yhs@...com>,
Alexei Starovoitov <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Network Development <netdev@...r.kernel.org>,
Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and
BTF_KIND_FUNC_PROTO
On Thu, Nov 08, 2018 at 10:56:55PM +0000, Edward Cree wrote:
> On 08/11/18 19:42, Alexei Starovoitov wrote:
> > same link let's continue at 1pm PST.
> So, one thing we didn't really get onto was maps, and you mentioned that it
> wasn't really clear what I was proposing there.
let's discuss ground rules first.
1. the patches should be upstreamable in both kernel _and_ llvm
For example you propose 'prog_symbol' down below.
I'm sure you can do it in your assembler. We can also do it in pahole
(because it operates on dwarf inside elf), but there is no way for us
to do it in llvm. Because symbol table gen comes last. dwarf, btf sections have
been completed before that. llvm 'fixup' logic also done before symbol table.
So to add something like 'prog_symbol' into .BTF.ext would mean that would need
an extra pass after the last one that messes with the stuff already finalized.
That will get nacked.
Another example: in our first llvm->btf patchset we were using llvm's dwarf to
generate btf. That got nacked and we had to do 2k+ lines complete rewrite
to generate btf from llvm ir. That forced us to generate btf slightly
differently than from dwarf. The difference is not fundamental (like bitfields).
But it drives the point that elf format is _secondary_ to #1 rule above.
Another example: you're proposing to teach bpf backend to recognize
____bpf_map* name. That is not something we can upstream in llvm either.
Similarly on the kernel api side we decided to craft an api in a way
that what is passed to the kernel is returned back in the same way.
Sort of like if kernel understands 'struct bpf_func_info' from patch 5
it should speak back in the same language.
I think it's important to agree on that principle to continue discussion.
Another key point is obvious, but worth repeating.
kernel abi is cast in stone. elf format is not.
which means that what was defined in include/uapi/linux/btf.h is fixed.
That is the format of .BTF section that llvm/pahole emits.
Whereas .BTF.ext section is _not_ defined in kernel uapi.
It defines the elf format and we can change it the future.
The format of .BTF.ext is the protocol between libbpf and llvm.
We define .BTF.ext in libbpf with libbpf license and coding style
and independently define it in llvm with its license and its coding
style.
We need to discuss .BTF.ext and make sure it's extensible, so we can
upgrade libbpf and llvm independently of each other,
but it doesn't have the same requirements as kernel abi.
Like in the kernel abi the extensibility of any structure means
that uknown fields should be zero.
Ex: sys_bpf() accepts 'bpf_attr' and size.
If user's size is larger than kernel size. All uknown fields
must be zero.
In case of libbpf <-> llvm that is not the case.
libbpf _has_ to deal with non-zero unknown fields in some way.
Like it can print warning, but it has to accept the extended format.
Otherwise upgrading llvm will not be possible without upgrading libbpf.
Similarly when both llvm is new and libbpf is new, but kernel is old,
libbpf has to pass to the kernel only the fields it knows about.
(though it may understand all of them).
So libbpf has to deal will all combinations: old/new kernel,
old/new llvm (and corresponding .BTF and .BTF.ext sections).
Now my understanding that you're mainly concerned with elf file format, correct?
I'm making this guess, because you're arguing strongly about KIND_MAP
and blasting our __btf_map* hack.
I'm all for improving this part, but it cannot go into .BTF,
because we already have an api to pass btf_key_type_id, btf_value_type_id
in MAP_CREATE syscall.
Adding new BTF_KIND_MAP into .BTF would be pointless, since we cannot
change MAP_CREATE.
As discussed on the call, currently we're designing KIND_VARIABLE
that will describe global/static variables in .BTF and corresponding
prog_load api changes for the kernel. We'll be talking about it
at bpf uconf. Imo current 'struct bpf_map_def map_foo;' hack
that libbpf/iproute2 and other loaders are dealing with
will be cleaned up by this KIND_VARIABLE BTF support.
Because of rule #1 we cannot pattern match 'bpf_map_def' name
in llvm. All global variables have to be dealt in the common way
by the llvm. We can add new __builtin_ specifically for map creation
and ask folks to start using new interface in .c programs for
map creation. All that is exciting, but I'd like to table that
discussion and focus on this patch set which is about adding
bpf_func_infos, BTF vs BTF.ext split, instances vs types.
> What I have in mind comes in two parts:
> 1) map type. A new BTF_KIND_MAP with metadata 'key_type', 'value_type'
> (both are type_ids referencing other BTF type records), describing the
> type "map from key_type to value_type".
> 2) record in the 'instances' table. This would have a name_off (the
> name of the map), a type_id (pointing at a BTF_KIND_MAP in the 'types'
> table), and potentially also some indication of what symbol (from
> section 'maps') refers to this map. This is pretty much the exact
> same metadata that a function in the 'instances' table has, the only
> differences being
> (a) function's type_id points at a BTF_KIND_FUNC record
> (b) function's symbol indication refers from .text section
> (c) in future functions may be nested inside other functions, whereas
> AIUI a map can't live inside a function. (But a variable, which is
> the other thing that would want to go in an 'instances' table, can.)
> So the 'instances' table record structure looks like
>
> struct btf_instance {
> __u32 type_id; /* Type of object declared. An index into type section */
> __u32 name_off; /* Name of object. An offset into string section */
> __u32 parent; /* Containing object if any (else 0). An index into instance section */
> };
I have two issues with above structure:
1. it's not naturally extensible.
Meaning it's possible to define new struct by bumping version in btf
header, but it's not clean.
2. parent field has to be checked for loops
Both of these issues are resolved by existing BTF and kernel code.
BTF layout is extensible. BTF kernel verifier already checks for loops.
Hence I propose to use existing 'struct btf_type' for these new
instances section instead of inventing new format.
> and we extend the BTF header:
>
> struct btf_header {
> __u16 magic;
> __u8 version;
> __u8 flags;
> __u32 hdr_len;
>
> /* All offsets are in bytes relative to the end of this header */
> __u32 type_off; /* offset of type section */
> __u32 type_len; /* length of type section */
> __u32 str_off; /* offset of string section */
> __u32 str_len; /* length of string section */
> __u32 inst_off; /* offset of instance section */
> __u32 inst_len; /* length of instance section */
The addition of above two fields is certainly doable
and fits design of BTF well. Kernel side already has code to
deal with multiple BTF sections.
We don't even need to bump version for that.
That's great, but the key point here is that btf.h is both
kernel abi and file format.
If llvm emits this instance section into elf file
libbpf loader should better pass it to the kernel as well.
Otherwise having populated instance section in elf and
empty instances for the kernel will make kernel patch to uapi/btf.h
non-upstreamble. Why ? Because we do not add fields to uapi structs
that kernel will not be using.
I suspect you're actually proposing to pass populated instance section
to the kernel as part of BTF_LOAD sys_bpf command. So all good.
Just making sure we're on the same page. Correct?
So the tweak to your 'struct btf_instance' proposal
is to use 'struct btf_type' of variable length instead
in this instances section.
And define new BTF_KIND_FUNC that can only be used
in the instances.
In the future we will add BTF_KIND_VARIABLE in there too.
As far as type section there we will use BTF_KIND_FUNC_PROTO only.
It will always have empty name and can have full names
for function arguments.
BTF_KIND_FUNC in instances section must have non-zero name_off
(which will describe function name) and type_id pointing
to BTF_KIND_FUNC_PROTO in type section.
Multiple KIND_FUNC potentially can point to the same KIND_FUNC_PROTO
if all argument names are the same or all of them are empty.
(and types match, of course).
I'm not counting on overall BTF size compression because of that,
but it's kinda nice to have this option.
> Then in the .BTF.ext section, we have both
>
> struct bpf_func_info {
> __u32 prog_symbol; /* Index of symbol giving address of subprog */
> __u32 inst_id; /* Index into instance section */
> }
Right, for .BPF.ext section we can keep existing
'struct bpf_func_info' with the minor tweak:
struct bpf_func_info {
__u32 insn_offset;
__u32 inst_id; // this is now index in the instance section
};
Since instance section will have KIND_FUNC and KIND_VARIABLE in the future
the kernel has to check that 'bpf_func_info->inst_id' in .BTF.ext
points to KIND_FUNC in .BTF.
How does this sound?
If we agree to that it will unblock us for this patch set
and for follow on patch set that adds 'struct bpf_line_info'
based on the same principles.
>
> struct bpf_map_info {
> {
> __u32 map_symbol; /* Index of symbol creating this map */
> __u32 inst_id; /* Index into instance section */
> }
The map discussion I'd like to table for now due to reasons
outlined above.
> think this question of maps should be discussed in tomorrow's
> call, since it is when we start having other kinds of instances
turned out most of us have a conflict, so the earliest is 1:30pm on Friday.
still works for you?
Powered by blists - more mailing lists