Message-ID: <ca4cb188-744d-5274-b12a-59fa3efc68f4@solarflare.com>
Date: Thu, 8 Nov 2018 22:56:55 +0000
From: Edward Cree <ecree@...arflare.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
CC: Martin KaFai Lau <kafai@...com>, Yonghong Song <yhs@...com>,
"Alexei Starovoitov" <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
"Network Development" <netdev@...r.kernel.org>,
Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and
BTF_KIND_FUNC_PROTO
On 08/11/18 19:42, Alexei Starovoitov wrote:
> same link let's continue at 1pm PST.
So, one thing we didn't really get onto was maps, and you mentioned that it
wasn't really clear what I was proposing there.
What I have in mind comes in two parts:
1) map type. A new BTF_KIND_MAP with metadata 'key_type', 'value_type'
(both are type_ids referencing other BTF type records), describing the
type "map from key_type to value_type".
2) record in the 'instances' table. This would have a name_off (the
name of the map), a type_id (pointing at a BTF_KIND_MAP in the 'types'
table), and potentially also some indication of what symbol (from
section 'maps') refers to this map. This is pretty much the exact
same metadata that a function in the 'instances' table has, the only
differences being
(a) function's type_id points at a BTF_KIND_FUNC record
(b) function's symbol indication refers to a symbol in the .text section
(c) in future functions may be nested inside other functions, whereas
AIUI a map can't live inside a function. (But a variable, which is
the other thing that would want to go in an 'instances' table, can.)
So the 'instances' table record structure looks like
struct btf_instance {
	__u32 type_id; /* Type of object declared. An index into type section */
	__u32 name_off; /* Name of object. An offset into string section */
	__u32 parent; /* Containing object if any (else 0). An index into instance section */
};
and we extend the BTF header:
struct btf_header {
	__u16 magic;
	__u8 version;
	__u8 flags;
	__u32 hdr_len;
	/* All offsets are in bytes relative to the end of this header */
	__u32 type_off; /* offset of type section */
	__u32 type_len; /* length of type section */
	__u32 str_off; /* offset of string section */
	__u32 str_len; /* length of string section */
	__u32 inst_off; /* offset of instance section */
	__u32 inst_len; /* length of instance section */
};
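To make the layout concrete, here is a minimal sketch of how a consumer
might fetch one record from the instance section of such a blob. It is
only an illustration under assumptions: the function name
btf_get_instance is made up, the structs are userspace stand-ins using
<stdint.h> types, and instance ids are assumed to start at 1 (because
parent == 0 is reserved for "no containing object").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the proposed structs (__u32 -> uint32_t etc). */
struct btf_instance {
    uint32_t type_id;   /* index into type section */
    uint32_t name_off;  /* offset into string section */
    uint32_t parent;    /* containing instance, or 0 for none */
};

struct btf_header {
    uint16_t magic;
    uint8_t  version;
    uint8_t  flags;
    uint32_t hdr_len;
    uint32_t type_off, type_len;
    uint32_t str_off,  str_len;
    uint32_t inst_off, inst_len;
};

/*
 * Hypothetical helper: copy instance number 'id' out of a raw BTF blob.
 * Assumption: ids are 1-based, so that 0 can mean "no instance".
 * Returns 0 on success, -1 if the blob is malformed or id is out of range.
 */
static int btf_get_instance(const void *blob, size_t blob_len,
                            uint32_t id, struct btf_instance *out)
{
    struct btf_header hdr;
    size_t body_len, count;

    assert(blob != NULL && out != NULL);
    if (blob_len < sizeof(hdr))
        return -1;
    memcpy(&hdr, blob, sizeof(hdr));

    /* A header without the inst_off/inst_len fields has no instances. */
    if (hdr.hdr_len < sizeof(hdr) || hdr.hdr_len > blob_len)
        return -1;

    /* Section offsets are relative to the end of the header. */
    body_len = blob_len - hdr.hdr_len;
    if (hdr.inst_off > body_len || hdr.inst_len > body_len - hdr.inst_off)
        return -1;
    if (hdr.inst_len % sizeof(*out) != 0)
        return -1;

    count = hdr.inst_len / sizeof(*out);
    if (id == 0 || id > count)
        return -1;

    memcpy(out,
           (const uint8_t *)blob + hdr.hdr_len + hdr.inst_off +
               (size_t)(id - 1) * sizeof(*out),
           sizeof(*out));
    return 0;
}
```

The fixed-size records mean the lookup is a bounds check plus an
index, with no need to walk the section the way the variable-length
type section has to be walked.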
Then in the .BTF.ext section, we have both
struct bpf_func_info {
	__u32 prog_symbol; /* Index of symbol giving address of subprog */
	__u32 inst_id; /* Index into instance section */
};
struct bpf_map_info {
	__u32 map_symbol; /* Index of symbol creating this map */
	__u32 inst_id; /* Index into instance section */
};
(either living in different subsections, or in a single table with
the addition of a kind field, or in a single table relying on the
ultimately referenced type to distinguish funcs from maps).
Note that the name (in btf_instance) of a map or function need not
match the name of the corresponding symbol; we use the .BTF.ext
section to tie together btf_instance IDs and symbol IDs. Then in
the case of functions (subprogs), the prog_symbol can be looked
up in the ELF symbol table to find the address (== insn_offset)
of the subprog, as well as the section containing it (since that
might not be .text). Similarly in the case of maps the BTF info
about the map is connected with the info in the maps section.
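As a rough sketch of that tying-together, the following shows how a
loader could go from a symbol index in the maps section to the BTF name
of the map, without ever comparing symbol names. The helper names
(inst_id_for_map_symbol, instance_name) are invented for illustration,
the tables are assumed to be already parsed into arrays, and instance
ids are again assumed to be 1-based with 0 meaning "none".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the proposed records. */
struct btf_instance {
    uint32_t type_id;
    uint32_t name_off;
    uint32_t parent;
};

struct bpf_map_info {
    uint32_t map_symbol; /* index into the ELF symbol table */
    uint32_t inst_id;    /* index into the instance section */
};

/*
 * Hypothetical helper: find the instance id recorded in .BTF.ext for a
 * given map symbol. Returns 0 (assumed never a valid id) if not found.
 */
static uint32_t inst_id_for_map_symbol(const struct bpf_map_info *tbl,
                                       size_t n, uint32_t symbol)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].map_symbol == symbol)
            return tbl[i].inst_id;
    return 0;
}

/*
 * Hypothetical helper: return the BTF name of an instance, or NULL if
 * the id or name_off is out of range or the name is not NUL-terminated
 * inside the string section.
 */
static const char *instance_name(const struct btf_instance *insts,
                                 size_t n_insts, const char *strs,
                                 size_t str_len, uint32_t inst_id)
{
    uint32_t off;

    if (inst_id == 0 || inst_id > n_insts)
        return NULL;
    off = insts[inst_id - 1].name_off;
    if (off >= str_len)
        return NULL;
    if (memchr(strs + off, '\0', str_len - off) == NULL)
        return NULL;
    return strs + off;
}
```

The same two steps would work for bpf_func_info with prog_symbol in
place of map_symbol, since the two records have the same shape.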
Now when the loader has munged this, what it passes to the kernel
might not have map_symbol, but instead map_fd. Instead of
prog_symbol it will have whatever identifies the subprog in the
blob of stuff it feeds to the kernel (so probably insn_offset).
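For the function case, that loader step might look something like the
sketch below. Everything here is an assumption for illustration: the
kernel-facing record (struct kern_func_info) and the helper name are
made up, the symbol values are assumed to be byte offsets already
extracted from the ELF symbol table, and the conversion relies on BPF
instructions being 8 bytes each. It also ignores the question of
which section the symbol lives in, which a real loader would have to
account for when it lays out the final program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BPF_INSN_SIZE 8u /* bytes per BPF instruction */

/* Record as found in .BTF.ext (userspace stand-in). */
struct bpf_func_info {
    uint32_t prog_symbol; /* index into the ELF symbol table */
    uint32_t inst_id;     /* index into the instance section */
};

/* Hypothetical record as handed to the kernel after loader processing. */
struct kern_func_info {
    uint32_t insn_offset; /* subprog start, counted in instructions */
    uint32_t inst_id;     /* unchanged index into the instance section */
};

/*
 * Hypothetical helper: replace the symbol reference with an instruction
 * offset. sym_value[i] is assumed to hold the byte offset (st_value) of
 * symbol i within its section. Returns 0 on success, -1 on bad input.
 */
static int func_info_to_kern(const struct bpf_func_info *in,
                             const uint64_t *sym_value, size_t nsyms,
                             struct kern_func_info *out)
{
    uint64_t off;

    assert(in != NULL && sym_value != NULL && out != NULL);
    if (in->prog_symbol >= nsyms)
        return -1;

    off = sym_value[in->prog_symbol];
    if (off % BPF_INSN_SIZE != 0)   /* must sit on an insn boundary */
        return -1;
    off /= BPF_INSN_SIZE;
    if (off > UINT32_MAX)
        return -1;

    out->insn_offset = (uint32_t)off;
    out->inst_id = in->inst_id;     /* BTF side is passed through as-is */
    return 0;
}
```

The map case would be analogous, with the loader swapping map_symbol
for the fd it got back from creating the map.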
All this would of course require a bit more compiler support than
the current BPF_ANNOTATE_KV_PAIR, since that just causes the
existing BTF machinery to declare a specially constructed struct
type. At the C level you could still have BPF_ANNOTATE_KV_PAIR
and the '____bpf_map_foo' name, but then the compiler would
recognise that and convert it into an instance record by looking
up the name 'foo' in its "maps" section. That way the special
____bpf_map_* handling (which ties map names to symbol names,
also) would be entirely compiler-internal and not 'leak out' into
the definition of the format. Frontends for other languages
which do possess a native map type (e.g. Python dict) might have
other ways of indicating the key/value type of a map at source
level (e.g. PEP 484) and could directly generate the appropriate
BTF_KIND_MAP and bpf_map_info records rather than (as they would
with the current design) having to encode the information as a
struct ____bpf_map_foo type-definition.
While I realise the desire to concentrate on one topic at once, I
think this question of maps should be discussed in tomorrow's
call, since it is when we start having other kinds of instances
besides functions that the advantages of my design become
apparent, unifying the process of 'declaration' of functions,
maps, and (eventually) variables while separating them all from
the process of 'definition' of the types of all three.
Thank you for your continued patience with me.
-Ed