netdev - Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

Message-ID: <20181106062913.h7zxoduijayfqvbr@ast-mbp>
Date:   Mon, 5 Nov 2018 22:29:15 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Edward Cree <ecree@...arflare.com>
Cc:     Martin Lau <kafai@...com>, Yonghong Song <yhs@...com>,
        Alexei Starovoitov <ast@...com>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and
 BTF_KIND_FUNC_PROTO

On Thu, Nov 01, 2018 at 09:08:37PM +0000, Edward Cree wrote:
> I've spent a bit more time thinking about / sleeping on this, and I
>  still think there's a major disagreement here.  Basically it seems
>  like I'm saying "the design of BTF is wrong" and you're saying "but
>  it's the design" (with the possible implication — I'm not entirely
>  sure — of "but that's what DWARF does").
> So let's back away from the details about FUNC/PROTO, and talk in
>  more general terms about what a BTF record means.
> There are two classes of things we might want to put in debug-info:
>  * There exists a type T
>  * I have an instance X (variable, subprogram, etc.) of type T
> Both of these may need to reference other types, and have the same
>  space of possible things T could be, but there the similarity ends:
>  they are semantically different things.
> Indeed, the only reason for any record of the first class is to
>  define types referenced by records of the second class.  Some
>  concrete examples of records of the second class are:
> 1) I have a map named "foo" with key-type T1 and value-type T2
> 2) I have a subprogram named "bar" with prototype T3
> 3) I am using stack slot fp-8 to store a value of type T4
> 4) I am casting ctx+8 to a pointer type T5 before dereferencing it
> Currently we have (1) and this patch series adds (2), both done
>  through records that look like they are just defining a type (i.e.
>  the first class of record) but have 'magic' semantics (in the case
>  of (1), special names of the form ____btf_map_foo.  How anyone
>  thought that was a clean and tasteful design is beyond me.)
> What IMHO the design *should* be, is that we have a 'types'
>  subsection that *only* contains records of the first class, and
>  then other subsections to hold records of the second class that
>  reference records of the first class by ID.

Such a pure-type approach wouldn't be practical.
BTF is not pure type information. BTF is everything the verifier needs
to know to make safety decisions that the bpf instruction set itself
doesn't carry.
For example, two anonymous structs:
struct {
  int a;
  int b;
} var1;
struct {
  int c;
  int d;
} var2;
look the same from the C point of view (same layout), but from the BTF
point of view they are different. Names of the fields are an essential
part of BTF, because the purpose of BTF is to provide information about
bpf objects for debugging and safety reasons.
Similarly
int (*) (void *src, void *dst, int len);
and
int (*) (void *dst, void *src, int len);
are the same from the C and compiler point of view,
but they are different in BTF, because names carry information
that needs to be preserved.
The same goes for function declarations. The function name and argument
names are part of the 'type description'.
We shouldn't use the word 'type' in its pure sense, otherwise
it will cause confusion, as this thread has demonstrated.
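
To make the distinction concrete, here is a minimal sketch of the two
ways of comparing prototypes. The structs and function names below are
hypothetical and much simpler than the real BTF encoding; types are
spelled as strings only to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified records; NOT the kernel's struct btf_type. */
struct param_desc {
    const char *name;   /* argument name, e.g. "src" */
    const char *type;   /* type spelled as a string for brevity */
};

struct proto_desc {
    const char *ret_type;
    int nr_params;
    struct param_desc params[4];
};

/* C-style comparison: only the return type and parameter types matter. */
bool proto_same_c_type(const struct proto_desc *a, const struct proto_desc *b)
{
    assert(a && b);
    if (strcmp(a->ret_type, b->ret_type) != 0 || a->nr_params != b->nr_params)
        return false;
    for (int i = 0; i < a->nr_params; i++)
        if (strcmp(a->params[i].type, b->params[i].type) != 0)
            return false;
    return true;
}

/* Name-aware comparison: parameter names must match as well. */
bool proto_same_named(const struct proto_desc *a, const struct proto_desc *b)
{
    if (!proto_same_c_type(a, b))
        return false;
    for (int i = 0; i < a->nr_params; i++)
        if (strcmp(a->params[i].name, b->params[i].name) != 0)
            return false;
    return true;
}

/* int (*)(void *src, void *dst, int len) */
const struct proto_desc copy_src_first = {
    "int", 3, { { "src", "void *" }, { "dst", "void *" }, { "len", "int" } }
};

/* int (*)(void *dst, void *src, int len) */
const struct proto_desc copy_dst_first = {
    "int", 3, { { "dst", "void *" }, { "src", "void *" }, { "len", "int" } }
};
```

With these definitions, proto_same_c_type() treats the two prototypes
as equal while proto_same_named() does not, which is the difference
described above.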
Beyond prog names expressed in BTF, we're adding support for global
variables. They will be expressed in BTF as a new KIND.
Think of all global variables in a single .c file as fields of
an anonymous structure. They have offsets from the beginning of .bss,
sizes, further references into btf_type_ids and, most importantly, names.
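
A rough sketch of that idea, assuming a hypothetical per-variable
record (the actual KIND and its layout are not defined here; the
names, type ids and offsets below are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one global variable; illustration only. */
struct var_desc {
    const char *name;       /* variable name */
    unsigned int type_id;   /* reference to another type record */
    unsigned int offset;    /* bytes from the beginning of .bss */
    unsigned int size;      /* size in bytes */
};

/*
 * Made-up example for a .c file containing:
 *   int counter; long total; char flag;
 * laid out like the fields of one anonymous struct.
 */
const struct var_desc example_vars[] = {
    { "counter", 1,  0, 4 },
    { "total",   2,  8, 8 },
    { "flag",    3, 16, 1 },
};

/* Look a variable up by name; returns NULL if it is not described. */
const struct var_desc *find_var(const struct var_desc *vars, size_t n,
                                const char *name)
{
    assert(vars != NULL && name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(vars[i].name, name) == 0)
            return &vars[i];
    return NULL;
}
```

A tool could then resolve a name such as "total" to an offset, a size
and a type reference the same way it would resolve a struct member.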
Another thing we're working on is spin_lock support.
There we also have to rely on BTF to make sure that the
bytes of a map's value or cgroup local storage that belong to the
spin_lock will be masked for lookup/update calls.
typedef u32 bpf_spin_lock;
will be recognized by the verifier by its name.
Maybe we will introduce a new KIND_ for spin_lock too and convert the
name into a KIND in the btf writer. That is TBD. The main point is that
names of types, variables, and functions have to be expressed in BTF
as one coherent graph of information.
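
As an illustration of name-based recognition, here is a minimal sketch
that finds a member whose type is named bpf_spin_lock and copies a
value around it. The member records and helpers are hypothetical, and
"masking" is assumed here to mean leaving the lock bytes in the
destination untouched; the real verifier and map code may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one member of a map value. */
struct member_desc {
    const char *name;
    const char *type_name;
    size_t offset;
    size_t size;
};

/* Made-up value layout: { int a; bpf_spin_lock lock; int b; } */
const struct member_desc example_value[] = {
    { "a",    "int",           0, 4 },
    { "lock", "bpf_spin_lock", 4, 4 },
    { "b",    "int",           8, 4 },
};

/* Index of the first member whose type is named "bpf_spin_lock", or -1. */
int find_spin_lock(const struct member_desc *m, int n)
{
    assert(m != NULL);
    for (int i = 0; i < n; i++)
        if (strcmp(m[i].type_name, "bpf_spin_lock") == 0)
            return i;
    return -1;
}

/* Copy a value but leave the lock bytes in dst as they were. */
void copy_value_masked(void *dst, const void *src, size_t value_size,
                       const struct member_desc *lock)
{
    size_t lock_end = lock->offset + lock->size;

    assert(lock_end <= value_size);
    /* Bytes before the lock. */
    memcpy(dst, src, lock->offset);
    /* Bytes after the lock. */
    memcpy((char *)dst + lock_end, (const char *)src + lock_end,
           value_size - lock_end);
}
```

The point of the sketch is only that the decision about which bytes to
skip is driven by a name found in the type description.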
Splitting pure types into one section, variables into another, and
functions into yet another is not practical, since the same
modifiers (like const or volatile) need to be applied to
variables and functions. In the end all sections would have
the same style of encoding, so there is no need to duplicate
the encoding three times; it's cleaner to encode
all of them BTF-style via different KINDs.
vmlinux's BTF will include all global variables and functions,
so that tracing scripts can refer to a particular variable
or function argument by name, making kernel debugging easier
and less error-prone.
Consider that right now a typical invocation of bcc's trace.py looks like:
trace.py -I linux/skbuff.h -I net/sock.h \
  'skb_recv_datagram(struct sock *sk, unsigned int flags, int noblock, int *err) \
    "sk_recv_q:%llx next:%llx prev:%llx" \
    &sk->sk_receive_queue,sk->sk_receive_queue.next,sk->sk_receive_queue.prev'
where the func declaration is copy-pasted from the C source
and there are no guarantees that the argument names and their order
match what vmlinux actually has.
With fully named BTF of vmlinux the tracing scripts will become more robust.
Note that BTF_KIND_FUNC is pretty much the same as BTF_KIND_STRUCT with
the addition of a return type.
Kernel-specific things like the per-cpu attribute of a variable will
be a BTF KIND too. This is information that tooling and the kernel need,
and the current BTF graph design is perfectly suited to carry such info.