Message-ID: <05585430-0bc4-51ca-412a-48a8539082b1@solarflare.com>
Date:   Thu, 17 Jan 2019 15:13:59 +0000
From:   Edward Cree <ecree@...arflare.com>
To:     Yonghong Song <yhs@...com>, <ast@...com>, <daniel@...earbox.net>,
        <netdev@...r.kernel.org>
CC:     <kernel-team@...com>
Subject: Re: [PATCH bpf-next] bpf: btf: add btf documentation

On 16/01/19 00:58, Yonghong Song wrote:
> This patch added documentation for BTF (BPF Debug Format).
> The document is placed under linux:Documentation/bpf directory.
>
> Signed-off-by: Yonghong Song <yhs@...com>
I like this a lot overall; it does a good job of explaining how the
 various pieces fit together.
See inline for review comments.

> ---
>  Documentation/bpf/btf.rst   | 787 ++++++++++++++++++++++++++++++++++++
>  Documentation/bpf/index.rst |   7 +
>  2 files changed, 794 insertions(+)
>  create mode 100644 Documentation/bpf/btf.rst
>
> diff --git a/Documentation/bpf/btf.rst b/Documentation/bpf/btf.rst
> new file mode 100644
> index 000000000000..3dfa8edd22ac
> --- /dev/null
> +++ b/Documentation/bpf/btf.rst
> @@ -0,0 +1,787 @@
> +=====================
> +BPF Type Format (BTF)
> +=====================
> +
> +1. Introduction
> +***************
> +
> +BTF (BPF Type Format) is the meta data format which
> +encodes the debug info related to BPF program/map.
> +The name BTF was used initially to describe
> +data types. The BTF was later extended to include
> +function info for defined subroutines, and line info
> +for source/line information.
> +
> +The debug info is used for map pretty print, function
> +signature, etc. The function signature enables better
> +bpf program/function kernel symbol.
> +The line info helps generate
> +source annotated translated byte code, jited code
> +and verifier log.
> +
> +The BTF specification contains two parts,
> +  * BTF kernel API
> +  * BTF ELF file format
> +
> +The kernel API is the contract between
> +user space and kernel. The kernel verifies
> +the BTF info before using it.
> +The ELF file format is a user space contract
> +between ELF file and libbpf loader.
> +
> +The type and string sections are part of the
> +BTF kernel API, describing the debug info
> +(mostly types related) referenced by the bpf program.
> +These two sections are discussed in
> +details in Section 2.
> +
> +2. BTF Type/String Encoding
> +***************************
> +
> +The file ``include/uapi/linux/btf.h`` provides high
> +level definition on how types/strings are encoded.
> +
> +The beginning of data blob must be::
> +
> +    struct btf_header {
> +        __u16   magic;
> +        __u8    version;
> +        __u8    flags;
> +        __u32   hdr_len;
> +
> +        /* All offsets are in bytes relative to the end of this header */
> +        __u32   type_off;       /* offset of type section       */
> +        __u32   type_len;       /* length of type section       */
> +        __u32   str_off;        /* offset of string section     */
> +        __u32   str_len;        /* length of string section     */
> +    };
> +
> +The magic is ``0xeB9F``, which has different encoding for big and little
> +endian system, and can be used to test whether BTF is generated for
> +big or little endian target.
> +The btf_header is designed to be extensible with hdr_len specifying
> +the struct btf_header length when the data blob is generated.
Should probably specify here whether hdr_len includes the whole header or
 starts from offsetofend(hdr_len).  (I believe it's the whole thing.)
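To show why the distinction matters: a consumer has to add hdr_len to
 every section offset, so both readings give different addresses.  A
 minimal sketch, assuming hdr_len counts the whole header (my reading,
 not something the document states); struct btf_header_sketch and the
 helper names are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the layout quoted above, under a hypothetical name so
 * it cannot clash with the real uapi struct btf_header. */
struct btf_header_sketch {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
	uint32_t hdr_len;
	uint32_t type_off;
	uint32_t type_len;
	uint32_t str_off;
	uint32_t str_len;
};

/* Byte offset of the type section from the start of the blob,
 * ASSUMING hdr_len covers the whole header. */
static size_t btf_type_sec_start(const struct btf_header_sketch *h)
{
	assert(h != NULL);
	return (size_t)h->hdr_len + h->type_off;
}

/* Byte offset of the string section, under the same assumption. */
static size_t btf_str_sec_start(const struct btf_header_sketch *h)
{
	assert(h != NULL);
	return (size_t)h->hdr_len + h->str_off;
}

/* Smallest blob size that can hold the header and both sections. */
static size_t btf_min_blob_size(const struct btf_header_sketch *h)
{
	size_t type_end = btf_type_sec_start(h) + h->type_len;
	size_t str_end = btf_str_sec_start(h) + h->str_len;

	return type_end > str_end ? type_end : str_end;
}
```

With the values from the llvm output in section 6 (hdr_len 24,
 type_off 0, type_len 220, str_off 220, str_len 122) this yields 366
 bytes, which agrees with the 0x16e size readelf reports for .BTF.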

> +
> +2.1 String Encoding
> +===================
> +
> +The first byte of string section must be ``'\0'`` to represent a null string.
Perhaps "empty string" is more precise than "null string"?
> +The rest of string table is a cancatenation of other strings.
sp: concatenation.
Possibly also state that those other strings are nul-terminated.
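The nul-termination matters to anyone resolving a name_off.  A rough
 sketch of a bounds-checked lookup, assuming every string is
 nul-terminated inside the section; btf_str_at is a hypothetical helper,
 not a kernel or libbpf function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the string at offset 'off' in the string section, or NULL if
 * the offset is out of range or no terminating nul is found before the
 * end of the section. */
static const char *btf_str_at(const char *strs, uint32_t str_len,
			      uint32_t off)
{
	assert(strs != NULL);
	if (off >= str_len)
		return NULL;
	if (memchr(strs + off, '\0', str_len - off) == NULL)
		return NULL;
	return strs + off;
}
```

Offset 0 then naturally resolves to the empty string, because the
 first byte of the section is itself the terminator.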
> +
> +2.2 Type Encoding
> +=================
> +
> +The type id ``0`` is reserved for ``void`` type.
> +The type section is parsed sequentially and the type id is assigned to
> +each recognized type starting from id ``1``.
> +Currently, the following types are supported::
> +
> +    #define BTF_KIND_INT            1       /* Integer      */
> +    #define BTF_KIND_PTR            2       /* Pointer      */
> +    #define BTF_KIND_ARRAY          3       /* Array        */
> +    #define BTF_KIND_STRUCT         4       /* Struct       */
> +    #define BTF_KIND_UNION          5       /* Union        */
> +    #define BTF_KIND_ENUM           6       /* Enumeration  */
> +    #define BTF_KIND_FWD            7       /* Forward      */
> +    #define BTF_KIND_TYPEDEF        8       /* Typedef      */
> +    #define BTF_KIND_VOLATILE       9       /* Volatile     */
> +    #define BTF_KIND_CONST          10      /* Const        */
> +    #define BTF_KIND_RESTRICT       11      /* Restrict     */
> +    #define BTF_KIND_FUNC           12      /* Function     */
> +    #define BTF_KIND_FUNC_PROTO     13      /* Function Proto       */
> +
> +Note that the type section encodes debug info, not just pure types.
> +``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
> +
> +Each type contains the following common data::
> +
> +    struct btf_type {
> +        __u32 name_off;
> +        /* "info" bits arrangement
> +         * bits  0-15: vlen (e.g. # of struct's members)
> +         * bits 16-23: unused
> +         * bits 24-27: kind (e.g. int, ptr, array...etc)
> +         * bits 28-30: unused
> +         * bit     31: kind_flag, currently used by
> +         *             struct, union and fwd
> +         */
> +        __u32 info;
> +        /* "size" is used by INT, ENUM, STRUCT and UNION.
> +         * "size" tells the size of the type it is describing.
> +         *
> +         * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
> +         * FUNC and FUNC_PROTO.
> +         * "type" is a type_id referring to another type.
> +         */
> +        union {
> +                __u32 size;
> +                __u32 type;
> +        };
> +    };
> +
> +For certain kinds, the common data are followed by kind specific data.
> +The ``name_off`` in ``struct btf_type`` specifies the offset in the string table.
> +The following details encoding of each kind.
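A short decoding example for the "info" word might help readers here.
 A sketch that follows the bit layout in the comment above; the helper
 names are invented for illustration:

```c
#include <stdint.h>

/* bits 0-15: vlen */
static inline uint32_t btf_info_vlen(uint32_t info)
{
	return info & 0xffff;
}

/* bits 24-27: kind */
static inline uint32_t btf_info_kind(uint32_t info)
{
	return (info >> 24) & 0x0f;
}

/* bit 31: kind_flag */
static inline uint32_t btf_info_kind_flag(uint32_t info)
{
	return info >> 31;
}
```

For instance 0xd000000, the value llvm emits for the function
 prototype in section 6, decodes to kind 13 (BTF_KIND_FUNC_PROTO) with
 vlen 0 and kind_flag 0.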
> +
> +2.2.1 BTF_KIND_INT
> +~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> + * ``name_off``: any valid offset
> + * ``info.kind_flag``: 0
> + * ``info.kind``: BTF_KIND_INT
> + * ``info.vlen``: 0
> + * ``size``: the size of the int type in bytes.
> +
> +``btf_type`` is followed by a ``u32`` with following bits arrangement::
> +
> +  #define BTF_INT_ENCODING(VAL)   (((VAL) & 0x0f000000) >> 24)
> +  #define BTF_INT_OFFSET(VAL)     (((VAL  & 0x00ff0000)) >> 16)
> +  #define BTF_INT_BITS(VAL)       ((VAL)  & 0x000000ff)
> +
> +The ``BTF_INT_ENCODING`` has the following attributes::
> +
> +  #define BTF_INT_SIGNED  (1 << 0)
> +  #define BTF_INT_CHAR    (1 << 1)
> +  #define BTF_INT_BOOL    (1 << 2)
> +
> +The ``BTF_INT_ENCODING()`` provides extra information, signness,
> +char, or bool, for the int type. The char and bool encoding
> +are mostly useful for pretty print. At most one encoding can
> +be specified for the int type.
> +
> +The ``BTF_INT_OFFSET()`` specifies the starting bit offset to
> +calculate values for this int.
That doesn't make it clear, at least to me, what this field is for.
> Typically it should be 0 and
> +currently both llvm and pahole generates ``BTF_INT_OFFSET() = 0``.
> +
> +The ``BTF_INT_BITS()`` specifies the number of actual bits held by
> +this int type. For example, a 4-bit bitfield encodes
> +``BTF_INT_BITS()`` equals to 4. The ``btf_type.size * 8``
> +must be equal to or greater than ``BTF_INT_BITS()`` for the type.
> +The maximum value of ``BTF_INT_BITS()`` is 128.
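A worked decoding of the extra u32 would make the three fields
 concrete.  A sketch using the masks quoted above plus the two limits
 stated in this paragraph; the function names are hypothetical, and it
 deliberately says nothing about what BTF_INT_OFFSET() means:

```c
#include <stdint.h>

static inline uint32_t btf_int_encoding(uint32_t val)
{
	return (val & 0x0f000000) >> 24;
}

static inline uint32_t btf_int_offset(uint32_t val)
{
	return (val & 0x00ff0000) >> 16;
}

static inline uint32_t btf_int_bits(uint32_t val)
{
	return val & 0x000000ff;
}

/* Check the two constraints from the text: BTF_INT_BITS() is at most
 * 128 and does not exceed btf_type.size * 8. Returns 1 if both hold. */
static inline int btf_int_bits_ok(uint32_t val, uint32_t size)
{
	uint32_t bits = btf_int_bits(val);

	return bits <= 128 && (uint64_t)size * 8 >= bits;
}
```

The value 0x1000020 from the llvm output in section 6 decodes to
 encoding 1 (BTF_INT_SIGNED), offset 0 and 32 bits, matching the
 "INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED" line that
 pahole prints.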
> +
> +2.2.2 BTF_KIND_PTR
> +~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> +  * ``name_off``: 0
> +  * ``info.kind_flag``: 0
> +  * ``info.kind``: BTF_KIND_PTR
> +  * ``info.vlen``: 0
> +  * ``type``: the pointee type of the pointer
> +
> +No additional type data follow ``btf_type``.
> +
> +2.2.3 BTF_KIND_ARRAY
> +~~~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> +  * ``name_off``: 0
> +  * ``info.kind_flag``: 0
> +  * ``info.kind``: BTF_KIND_ARRAY
> +  * ``info.vlen``: 0
> +  * ``size/type``: 0, not used
> +
> +btf_type is followed by one "struct btf_array"::
> +
> +    struct btf_array {
> +        __u32   type;
> +        __u32   index_type;
> +        __u32   nelems;
> +    };
> +
> +The ``struct btf_array`` encoding:
> +  * ``type``: the element type
> +  * ``index_type``: the index type
Is this ever anything but u32?  What is the purpose of this field's existence?
> +  * ``nelems``: the number of elements for this array.
> +
> +For a multiple dimensional array, e.g., ``a[5][6]``, the btf_array.nelems
> +equals ``30``.
Does this mean that there is nothing in BTF to distinguish a multi-dimensional
 array from a single-dimensional array of the same size?
Why is this done, rather than chaining BTF_ARRAY records?
> ``nelems = 0`` is also allowed.
> +
> +2.2.4 BTF_KIND_STRUCT
> +~~~~~~~~~~~~~~~~~~~~~
> +2.2.5 BTF_KIND_UNION
> +~~~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> +  * ``name_off``: 0 or offset to a valid C identifier
> +  * ``info.kind_flag``: 0 or 1
> +  * ``info.kind``: BTF_KIND_STRUCT or BTF_KIND_UNION
> +  * ``info.vlen``: the number of struct/union members
> +  * ``info.size``: the size of the struct/union in bytes
> +
> +``btf_type`` is followed by ``info.vlen`` number of ``struct btf_member``.::
> +
> +    struct btf_member {
> +        __u32   name_off;
> +        __u32   type;
> +        __u32   offset;
> +    };
> +
> +``struct btf_member`` encoding:
> +  * ``name_off``: offset to a valid C identifier
> +  * ``type``: the member type
> +  * ``offset``: <see below>
> +
> +If the type info ``kind_flag`` is not set, the offset contains
> +only bit offset of the member. Note that the base type of the
> +bitfield can only be int or enum type. If the bitfield size
> +is 32, the base type can be either int or enum type.
> +If the bitfield size is not 32, the base type must be int,
> +and int type ``BTF_INT_BITS()`` encodes the bitfield size.
> +
> +If the ``kind_flag`` is set, the ``btf_member.offset``
> +contains both member bitfield size and bit offset. The
> +bitfield size and bit offset are calculated as below.::
> +
> +  #define BTF_MEMBER_BITFIELD_SIZE(val)   ((val) >> 24)
> +  #define BTF_MEMBER_BIT_OFFSET(val)      ((val) & 0xffffff)
> +
> +In this case, if the base type is an int type, it must
> +be a regular int type:
> +
> +  * ``BTF_INT_OFFSET()`` must be 0.
> +  * ``BTF_INT_BITS()`` must be equal to ``{1,2,4,8,16} * 8``.
Probably worth referencing here the patch that added kind_flag, as that
 explains why these two different modes exist.
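An example contrasting the two modes could also go here.  A sketch of
 how a reader might interpret btf_member.offset depending on kind_flag;
 struct member_pos and btf_member_pos are names made up for this
 illustration:

```c
#include <stdint.h>

struct member_pos {
	uint32_t bitfield_size;	/* 0 means "not encoded in offset" */
	uint32_t bit_offset;
};

static struct member_pos btf_member_pos(uint32_t offset, int kind_flag)
{
	struct member_pos p;

	if (kind_flag) {
		/* offset packs both values, as in the macros above */
		p.bitfield_size = offset >> 24;
		p.bit_offset = offset & 0xffffff;
	} else {
		/* offset is only the bit offset; a bitfield size, if
		 * any, comes from BTF_INT_BITS() of the member type */
		p.bitfield_size = 0;
		p.bit_offset = offset;
	}
	return p;
}
```

Member "b" in the pahole example of section 6 (bitfield_size=3,
 bits_offset=2) would thus be stored as (3 << 24) | 2 when kind_flag
 is set.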
> +
> +2.2.6 BTF_KIND_ENUM
> +~~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> +  * ``name_off``: 0 or offset to a valid C identifier
> +  * ``info.kind_flag``: 0
> +  * ``info.kind``: BTF_KIND_ENUM
> +  * ``info.vlen``: number of enum values
> +  * ``size``: 4
> +
> +``btf_type`` is followed by ``info.vlen`` number of ``struct btf_enum``.::
> +
> +    struct btf_enum {
> +        __u32   name_off;
> +        __s32   val;
> +    };
> +
> +The ``btf_enum`` encoding:
> +  * ``name_off``: offset to a valid C identifier
> +  * ``val``: any value
> +
> +2.2.7 BTF_KIND_FWD
> +~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> +  * ``name_off``: offset to a valid C identifier
> +  * ``info.kind_flag``: 0 for struct, 1 for union
> +  * ``info.kind``: BTF_KIND_FWD
> +  * ``info.vlen``: 0
> +  * ``type``: 0
> +
> +No additional type data follow ``btf_type``.
> +
> +2.2.8 BTF_KIND_TYPEDEF
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> +  * ``name_off``: offset to a valid C identifier
> +  * ``info.kind_flag``: 0
> +  * ``info.kind``: BTF_KIND_TYPEDEF
> +  * ``info.vlen``: 0
> +  * ``type``: the type to be redefined
This is unclear phrasing.  How about:
 * ``type``: the type to be given a name
Because a typedef doesn't 'redefine' ``type``, it defines the _name_ as
 referring to ``type``.
(I realise my phrasing isn't the best either, but I can't figure out how
 to further improve it.)
> +
> +No additional type data follow ``btf_type``.
> +
> +2.2.9 BTF_KIND_VOLATILE
> +~~~~~~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> +  * ``name_off``: 0
> +  * ``info.kind_flag``: 0
> +  * ``info.kind``: BTF_KIND_VOLATILE
> +  * ``info.vlen``: 0
> +  * ``type``: the type having volatile modifier
This is again a little bit blurry.  ``type`` doesn't "have volatile
 modifier"; it is rather the type _to which_ a volatile modifier is
 applied to create the type defined by this record.
Maybe something like "the type to be volatile-qualified"?
(Note that the C standard refers to const, volatile and restrict as
 'qualifiers', not 'modifiers', and we should probably follow that
 terminology.)

> +
> +No additional type data follow ``btf_type``.
> +
> +2.2.10 BTF_KIND_CONST
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> +  * ``name_off``: 0
> +  * ``info.kind_flag``: 0
> +  * ``info.kind``: BTF_KIND_CONST
> +  * ``info.vlen``: 0
> +  * ``type``: the type having const modifier
> +
> +No additional type data follow ``btf_type``.
> +
> +2.2.11 BTF_KIND_RESTRICT
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> +  * ``name_off``: 0
> +  * ``info.kind_flag``: 0
> +  * ``info.kind``: BTF_KIND_RESTRICT
> +  * ``info.vlen``: 0
> +  * ``type``: the type having restrict modifier
> +
> +No additional type data follow ``btf_type``.
> +
> +2.2.12 BTF_KIND_FUNC
> +~~~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> +  * ``name_off``: offset to a valid C identifier
> +  * ``info.kind_flag``: 0
> +  * ``info.kind``: BTF_KIND_FUNC
> +  * ``info.vlen``: 0
> +  * ``type``: a BTF_KIND_FUNC_PROTO type
You should put an explanation here of the semantics of this.  Above it was
 mentioned that BTF_KIND_FUNC does not declare a type, but that should be
 expanded on here to more fully explain the relationship between
 BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO.  Perhaps something like:
A BTF_KIND_FUNC defines, not a type, but a subprogram (function) whose
 signature is defined by ``type``; the subprogram is thus an instance of
 that type.  The BTF_KIND_FUNC may in turn be referenced by a func_info in
 the `.BTF.ext section`__ (ELF) or in the arguments to BPF_PROG_LOAD__ (ABI).
.. __: `4.2 .BTF.ext section`_
.. __: `3.3 BPF_PROG_LOAD`_
> +
> +No additional type data follow ``btf_type``.
> +
> +2.2.13 BTF_KIND_FUNC_PROTO
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +``struct btf_type`` encoding requirement:
> +  * ``name_off``: 0
> +  * ``info.kind_flag``: 0
> +  * ``info.kind``: BTF_KIND_FUNC_PROTO
> +  * ``info.vlen``: # of parameters
> +  * ``type``: the return type
> +
> +``btf_type`` is followed by ``info.vlen`` number of ``struct btf_param``.::
> +
> +    struct btf_param {
> +        __u32   name_off;
> +        __u32   type;
> +    };
> +
> +If a BTF_KIND_FUNC_PROTO type is referred by a BTF_KIND_FUNC type,
> +then ``btf_param.name_off`` must point to a valid C identifier
> +except for the possible last argument representing the variable
> +argument. The btf_param.type refers to parameter type.
> +
> +If the function has the variable arguments, the last parameter
s/has the/has/
> +is encoded with ``name_off = 0`` and ``type = 0``.
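Since type ids are assigned by walking the section sequentially, it
 may be worth summarising how far each record extends.  A sketch
 derived only from the structs quoted in 2.2.1 to 2.2.13 (12 bytes of
 common data plus the kind-specific tail); the SK_ constants and
 btf_record_size are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

enum {
	SK_INT = 1, SK_PTR, SK_ARRAY, SK_STRUCT, SK_UNION, SK_ENUM,
	SK_FWD, SK_TYPEDEF, SK_VOLATILE, SK_CONST, SK_RESTRICT,
	SK_FUNC, SK_FUNC_PROTO
};

/* Size in bytes of one type record, or 0 for an unknown kind. */
static size_t btf_record_size(uint32_t info)
{
	uint32_t kind = (info >> 24) & 0x0f;
	size_t vlen = info & 0xffff;
	size_t base = 3 * sizeof(uint32_t);	/* struct btf_type */

	switch (kind) {
	case SK_INT:
		return base + sizeof(uint32_t);		/* one u32 */
	case SK_ARRAY:
		return base + 3 * sizeof(uint32_t);	/* btf_array */
	case SK_STRUCT:
	case SK_UNION:
		return base + vlen * 3 * sizeof(uint32_t); /* btf_member */
	case SK_ENUM:
	case SK_FUNC_PROTO:
		return base + vlen * 2 * sizeof(uint32_t); /* btf_enum or btf_param */
	case SK_PTR:
	case SK_FWD:
	case SK_TYPEDEF:
	case SK_VOLATILE:
	case SK_CONST:
	case SK_RESTRICT:
	case SK_FUNC:
		return base;				/* no extra data */
	default:
		return 0;
	}
}
```

This is consistent with the llvm output in section 6, where the
 FUNC_PROTO with vlen 0 takes three .long words and the INT that
 follows takes four.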
> +
> +3. BTF Kernel API
> +*****************
> +
> +The following bpf syscall command involves BTF:
> +   * BPF_BTF_LOAD: load a blob of BTF data into kernel
> +   * BPF_MAP_CREATE: map creation with btf key and value type info.
> +   * BPF_PROG_LOAD: prog load with btf function and line info.
> +   * BPF_BTF_GET_FD_BY_ID: get a btf fd
> +   * BPF_OBJ_GET_INFO_BY_FD: btf, func_info, line_info
> +     and other btf related info are returned.
> +
> +The workflow typically looks like:
> +::
> +
> +  Application:
> +      BPF_BTF_LOAD
> +          |
> +          v
> +      BPF_MAP_CREATE & BPF_PROG_LOAD
> +          |
> +          V
> +      ......
> +
> +  Introspection tool:
> +      ......
> +          |
> +          V
> +      BPF_OBJ_GET_INFO_BY_FD (get bpf_prog_info/bpf_map_info with btf_id)
> +          |
> +          V
> +      BPF_BTF_GET_FD_BY_ID (get btf_fd)
> +          |
> +          V
> +      BPF_OBJ_GET_INFO_BY_FD (get btf)
> +          |
> +          V
> +      pretty print types, dump func signatures and line info, etc.
> +
> +
> +3.1 BPF_BTF_LOAD
> +================
> +
> +Load a blob of BTF data into kernel. A blob of data
> +described in Section 2 can be directly loaded into the kernel.
> +A ``btf_fd`` returns to userspace.
> +
> +3.2 BPF_MAP_CREATE
> +==================
> +
> +A map can be created with ``btf_fd`` and specified key/value type id.::
> +
> +    __u32   btf_fd;         /* fd pointing to a BTF type data */
> +    __u32   btf_key_type_id;        /* BTF type_id of the key */
> +    __u32   btf_value_type_id;      /* BTF type_id of the value */
> +
> +In libbtf, if the map is specified like below in the bpf program:
Should this say libbpf?
> +::
> +
> +    struct bpf_map_def SEC("maps") btf_map = {
> +        .type = BPF_MAP_TYPE_ARRAY,
> +        .key_size = sizeof(int),
> +        .value_size = sizeof(struct ipv_counts),
> +        .max_entries = 4,
> +    };
> +    BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts);
> +
> +Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name,
> +key and value types for the map.
> +During ELF parsing, libbpf is able to extract key/value type_id's
> +and assigned them to BPF_MAP_CREATE attributes automatically.
> +
> +3.3 BPF_PROG_LOAD
> +=================
> +
> +During prog_load, func_info and line_info can be passed to kernel with
> +proper values for the following attributes:
> +::
> +
> +    __u32           insn_cnt;
> +    __aligned_u64   insns;
> +    ......
> +    __u32           prog_btf_fd;    /* fd pointing to BTF type data */
> +    __u32           func_info_rec_size;     /* userspace bpf_func_info size */
> +    __aligned_u64   func_info;      /* func info */
> +    __u32           func_info_cnt;  /* number of bpf_func_info records */
> +    __u32           line_info_rec_size;     /* userspace bpf_line_info size */
> +    __aligned_u64   line_info;      /* line info */
> +    __u32           line_info_cnt;  /* number of bpf_line_info records */
> +
> +The func_info and line_info are an array of below, respectively.::
> +
> +    struct bpf_func_info {
> +        __u32   insn_off; /* [0, insn_cnt - 1] */
> +        __u32   type_id;  /* pointing to a BTF_KIND_FUNC type */
> +    };
> +    struct bpf_line_info {
> +        __u32   insn_off; /* [0, insn_cnt - 1] */
> +        __u32   file_name_off; /* offset to string table for the filename */
> +        __u32   line_off; /* offset to string table for the source line */
> +        __u32   line_col; /* line number and column number */
> +    };
> +
> +func_info_rec_size is the size of each func_info record, and line_info_rec_size
> +is the size of each line_info record. Passing the record size to kernel make
> +it possible to extend the record itself in the future.
> +
> +Below are requirements for func_info:
> +  * func_info[0].insn_off must be 0.
> +  * the func_info insn_off is in strictly increasing order and matches
> +    bpf func boundaries.
> +
> +Below are requirements for line_info:
> +  * the first insn in each func must points to a line_info record.
> +  * the line_info insn_off is in strictly increasing order.
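The ordering rules could be illustrated with a small checker.  This
 is only a sketch of the func_info rules listed above, not the
 kernel's actual check: it does not verify that offsets match function
 boundaries, and rejecting an empty array is this sketch's own choice.
 struct func_info_sketch mirrors struct bpf_func_info under a
 hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

struct func_info_sketch {
	uint32_t insn_off;
	uint32_t type_id;
};

/* Returns 1 if the first record starts at instruction 0, every
 * insn_off lies in [0, insn_cnt - 1], and the offsets are strictly
 * increasing; returns 0 otherwise. */
static int func_info_order_ok(const struct func_info_sketch *fi,
			      size_t cnt, uint32_t insn_cnt)
{
	size_t i;

	if (cnt == 0 || fi[0].insn_off != 0)
		return 0;
	for (i = 0; i < cnt; i++) {
		if (fi[i].insn_off >= insn_cnt)
			return 0;
		if (i > 0 && fi[i].insn_off <= fi[i - 1].insn_off)
			return 0;
	}
	return 1;
}
```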
> +
> +For line_info, the line number and column number are defined as below:
> +::
> +
> +    #define BPF_LINE_INFO_LINE_NUM(line_col)        ((line_col) >> 10)
> +    #define BPF_LINE_INFO_LINE_COL(line_col)        ((line_col) & 0x3ff)
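A worked value would help here.  A sketch of the same split as plain
 functions, plus the inverse packing; the names are made up, and the
 packing helper assumes the column fits in 10 bits:

```c
#include <stdint.h>

/* Upper 22 bits: line number. */
static inline uint32_t line_info_line(uint32_t line_col)
{
	return line_col >> 10;
}

/* Lower 10 bits: column number. */
static inline uint32_t line_info_col(uint32_t line_col)
{
	return line_col & 0x3ff;
}

/* Inverse of the two helpers above; col is masked to 10 bits. */
static inline uint32_t line_info_pack(uint32_t line, uint32_t col)
{
	return (line << 10) | (col & 0x3ff);
}
```

The llvm output in section 6 carries 7182 with the comment "Line 7
 Col 14", and indeed 7 * 1024 + 14 = 7182.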
> +
> +3.4 BPF_BTF_GET_FD_BY_ID
> +========================
> +
> +  Given a btf id, a btf fd is returned.
> +
> +3.5 BPF_OBJ_GET_INFO_BY_FD
> +==========================
> +
> +Users can get btf blob, bpf_map_info and bpf_prog_info.
> +bpf_map_info returns btf_id, key/value type id.
What exactly is btf_id in this case?  The type_id of the
 BPF_ANNOTATE_KV_PAIR struct?
> +bpf_prog_info returns btf_id, func_info and line info
> +for translated bpf byte codes, and jited_line_info.
In this case presumably btf_id is the type_id of the BTF_KIND_FUNC;
 perhaps that should be stated explicitly too.
> +
> +4. ELF File Format Interface
> +****************************
> +
> +4.1 .BTF section
> +================
> +
Really you should state what this section is _supposed_ to contain
 before starting to talk about what existing implementations generate.
> +pahole currently generates .BTF section with the same format
> +as described in Section 2. pahole doesn't generate
> +BTF_KIND_FUNC yet.
> +
> +llvm generates two sections .BTF and .BTF.ext.
> +The .BTF section has the same specification as in Section 2.
> +The .BTF.ext section encodes func_info and line_info which
> +needs loader manipulation before loading into the kernel.
> +
> +4.2 .BTF.ext section
> +====================
> +
> +The specification for .BTF.ext section is defined at
> +``tools/lib/bpf/btf.h`` and ``tools/lib/bpf/btf.c``.
> +
> +The current header of .BTF.ext section::
> +
> +    struct btf_ext_header {
> +        __u16   magic;
> +        __u8    version;
> +        __u8    flags;
> +        __u32   hdr_len;
> +
> +        /* All offsets are in bytes relative to the end of this header */
> +        __u32   func_info_off;
> +        __u32   func_info_len;
> +        __u32   line_info_off;
> +        __u32   line_info_len;
> +    };
> +
> +It is very similar to .BTF section. Instead of type/string section,
> +it contains func_info and line_info section.
Perhaps there should be a link back to §3.3 here, as that has the definitions
 of structs bpf_func_info and bpf_line_info.

-Ed
> +
> +The func_info is organized as below.::
> +
> +     func_info_rec_size
> +     btf_ext_info_sec for section #1 /* func_info for section #1 */
> +     btf_ext_info_sec for section #2 /* func_info for section #2 */
> +     ...
> +
> +``func_info_rec_size`` specifies the size of ``bpf_func_info`` structure
> +when .BTF.ext is generated. btf_ext_info_sec, defined below, is
> +the func_info for each specific ELF section.::
> +
> +     struct btf_ext_info_sec {
> +        __u32   sec_name_off; /* offset to section name */
> +        __u32   num_info;
> +        /* Followed by num_info * record_size number of bytes */
> +        __u8    data[0];
> +     };
> +
> +Here, num_info must be greater than 0.
> +
> +The line_info is organized as below.::
> +
> +     line_info_rec_size
> +     btf_ext_info_sec for section #1 /* line_info for section #1 */
> +     btf_ext_info_sec for section #2 /* line_info for section #2 */
> +     ...
> +
> +``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure
> +when .BTF.ext is generated.
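The layout above implies a simple size calculation, sketched here
 under the assumption that an area holds exactly one ELF section's
 worth of records; both function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes taken by one btf_ext_info_sec: sec_name_off and num_info,
 * followed by num_info records of rec_size bytes each. */
static size_t info_sec_bytes(uint32_t num_info, uint32_t rec_size)
{
	return 2 * sizeof(uint32_t) + (size_t)num_info * rec_size;
}

/* Bytes taken by a func_info or line_info area that holds a single
 * btf_ext_info_sec: the leading *_rec_size word plus that section. */
static size_t info_area_bytes_one_sec(uint32_t num_info, uint32_t rec_size)
{
	return sizeof(uint32_t) + info_sec_bytes(num_info, rec_size);
}
```

For the llvm output in section 6 this gives 28 bytes for func_info
 (two 8-byte records) and 44 bytes for line_info (two 16-byte
 records), the same lengths that appear in the .BTF.ext header there.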
> +
> +The interpretation of ``bpf_func_info->insn_off`` and
> +``bpf_line_info->insn_off`` is different between kernel API and ELF API.
> +For kernel API, the ``insn_off`` is the instruction offset in the unit
> +of ``struct bpf_insn``. For ELF API, the ``insn_off`` is the byte offset
> +from the beginning of section (``btf_ext_info_sec->sec_name_off``).
> +
> +5. Using BTF
> +************
> +
> +5.1 bpftool map pretty print
> +============================
> +
> +With BTF, the map key/value can be printed based on fields rather than
> +simply raw bytes. This is especially
> +valuable for large structure or if you data structure
> +has bitfields. For example, for the following map,::
> +
> +      enum A { A1, A2, A3, A4, A5 };
> +      typedef enum A ___A;
> +      struct tmp_t {
> +           char a1:4;
> +           int  a2:4;
> +           int  :4;
> +           __u32 a3:4;
> +           int b;
> +           ___A b1:4;
> +           enum A b2:4;
> +      };
> +      struct bpf_map_def SEC("maps") tmpmap = {
> +           .type = BPF_MAP_TYPE_ARRAY,
> +           .key_size = sizeof(__u32),
> +           .value_size = sizeof(struct tmp_t),
> +           .max_entries = 1,
> +      };
> +      BPF_ANNOTATE_KV_PAIR(tmpmap, int, struct tmp_t);
> +
> +bpftool is able to pretty print like below:
> +::
> +
> +      [{
> +            "key": 0,
> +            "value": {
> +                "a1": 0x2,
> +                "a2": 0x4,
> +                "a3": 0x6,
> +                "b": 7,
> +                "b1": 0x8,
> +                "b2": 0xa
> +            }
> +        }
> +      ]
> +
> +5.2 bpftool prog dump
> +=====================
> +
> +The following is an example to show func_info and line_info
> +can help prog dump with better ksym name, function prototype
> +and line information.::
> +
> +    $ bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv
> +    [...]
> +    int test_long_fname_2(struct dummy_tracepoint_args * arg):
> +    bpf_prog_44a040bf25481309_test_long_fname_2:
> +    ; static int test_long_fname_2(struct dummy_tracepoint_args *arg)
> +       0:   push   %rbp
> +       1:   mov    %rsp,%rbp
> +       4:   sub    $0x30,%rsp
> +       b:   sub    $0x28,%rbp
> +       f:   mov    %rbx,0x0(%rbp)
> +      13:   mov    %r13,0x8(%rbp)
> +      17:   mov    %r14,0x10(%rbp)
> +      1b:   mov    %r15,0x18(%rbp)
> +      1f:   xor    %eax,%eax
> +      21:   mov    %rax,0x20(%rbp)
> +      25:   xor    %esi,%esi
> +    ; int key = 0;
> +      27:   mov    %esi,-0x4(%rbp)
> +    ; if (!arg->sock)
> +      2a:   mov    0x8(%rdi),%rdi
> +    ; if (!arg->sock)
> +      2e:   cmp    $0x0,%rdi
> +      32:   je     0x0000000000000070
> +      34:   mov    %rbp,%rsi
> +    ; counts = bpf_map_lookup_elem(&btf_map, &key);
> +    [...]
> +
> +5.3 verifier log
> +================
> +
> +The following is an example how line_info can help verifier failure debug.::
> +
> +       /* The code at tools/testing/selftests/bpf/test_xdp_noinline.c
> +        * is modified as below.
> +        */
> +       data = (void *)(long)xdp->data;
> +       data_end = (void *)(long)xdp->data_end;
> +       /*
> +       if (data + 4 > data_end)
> +               return XDP_DROP;
> +       */
> +       *(u32 *)data = dst->dst;
> +
> +    $ bpftool prog load ./test_xdp_noinline.o /sys/fs/bpf/test_xdp_noinline type xdp
> +        ; data = (void *)(long)xdp->data;
> +        224: (79) r2 = *(u64 *)(r10 -112)
> +        225: (61) r2 = *(u32 *)(r2 +0)
> +        ; *(u32 *)data = dst->dst;
> +        226: (63) *(u32 *)(r2 +0) = r1
> +        invalid access to packet, off=0 size=4, R2(id=0,off=0,r=0)
> +        R2 offset is outside of the packet
> +
> +6. BTF Generation
> +*****************
> +
> +You need latest pahole
> +
> +  https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
> +
> +or llvm (8.0 or later). The pahole acts as a dwarf2btf converter. It doesn't support .BTF.ext
> +and btf BTF_KIND_FUNC type yet. For example,::
> +
> +      -bash-4.4$ cat t.c
> +      struct t {
> +        int a:2;
> +        int b:3;
> +        int c:2;
> +      } g;
> +      -bash-4.4$ gcc -c -O2 -g t.c
> +      -bash-4.4$ pahole -JV t.o
> +      File t.o:
> +      [1] STRUCT t kind_flag=1 size=4 vlen=3
> +              a type_id=2 bitfield_size=2 bits_offset=0
> +              b type_id=2 bitfield_size=3 bits_offset=2
> +              c type_id=2 bitfield_size=2 bits_offset=5
> +      [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
> +
> +The llvm is able to generate .BTF and .BTF.ext directly with -g for bpf target only.
> +The assembly code (-S) is able to show the BTF encoding in assembly format.::
> +
> +    -bash-4.4$ cat t2.c
> +    typedef int __int32;
> +    struct t2 {
> +      int a2;
> +      int (*f2)(char q1, __int32 q2, ...);
> +      int (*f3)();
> +    } g2;
> +    int main() { return 0; }
> +    int test() { return 0; }
> +    -bash-4.4$ clang -c -g -O2 -target bpf t2.c
> +    -bash-4.4$ readelf -S t2.o
> +      ......
> +      [ 8] .BTF              PROGBITS         0000000000000000  00000247
> +           000000000000016e  0000000000000000           0     0     1
> +      [ 9] .BTF.ext          PROGBITS         0000000000000000  000003b5
> +           0000000000000060  0000000000000000           0     0     1
> +      [10] .rel.BTF.ext      REL              0000000000000000  000007e0
> +           0000000000000040  0000000000000010          16     9     8
> +      ......
> +    -bash-4.4$ clang -S -g -O2 -target bpf t2.c
> +    -bash-4.4$ cat t2.s
> +      ......
> +            .section        .BTF,"",@progbits
> +            .short  60319                   # 0xeb9f
> +            .byte   1
> +            .byte   0
> +            .long   24
> +            .long   0
> +            .long   220
> +            .long   220
> +            .long   122
> +            .long   0                       # BTF_KIND_FUNC_PROTO(id = 1)
> +            .long   218103808               # 0xd000000
> +            .long   2
> +            .long   83                      # BTF_KIND_INT(id = 2)
> +            .long   16777216                # 0x1000000
> +            .long   4
> +            .long   16777248                # 0x1000020
> +      ......
> +            .byte   0                       # string offset=0
> +            .ascii  ".text"                 # string offset=1
> +            .byte   0
> +            .ascii  "/home/yhs/tmp-pahole/t2.c" # string offset=7
> +            .byte   0
> +            .ascii  "int main() { return 0; }" # string offset=33
> +            .byte   0
> +            .ascii  "int test() { return 0; }" # string offset=58
> +            .byte   0
> +            .ascii  "int"                   # string offset=83
> +      ......
> +            .section        .BTF.ext,"",@progbits
> +            .short  60319                   # 0xeb9f
> +            .byte   1
> +            .byte   0
> +            .long   24
> +            .long   0
> +            .long   28
> +            .long   28
> +            .long   44
> +            .long   8                       # FuncInfo
> +            .long   1                       # FuncInfo section string offset=1
> +            .long   2
> +            .long   .Lfunc_begin0
> +            .long   3
> +            .long   .Lfunc_begin1
> +            .long   5
> +            .long   16                      # LineInfo
> +            .long   1                       # LineInfo section string offset=1
> +            .long   2
> +            .long   .Ltmp0
> +            .long   7
> +            .long   33
> +            .long   7182                    # Line 7 Col 14
> +            .long   .Ltmp3
> +            .long   7
> +            .long   58
> +            .long   8206                    # Line 8 Col 14
> +
> +7. Testing
> +**********
> +
> +Kernel bpf selftest `test_btf.c` provides extensive set of BTF related tests.
> diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
> index 00a8450a602f..4e77932959cc 100644
> --- a/Documentation/bpf/index.rst
> +++ b/Documentation/bpf/index.rst
> @@ -15,6 +15,13 @@ that goes into great technical depth about the BPF Architecture.
>  The primary info for the bpf syscall is available in the `man-pages`_
>  for `bpf(2)`_.
>  
> +BPF Type Format (BTF)
> +=====================
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   btf
>  
>  
>  Frequently asked questions (FAQ)
