Message-ID: <6169a2e0-ee95-58ec-5e96-7562e070e99a@fb.com>
Date: Sat, 2 Mar 2019 00:23:13 +0000
From: Yonghong Song <yhs@...com>
To: Daniel Borkmann <daniel@...earbox.net>,
Stanislav Fomichev <sdf@...ichev.me>
CC: Alexei Starovoitov <ast@...com>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"joe@...d.net.nz" <joe@...d.net.nz>,
"john.fastabend@...il.com" <john.fastabend@...il.com>,
"tgraf@...g.ch" <tgraf@...g.ch>, Andrii Nakryiko <andriin@...com>,
"jakub.kicinski@...ronome.com" <jakub.kicinski@...ronome.com>,
"lmb@...udflare.com" <lmb@...udflare.com>
Subject: Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global
data/bss/rodata sections
On 2/28/19 4:19 PM, Daniel Borkmann wrote:
> On 03/01/2019 12:41 AM, Stanislav Fomichev wrote:
>> On 03/01, Daniel Borkmann wrote:
>>> This work adds BPF loader support for global data sections
>>> to libbpf. This allows writing BPF programs in a more natural,
>>> C-like way by being able to define global variables and const
>>> data.
>>>
>>> Back at LPC 2018 [0] we presented a first prototype which
>>> implemented support for global data sections by extending the BPF
>>> syscall so that union bpf_attr would get an additional memory/size
>>> pair for each section passed during prog load, in order to later
>>> add this base address into the ldimm64 instruction along with
>>> the user-provided offset when accessing a variable. The consensus
>>> from LPC was that for proper upstream support, it would be
>>> more desirable to use maps instead of a bpf_attr extension, as
>>> this would allow for introspection of these sections as well
>>> as potential live updates of their content. This work follows
>>> this path by taking the following steps on the loader side:
>>>
>>> 1) In bpf_object__elf_collect() step we pick up ".data",
>>> ".rodata", and ".bss" section information.
>>>
>>> 2) If present, in bpf_object__init_global_maps() we create
>>> a map that corresponds to each of the present sections.
>>> Since section sizes and access properties can differ, a
>>> single-entry array map is created per section, with a value
>>> size corresponding to the ELF section size of .data, .bss
>>> or .rodata. In the .rodata case, the map is created as
>>> read-only from the program side such that the verifier rejects
>>> any write attempts into .rodata. In a subsequent step,
>>> for the .data and .rodata sections, the section content is
>>> copied into the map through bpf_map_update_elem(). For
>>> .bss this is not necessary since the array map is already
>>> zero-initialized by default.
>>>
>>> 3) In bpf_program__collect_reloc() step, we record the
>>> corresponding map, insn index, and relocation type for
>>> the global data.
>>>
>>> 4) And last but not least, in the actual relocation step in
>>> bpf_program__relocate(), we mark the ldimm64 instruction
>>> with src_reg = BPF_PSEUDO_MAP_VALUE, storing the map's file
>>> descriptor in the first imm field (similar to what is done
>>> for BPF_PSEUDO_MAP_FD) and, in the second imm field (as
>>> ldimm64 is 2-insn wide), the access offset into the
>>> section.
>>>
>>> 5) On the kernel side, this specially marked BPF_PSEUDO_MAP_VALUE
>>> load will then store the actual target address, that is, the
>>> map value base address + offset, in order to have
>>> 'map-lookup'-free access. The destination register
>>> in the verifier will then be marked as PTR_TO_MAP_VALUE,
>>> containing the fixed offset as reg->off and the backing BPF
>>> map as reg->map_ptr. In other words, it is treated like any
>>> other normal map value from the verification side, only with
>>> efficient, direct value access instead of an actual call to
>>> the map lookup helper as in the typical case.
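The map setup in step 2 can be sketched in plain C as a pure function from an ELF section name and size to the parameters of its single-entry array map. This is a minimal sketch, not libbpf's code: struct global_map_spec and global_map_spec_for() are hypothetical names, and prog_rdonly only records the read-only intent; how that is conveyed to the kernel is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one global data map (not a libbpf type). */
struct global_map_spec {
	uint32_t key_size;    /* array map key is a u32 index */
	uint32_t value_size;  /* the whole ELF section is one map value */
	uint32_t max_entries; /* single-entry array */
	bool prog_rdonly;     /* verifier should reject program writes */
	bool needs_copy;      /* initial content pushed via bpf_map_update_elem() */
};

/* Derive the step-2 map parameters for one section.
 * Returns false for sections that do not get a global data map. */
static bool global_map_spec_for(const char *sec_name, size_t sec_size,
				struct global_map_spec *spec)
{
	assert(sec_name != NULL && spec != NULL);

	bool is_data = strcmp(sec_name, ".data") == 0;
	bool is_rodata = strcmp(sec_name, ".rodata") == 0;
	bool is_bss = strcmp(sec_name, ".bss") == 0;

	if (!is_data && !is_rodata && !is_bss)
		return false;

	spec->key_size = sizeof(uint32_t);
	spec->value_size = (uint32_t)sec_size;
	spec->max_entries = 1;
	spec->prog_rdonly = is_rodata;
	/* .bss needs no copy: array map values start out zeroed */
	spec->needs_copy = !is_bss;
	return true;
}
```

With the section sizes from the readelf dump that follows, .bss and .data (0x10 bytes each) would map to 16-byte values and .rodata (0x18 bytes) to a 24-byte, program-read-only value.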
>>>
>>> Simple example dump of a program using global vars in each
>>> section:
>>>
>>> # readelf -a test_global_data.o
>>> [...]
>>> [ 6] .bss NOBITS 0000000000000000 00000328
>>> 0000000000000010 0000000000000000 WA 0 0 8
>>> [ 7] .data PROGBITS 0000000000000000 00000328
>>> 0000000000000010 0000000000000000 WA 0 0 8
>>> [ 8] .rodata PROGBITS 0000000000000000 00000338
>>> 0000000000000018 0000000000000000 A 0 0 8
>>> [...]
>>> 95: 0000000000000000 8 OBJECT LOCAL DEFAULT 6 static_bss
>>> 96: 0000000000000008 8 OBJECT LOCAL DEFAULT 6 static_bss2
>>> 97: 0000000000000000 8 OBJECT LOCAL DEFAULT 7 static_data
>>> 98: 0000000000000008 8 OBJECT LOCAL DEFAULT 7 static_data2
>>> 99: 0000000000000000 8 OBJECT LOCAL DEFAULT 8 static_rodata
>>> 100: 0000000000000008 8 OBJECT LOCAL DEFAULT 8 static_rodata2
>>> 101: 0000000000000010 8 OBJECT LOCAL DEFAULT 8 static_rodata3
>>> [...]
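Steps 3 to 5 can be traced with a symbol from this dump: static_data2 sits in section 7 (.data) at value 0x8, so a load of it becomes an ldimm64 pair carrying the .data map's fd and offset 8. The sketch below assumes a local mirror of struct bpf_insn and the constant values from the upstream uapi header (ldimm64 opcode 0x18, BPF_PSEUDO_MAP_VALUE as 2); reloc_global_ldimm64() and resolve_global() are hypothetical helpers, and the kernel-side resolution is only modelled in user space as base + offset.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of struct bpf_insn so the sketch builds without kernel headers. */
struct insn {
	uint8_t code;
	uint8_t dst_reg : 4;
	uint8_t src_reg : 4;
	int16_t off;
	int32_t imm;
};

#define LD_IMM64 0x18      /* BPF_LD | BPF_DW | BPF_IMM */
#define PSEUDO_MAP_VALUE 2 /* assumed value of BPF_PSEUDO_MAP_VALUE */

/* Step 4: mark the 2-insn ldimm64 and store the map fd in the first
 * imm field and the symbol's offset into the section in the second. */
static void reloc_global_ldimm64(struct insn ld[2], int map_fd, uint32_t sym_off)
{
	assert(ld[0].code == LD_IMM64);
	ld[0].src_reg = PSEUDO_MAP_VALUE;
	ld[0].imm = map_fd;
	ld[1].imm = (int32_t)sym_off;
}

/* Step 5, modelled: the target address is simply map value base + offset,
 * so no map lookup helper call is needed at run time. */
static void *resolve_global(const struct insn ld[2], uint8_t *map_value_base)
{
	assert(ld[0].src_reg == PSEUDO_MAP_VALUE);
	return map_value_base + (uint32_t)ld[1].imm;
}
```

Here the map fd is whatever the loader got back when creating the .data map; the fd value used in a test run is arbitrary.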
>>>
>>> # bpftool prog
>>> 103: sched_cls name load_static_dat tag 37a8b6822fc39a29 gpl
>>> loaded_at 2019-02-28T02:02:35+0000 uid 0
>>> xlated 712B jited 426B memlock 4096B map_ids 63,64,65,66
>>> # bpftool map show id 63
>>> 63: array name .bss flags 0x0 <-- .bss area, rw
>> Can we use <main prog>.bss/data/rodata names? If we load more than one
>> prog with global data, that should make it easier to find which one is which.
>
> Yeah, that's fine, we can change it. They could potentially also be shared,
> so <main prog>.bss/data/rodata might be misleading, but <obj>.bss/data/rodata
> could work.
Note that the map_name field is only 16 bytes (15 usable bytes, excluding
the terminating '\0'). If the <obj> file has a long name like
test_verifier.o, you may have to shorten the <obj> part of the name.
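One possible shortening policy, sketched in C: keep the section suffix intact and cut only the <obj> prefix so the whole name fits in 15 usable bytes. global_map_name() is a hypothetical helper, OBJ_NAME_LEN mirrors the 16-byte BPF_OBJ_NAME_LEN, and the caller is assumed to have already stripped the ".o" from the object file name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OBJ_NAME_LEN 16 /* mirrors BPF_OBJ_NAME_LEN: 15 chars + '\0' */

/* Build "<obj><sec>", where sec includes its leading dot (e.g. ".bss"),
 * truncating only the <obj> part so the result fits in OBJ_NAME_LEN. */
static void global_map_name(char out[OBJ_NAME_LEN], const char *obj,
			    const char *sec)
{
	size_t sec_len = strlen(sec);
	assert(sec_len < OBJ_NAME_LEN);

	size_t room = OBJ_NAME_LEN - 1 - sec_len; /* bytes left for <obj> */
	size_t obj_len = strlen(obj);
	if (obj_len > room)
		obj_len = room;

	snprintf(out, OBJ_NAME_LEN, "%.*s%s", (int)obj_len, obj, sec);
}
```

With this policy, "test_verifier" plus ".rodata" becomes "test_ver.rodata" and plus ".bss" becomes "test_verifi.bss", so maps from the same object still sort together while the section stays recognizable.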
>
> Thanks,
> Daniel
>