Message-ID: <55AFB2E0.5060307@iogearbox.net>
Date: Wed, 22 Jul 2015 17:12:32 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
CC: Alexei Starovoitov <ast@...mgrid.com>,
Silvan Jegen <s.jegen@...il.com>, linux-man@...r.kernel.org,
linux-kernel@...r.kernel.org, Walter Harms <wharms@....de>
Subject: Re: Edited draft of bpf(2) man page for review/enhancement
On 07/22/2015 04:49 PM, Michael Kerrisk (man-pages) wrote:
> Hi Daniel,
>
> Sorry for the long delay in following up....
No worries, eBPF is quite some material. ;)
> On 05/27/2015 11:26 AM, Daniel Borkmann wrote:
>> On 05/27/2015 10:43 AM, Michael Kerrisk (man-pages) wrote:
>>> Hello Alexei,
>>>
>>> I took the draft 3 of the bpf(2) man page that you sent back in March
>>> and did some substantial editing to clarify the language and add a
>>> few technical details. Could you please check the revised version
>>> below, to ensure I did not inject any errors.
>>>
>>> I also added a number of FIXMEs for pieces of the page that need
>>> further work. Could you take a look at these and let me know your
>>> thoughts, please.
>>
>> That's great, thanks! Minor comments:
>>
>> ...
>>> .TH BPF 2 2015-03-10 "Linux" "Linux Programmer's Manual"
>>> .SH NAME
>>> bpf - perform a command on an extended BPF map or program
>>> .SH SYNOPSIS
>>> .nf
>>> .B #include <linux/bpf.h>
>>> .sp
>>> .BI "int bpf(int cmd, union bpf_attr *attr, unsigned int size);
>>>
>>> .SH DESCRIPTION
>>> The
>>> .BR bpf ()
>>> system call performs a range of operations related to extended
>>> Berkeley Packet Filters.
>>> Extended BPF (or eBPF) is similar to
>>> the original BPF (or classic BPF) used to filter network packets.
>>> For both BPF and eBPF programs,
>>> the kernel statically analyzes the programs before loading them,
>>> in order to ensure that they cannot harm the running system.
>>> .P
>>> eBPF extends classic BPF in multiple ways including the ability to call
>>> in-kernel helper functions (via the
>>> .B BPF_CALL
>>> opcode extension provided by eBPF)
>>> and access shared data structures such as BPF maps.
>>
>> I would perhaps emphasize that maps can be shared among in-kernel
>> eBPF programs, but also between kernel and user space.
>
> This is covered later in the page, under the "BPF maps" subheading.
> Maybe you missed that? (Or did you think it doesn't suffice?)
Okay, I presume you mean:

  Maps are a generic data structure for storage of different types
  and sharing data between the kernel and user-space programs.

Maybe, to emphasize both options a bit (not sure if it's better in
my words, though):

  Maps are a generic data structure for storage of different types
  and allow for sharing data among eBPF kernel programs, but also
  between kernel and user-space applications.
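
For illustration, the user-space half of that sharing could look roughly
like the sketch below. It is only a sketch: the helper names sys_bpf(),
fill_map_create(), fill_map_elem() and map_roundtrip() are made up for
this mail, and only the BPF_MAP_* commands and the union bpf_attr fields
come from <linux/bpf.h>. An eBPF program in the kernel would reach the
same map through the map helper functions instead.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* There is no libc wrapper assumed here, so go through syscall(2). */
static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return (int)syscall(__NR_bpf, cmd, attr, size);
}

/* Fill the attributes for BPF_MAP_CREATE; all other fields stay zero. */
static void fill_map_create(union bpf_attr *attr, uint32_t key_size,
                            uint32_t value_size, uint32_t max_entries)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_HASH;
    attr->key_size = key_size;
    attr->value_size = value_size;
    attr->max_entries = max_entries;
}

/* Fill the attributes used by BPF_MAP_UPDATE_ELEM and BPF_MAP_LOOKUP_ELEM. */
static void fill_map_elem(union bpf_attr *attr, int map_fd,
                          const void *key, void *value, uint64_t flags)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_fd = (uint32_t)map_fd;
    attr->key = (uint64_t)(uintptr_t)key;     /* pointers travel as u64 */
    attr->value = (uint64_t)(uintptr_t)value;
    attr->flags = flags;
}

/*
 * Create a small hash map, store key -> value, and read it back through
 * the same map fd. Returns 0 on success and -1 if any bpf() call fails
 * (for example with EPERM when the caller lacks the privileges).
 */
static int map_roundtrip(uint32_t key, uint64_t value, uint64_t *out)
{
    union bpf_attr attr;
    int fd, ret = -1;

    assert(out != NULL);

    fill_map_create(&attr, sizeof(key), sizeof(value), 16);
    fd = sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
    if (fd < 0)
        return -1;

    fill_map_elem(&attr, fd, &key, &value, BPF_ANY);
    if (sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)) == 0) {
        fill_map_elem(&attr, fd, &key, out, 0);
        ret = sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
    }

    close(fd);   /* drops the user-space reference to the map */
    return ret;
}
```

Keys and values are just opaque bytes to the kernel here; the u32 key and
u64 value are an arbitrary choice for the example.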
>>> The programs can be written in a restricted C that is compiled into
>>> .\" FIXME In the next line, what is "a restricted C"? Where does
>>> .\" one get further information about it?
>>
>> So far only from the kernel samples directory and for tc classifier
>> and action, from the tc man page and/or examples/bpf/ in the tc git
>> tree.
>
> So, given that we are several weeks down the track, and things may have
> changed, I'll re-ask the questions ;-) :
>
> * Is this restricted C documented anywhere?
Not (yet) that I'm aware of. In the short to mid term, we were thinking
of polishing the material that resides in the kernel documentation,
that is, Documentation/networking/filter.txt, to get it into better
shape, which, I presume, would also include documentation on the
restricted C. So far, examples are provided in the tc-bpf man page
(see link below).
The set of helper functions callable from eBPF is defined by
enum bpf_func_id in the UAPI header:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/bpf.h
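
To give a rough idea of what "restricted C" tends to look like in
practice, here is a minimal sketch in the style of the existing samples.
Everything specific in it is an assumption made up for this mail: the
section name "classifier", the function name cls_len_filter(), the
1000-byte threshold, and the meaning of the return values, which depends
on the hook the program is attached to. Only struct __sk_buff comes from
<linux/bpf.h>. A real program would be built with LLVM's BPF backend
rather than the host compiler.

```c
#include <linux/bpf.h>

/*
 * Hypothetical example program. The ELF section name is an assumption;
 * a loader would typically pick the program out of the object file by
 * its section.
 */
__attribute__((section("classifier"), used))
int cls_len_filter(struct __sk_buff *skb)
{
    /*
     * Straight-line code only: no loops, no global variables, and no
     * libc calls. Anything beyond plain computation would have to go
     * through the helper functions listed in enum bpf_func_id.
     */
    if (skb->len > 1000)
        return 1;   /* assumed meaning: packet matched */

    return 0;       /* assumed meaning: no match */
}
```

The restrictions in the comment reflect what the current verifier
accepts, as far as I can tell, not a written specification.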
> * Is the procedure for compiling this restricted C documented anywhere?
> (Yes, it's LLVM, but are the suitable pipelines/options documented
> somewhere?)
>
>>> eBPF bytecode and executed on the in-kernel virtual machine or
>>> just-in-time compiled into native code.
>>> .SS Extended BPF Design/Architecture
>>> .P
>>> .\" FIXME In the following line, what does "different data types" mean?
>>> .\" Are the values in a map not just blobs?
>>
>> Sort of, currently, these blobs can have different sizes of keys
>> and values (you can even have structs as keys). For the map itself
>> they are treated as blob internally. However, recently, bpf tail call
>> got added where you can lookup another program from an array map and
>> call into it. Here, that particular type of map can only have entries
>> of type of eBPF program fd. I think, if needed, adding a paragraph to
>> the tail call could be done as follow-up after we have an initial man
>> page in the tree included.
>
> Okay -- I've added a FIXME placeholder for this, so we can revisit.
Okay.
>>> BPF maps are a generic data structure for storage of different data types.
>>> A user process can create multiple maps (with key/value-pairs being
>>> opaque bytes of data) and access them via file descriptors.
>>> BPF programs can access maps from inside the kernel in parallel.
>>> It's up to the user process and BPF program to decide what they store
>>> inside maps.
>>> .P
>>> BPF programs are similar to kernel modules.
>>> They are loaded by the user
>>> process and automatically unloaded when the process exits.
>>
>> Generally that's true. Btw, in 4.1 kernel, tc(8) also got support for
>> eBPF classifier and actions, and here it's slightly different: in tc,
>> we load the programs, maps etc, and push down the eBPF program fd in
>> order to let the kernel hold reference on the program itself.
>>
>> Thus, there, the program fd that the application owns is gone when the
>> application terminates, but the eBPF program itself still lives on
>> inside the kernel. But perhaps it's already too much detail to mention
>> here ...
>
> Well, it should be documented somewhere....
Yep, fwiw, some time ago I hacked together a man page for tc:
https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=cbdd1e6921d21815e35d2a96526cfbad5ac98e09
>>> Each BPF program is a set of instructions that is safe to run until
>>> its completion.
>>> The in-kernel BPF verifier statically determines that the program
>>> terminates and is safe to execute.
>>> .\" FIXME In the following sentence, what does "takes hold" mean?
>>
>> Takes a reference. Meaning, that maps cannot disappear under us while
>> the eBPF program that is using them in the kernel is still alive.
>
> So, I changed this to:
>
> [[
> During verification, the kernel increments reference counts for each of
> the maps that the eBPF program uses,
> so that the selected maps cannot be removed until the program is unloaded.
> ]]
>
> Okay?
Okay.
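
In case it helps the wording, the lifetime rules can be modelled with a
toy reference counter like the one below. This is not kernel code: all
names (toy_obj, toy_prog, toy_get() and so on) are hypothetical, and the
model only assumes what was said in this thread, namely that a loaded
program holds a reference on each map it uses, and that something like
tc can hold its own reference on the program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted kernel object. */
struct toy_obj {
    int refcnt;
    bool freed;
};

/* Creating an object hands one reference to the creator (its fd). */
static void toy_create(struct toy_obj *o)
{
    o->refcnt = 1;
    o->freed = false;
}

static void toy_get(struct toy_obj *o)
{
    assert(!o->freed);
    o->refcnt++;
}

static void toy_put(struct toy_obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0)
        o->freed = true;   /* a real kernel would release memory here */
}

/* A program plus the single map it uses. */
struct toy_prog {
    struct toy_obj obj;
    struct toy_obj *used_map;
};

/* "Verification": the program takes a reference on the map it uses. */
static void toy_prog_load(struct toy_prog *p, struct toy_obj *map)
{
    toy_create(&p->obj);
    p->used_map = map;
    toy_get(map);
}

/* Dropping the last program reference also releases its map reference. */
static void toy_prog_put(struct toy_prog *p)
{
    toy_put(&p->obj);
    if (p->obj.freed)
        toy_put(p->used_map);
}
```

Walking through the tc case with this model: after toy_prog_load() the
map has two references, so dropping the user's map reference with
toy_put() does not free it. If an attach point takes an extra reference
with toy_get(&prog.obj), the application exiting (one toy_prog_put())
leaves both the program and the map alive, and only the second
toy_prog_put(), on detach, frees them.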
[...]
> I'll send out a new draft soon, but in the meantime hopefully you
> or Alexei might have a chance to answer some open questions (see my
> other mail to Alexei, which will be sent soon), so I can further edit
> the page before sending it out.
Later on, we should also add a paragraph on eBPF tail calls, but one
step at a time.
Thanks again,
Daniel