From 15472b7be6c619e5eb35037da70b35d066544838 Mon Sep 17 00:00:00 2001 From: "Eric S. Raymond" Date: Tue, 12 Jun 2018 22:32:24 -0400 Subject: [PATCH 1/5] Clean up markup in tc-bpf.8 This page had multiple issues: 1. .in +4/.nf....fi/.in was used where .EX/.EE was called for. 2. .SS and running text shouldn't have been used in Synopsis section. Inline text has been moved to PARAMETERS. 3. Under PARAMETERS, adjacent .SS tags could not be lifted to XML. .TP is now used in that section instead. Signed-off-by: Eric S. Raymond --- man/man8/tc-bpf.8 | 239 +++++++++++++++++++++++----------------------- 1 file changed, 117 insertions(+), 122 deletions(-) diff --git a/man/man8/tc-bpf.8 b/man/man8/tc-bpf.8 index d311f295..f883a912 100644 --- a/man/man8/tc-bpf.8 +++ b/man/man8/tc-bpf.8 @@ -3,7 +3,6 @@ BPF \- BPF programmable classifier and actions for ingress/egress queueing disciplines .SH SYNOPSIS -.SS eBPF classifier (filter) or action: .B tc filter ... bpf [ .B object-file @@ -28,7 +27,7 @@ POLICE_SPEC ] [ ACTION_SPEC ] [ .B classid CLASSID ] -.br + .B tc action ... bpf [ .B object-file @@ -40,7 +39,6 @@ UDS_FILE ] [ .B verbose ] -.SS cBPF classifier (filter) or action: .B tc filter ... bpf [ .B bytecode-file @@ -53,7 +51,7 @@ POLICE_SPEC ] [ ACTION_SPEC ] [ .B classid CLASSID ] -.br + .B tc action ... bpf [ .B bytecode-file @@ -110,7 +108,9 @@ are pushed into one map and use another one for dynamically load balancing traffic based on the determined load, just to provide a few examples. .SH PARAMETERS -.SS object-file +The first pair of filter/action invocations is for eBPF, the second for cBPF. +.TP +object-file points to an object file that has an executable and linkable format (ELF) and contains eBPF opcodes and eBPF map definitions. The LLVM compiler infrastructure with @@ -120,16 +120,16 @@ files that can be passed to the eBPF classifier (more details in the .B EXAMPLES section). This option is mandatory when an eBPF classifier or action is to be loaded.
- -.SS section +.TP +section is the name of the ELF section from the object file, where the eBPF classifier or action resides. By default the section name for the classifier is called "classifier", and for the action "action". Given that a single object file can contain multiple classifier and actions, the corresponding section name needs to be specified, if it differs from the defaults. - -.SS export +.TP +export points to a Unix domain socket file. In case the eBPF object file also contains a section named "maps" with eBPF map specifications, then the map file descriptors can be handed off via the Unix domain socket to @@ -139,18 +139,18 @@ import, that uses them for calling into .B bpf(2) system call to read out or update eBPF map data from user space, for example, for monitoring purposes or to push down new policies. - -.SS verbose +.TP +verbose if set, it will dump the eBPF verifier output, even if loading the eBPF program was successful. By default, only on error, the verifier log is being emitted to the user. - -.SS direct-action | da +.TP +direct-action | da instructs eBPF classifier to not invoke external TC actions, instead use the TC actions return codes (\fBTC_ACT_OK\fR, \fBTC_ACT_SHOT\fR etc.) for classifiers. - -.SS skip_hw | skip_sw +.TP +skip_hw | skip_sw hardware offload control flags. By default TC will try to offload filters to hardware if possible. .B skip_hw @@ -159,21 +159,22 @@ explicitly disables the attempt to offload. forces the offload and disables running the eBPF program in the kernel. If hardware offload is not possible and this flag was set kernel will report an error and filter will not be installed at all. - -.SS police +.TP +police is an optional parameter for an eBPF/cBPF classifier that specifies a police in .B tc(1) which is attached to the classifier, for example, on an ingress qdisc. 
- -.SS action +.TP +action is an optional parameter for an eBPF/cBPF classifier that specifies a subsequent action in .B tc(1) which is attached to a classifier. - -.SS classid -.SS flowid +.TP +classid +.TP +flowid provides the default traffic control class identifier for this eBPF/cBPF classifier. The default class identifier can also be overwritten by the return code of the eBPF/cBPF program. A default return code of @@ -184,8 +185,8 @@ a return code other than these two will override the default classid. This allows for efficient, non-linear classification with only a single eBPF/cBPF program as opposed to having multiple individual programs for various class identifiers which would need to reparse packet contents. - -.SS bytecode +.TP +bytecode is being used for loading cBPF classifier and actions only. The cBPF bytecode is directly passed as a text string in the form of .B \'s,c t f k,c t f k,c t f k,...\' @@ -211,8 +212,8 @@ that ships with the Linux kernel source tree under or .B bytecode-file option is mandatory when a cBPF classifier or action is to be loaded. - -.SS bytecode-file +.TP +bytecode-file also being used to load a cBPF classifier or action. It's effectively the same as .B bytecode @@ -224,7 +225,7 @@ rather resides in a text file. A full blown example including eBPF agent code can be found inside the iproute2 source package under: .B examples/bpf/ - +.sp As prerequisites, the kernel needs to have the eBPF system call namely .B bpf(2) enabled and ships with @@ -234,9 +235,11 @@ and kernel modules for the traffic control subsystem. 
To enable eBPF/eBPF JIT support, depending which of the two the given architecture supports: -.in +4n +.RS +.EX .B echo 1 > /proc/sys/net/core/bpf_jit_enable -.in +.EE +.RE A given restricted C file can be compiled via LLVM as: @@ -247,24 +250,24 @@ A given restricted C file can be compiled via LLVM as: The compiler invocation might still simplify in future, so for now, it's quite handy to alias this construct in one way or another, for example: -.in +4n -.nf -.sp +.RS +.EX + __bcc() { clang -O2 -emit-llvm -c $1 -o - | \\ llc -march=bpf -filetype=obj -o "`basename $1 .c`.o" } alias bcc=__bcc -.fi -.in +.EE +.RE A minimal, stand-alone unit, which matches on all traffic with the default classid (return code of -1) looks like: -.in +4n -.nf -.sp +.RS +.EX + #include #ifndef __section @@ -277,8 +280,8 @@ __section("classifier") int cls_main(struct __sk_buff *skb) } char __license[] __section("license") = "GPL"; -.fi -.in +.EE +.RE More examples can be found further below in subsection .B eBPF PROGRAMMING @@ -299,9 +302,9 @@ example .B objdump(1) for inspecting ELF section headers: -.in +4n -.nf -.sp +.RS +.EX + objdump -h bpf.o [...] 3 classifier 000007f8 0000000000000000 0000000000000000 00000040 2**3 @@ -315,56 +318,56 @@ objdump -h bpf.o 7 license 00000004 0000000000000000 0000000000000000 00000988 2**0 CONTENTS, ALLOC, LOAD, DATA [...] 
-.fi -.in +.EE +.RE Adding an eBPF classifier from an object file that contains a classifier in the default ELF section is trivial (note that instead of "object-file" also shortcuts such as "obj" can be used): -.in +4n +.RS .B bcc bpf.c .br .B tc filter add dev em1 parent 1: bpf obj bpf.o flowid 1:1 -.in +.RE In case the classifier resides in ELF section "mycls", then that same command needs to be invoked as: -.in +4n +.RS .B tc filter add dev em1 parent 1: bpf obj bpf.o sec mycls flowid 1:1 -.in +.RE Dumping the classifier configuration will tell the location of the classifier, in other words that it's from object file "bpf.o" under section "mycls": -.in +4n +.RS .B tc filter show dev em1 .br .B filter parent 1: protocol all pref 49152 bpf .br .B filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid 1:1 bpf.o:[mycls] -.in +.RE The same program can also be installed on ingress qdisc side as opposed to egress ... -.in +4n +.RS .B tc qdisc add dev em1 handle ffff: ingress .br .B tc filter add dev em1 parent ffff: bpf obj bpf.o sec mycls flowid ffff:1 -.in +.RE \&... and again dumped from there: -.in +4n +.RS .B tc filter show dev em1 parent ffff: .br .B filter protocol all pref 49152 bpf .br .B filter protocol all pref 49152 bpf handle 0x1 flowid ffff:1 bpf.o:[mycls] -.in +.RE Attaching a classifier and action on ingress has the restriction that it doesn't have an actual underlying queueing discipline. What ingress @@ -382,15 +385,13 @@ object file within various sections. 
In that case, non-default section names must be provided, which is the case for both actions in this example: -.in +4n -.B tc filter add dev em1 parent 1: bpf obj bpf.o flowid 1:1 \e -.br -.in +25n -.B action bpf obj bpf.o sec action-mark \e -.br -.B action bpf obj bpf.o sec action-rand ok -.in -25n -.in -4n +.RS +.EX +tc filter add dev em1 parent 1: bpf obj bpf.o flowid 1:1 \e + action bpf obj bpf.o sec action-mark \e + action bpf obj bpf.o sec action-rand ok +.EE +.RE The advantage of this is that the classifier and the two actions can then share eBPF maps with each other, if implemented in the programs. @@ -421,17 +422,14 @@ this fd-owner shell, they can terminate and restart without losing eBPF maps file descriptors. Example invocation with the previous classifier and action mixture: -.in +4n -.B tc exec bpf imp /tmp/bpf -.br -.B tc filter add dev em1 parent 1: bpf obj bpf.o exp /tmp/bpf flowid 1:1 \e -.br -.in +25n -.B action bpf obj bpf.o sec action-mark \e -.br -.B action bpf obj bpf.o sec action-rand ok -.in -25n -.in -4n +.RS +.EX +tc exec bpf imp /tmp/bpf +tc filter add dev em1 parent 1: bpf obj bpf.o exp /tmp/bpf flowid 1:1 \e + action bpf obj bpf.o sec action-mark \e + action bpf obj bpf.o sec action-rand ok +.EE +.RE Assuming that eBPF maps are shared with classifier and actions, it's enough to export them once, for example, from within the classifier @@ -454,9 +452,8 @@ member of The environment in this example looks as follows: -.in +4n -.nf -.sp +.RS +.EX sh# env | grep BPF BPF_NUM_MAPS=3 BPF_MAP1=6 @@ -468,8 +465,8 @@ sh# ls -la /proc/self/fd lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map lrwx------. 
1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map sh# my_bpf_agent -.fi -.in +.EE +.RE eBPF agents are very useful in that they can prepopulate eBPF maps from user space, monitor statistics via maps and based on that feedback, for @@ -495,7 +492,7 @@ from the iproute2 source package for a fully fledged flow dissector example to better demonstrate some of the possibilities with eBPF. Supported 32 bit classifier return codes from the C program and their meanings: -.in +4n +.RS .B 0 , denotes a mismatch .br @@ -505,12 +502,12 @@ Supported 32 bit classifier return codes from the C program and their meanings: .B else , everything else will override the default classid to provide a facility for non-linear matching -.in +.RE Supported 32 bit action return codes from the C program and their meanings ( .B linux/pkt_cls.h ): -.in +4n +.RS .B TC_ACT_OK (0) , will terminate the packet processing pipeline and allows the packet to proceed @@ -532,7 +529,7 @@ from the beginning .br .B else , everything else is an unspecified return code -.in +.RE Both classifier and action return codes are supported in eBPF and cBPF programs. @@ -543,9 +540,8 @@ from a container, have previously been marked in interval [0, 255]. The program keeps statistics on different marks for user space and maps the classid to the root qdisc with the marking itself as the minor handle: -.in +4n -.nf -.sp +.RS +.EX #include #include @@ -595,17 +591,17 @@ __section("cls") int cls_main(struct __sk_buff *skb) } char __license[] __section("license") = "GPL"; -.fi -.in +.EE +.RE Another small example is a port redirector which demuxes destination port 80 into the interval [8080, 8087] steered by RSS, that can then be attached to ingress qdisc. 
The exercise of adding the egress counterpart and IPv6 support is left to the reader: -.in +4n -.nf -.sp +.RS +.EX + #include #include @@ -664,16 +660,15 @@ __section("lb") int lb_main(struct __sk_buff *skb) } char __license[] __section("license") = "GPL"; -.fi -.in +.EE +.RE The related helper header file .B helpers.h in both examples was: -.in +4n -.nf -.sp +.RS +.EX /* Misc helper macros. */ #define __section(x) __attribute__((section(x), used)) #define offsetof(x, y) __builtin_offsetof(x, y) @@ -704,8 +699,8 @@ unsigned long long load_byte(void *skb, unsigned long long off) asm ("llvm.bpf.load.byte"); unsigned long long load_half(void *skb, unsigned long long off) asm ("llvm.bpf.load.half"); -.fi -.in +.EE +.RE Best practice, we recommend to only have a single eBPF classifier loaded in tc and perform @@ -733,9 +728,11 @@ the kernel log, which can be read via .B dmesg(1) : -.in +4n +.RS +.EX .B echo 2 > /proc/sys/net/core/bpf_jit_enable -.in +.EE +.RE The Linux kernel source tree ships additionally under .B tools/net/ @@ -744,18 +741,18 @@ a small helper called that reads out the opcode image dump from the kernel log and dumps the resulting disassembly: -.in +4n +.RS .B bpf_jit_disasm -o -.in +.RE Other than that, the Linux kernel also contains an extensive eBPF/cBPF test suite module called .B test_bpf \&. Upon ... -.in +4n +.RS .B modprobe test_bpf -.in +.RE \&... it performs a diversity of test cases and dumps the results into the kernel log that can be inspected with @@ -786,9 +783,9 @@ The raw interface with tc takes opcodes directly. For example, the most minimal classifier matching on every packet resulting in the default classid of 1:1 looks like: -.in +4n +.RS .B tc filter add dev em1 parent 1: bpf bytecode '1,6 0 0 4294967295,' flowid 1:1 -.in +.RE The first decimal of the bytecode sequence denotes the number of subsequent 4-tuples of cBPF opcodes. 
As mentioned, such a 4-tuple consists of @@ -813,9 +810,8 @@ internal classic BPF compiler, his code derived here for usage with .B tc(8) : -.in +4n -.nf -.sp +.RS +.EX #include #include @@ -850,25 +846,25 @@ int main(int argc, char **argv) pcap_freecode(&prog); return 0; } -.fi -.in +.EE +.RE Given this small helper, any .B tcpdump(8) filter expression can be abused as a classifier where a match will result in the default classid: -.in +4n +.RS .B bpftool EN10MB 'tcp[tcpflags] & tcp-syn != 0' > /var/bpf/tcp-syn .br .B tc filter add dev em1 parent 1: bpf bytecode-file /var/bpf/tcp-syn flowid 1:1 -.in +.RE Basically, such a minimal generator is equivalent to: -.in +4n +.RS .B tcpdump -iem1 -ddd 'tcp[tcpflags] & tcp-syn != 0' | tr '\\\\n' ',' > /var/bpf/tcp-syn -.in +.RE Since .B libpcap @@ -888,25 +884,24 @@ for classifying IPv4/TCP packets, saved in a text file called .B foobar : -.in +4n -.nf -.sp +.RS +.EX ldh [12] jne #0x800, drop ldb [23] jneq #6, drop ret #-1 drop: ret #0 -.fi -.in +.EE +.RE Similarly, such a classifier can be loaded as: -.in +4n +.RS .B bpf_asm foobar > /var/bpf/tcp-syn .br .B tc filter add dev em1 parent 1: bpf bytecode-file /var/bpf/tcp-syn flowid 1:1 -.in +.RE For BPF classifiers, the Linux kernel provides additionally under .B tools/net/ -- 2.17.1