Message-Id: <1545259460-13376-1-git-send-email-jiong.wang@netronome.com>
Date: Wed, 19 Dec 2018 17:44:07 -0500
From: Jiong Wang <jiong.wang@...ronome.com>
To: ast@...nel.org, daniel@...earbox.net
Cc: netdev@...r.kernel.org, oss-drivers@...ronome.com,
Jiong Wang <jiong.wang@...ronome.com>,
"David S . Miller" <davem@...emloft.net>,
Paul Burton <paul.burton@...s.com>,
Wang YanQing <udknight@...il.com>,
Zi Shen Lim <zlim.lnx@...il.com>,
Shubham Bansal <illusionist.neo@...il.com>,
"Naveen N . Rao" <naveen.n.rao@...ux.ibm.com>,
Sandipan Das <sandipan@...ux.ibm.com>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Heiko Carstens <heiko.carstens@...ibm.com>
Subject: [PATCH bpf-next 00/13] bpf: propose new jmp32 instructions
The current eBPF ISA has 32-bit sub-registers and defines a set of ALU32
instructions.
However, there are no JMP32 instructions, so code-gen for 32-bit
sub-registers is inefficient. For example, an explicit sign extension
from 32-bit to 64-bit is needed before a signed comparison.
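
A small C model of the problem, assuming the usual BPF semantics in
which an ALU32 operation zero-extends its result into the upper 32 bits
of the 64-bit register. The function names are hypothetical and only
illustrate the semantics; this is not kernel code. A 32-bit value of -1
looks like a large positive number to a 64-bit signed compare unless it
is sign-extended first, while a 32-bit compare gets it right directly.

```c
#include <stdint.h>

/* Hypothetical model: result of an ALU32 op in a 64-bit BPF register.
 * ALU32 writes the low 32 bits and zeroes the upper 32 bits. */
static uint64_t alu32_def(int32_t v)
{
	return (uint64_t)(uint32_t)v;
}

/* BPF_JSGT with existing JMP64 semantics: signed 64-bit compare. */
static int jsgt64(uint64_t dst, uint64_t src)
{
	return (int64_t)dst > (int64_t)src;
}

/* The explicit sign extension needed today ("lsh 32; arsh 32"),
 * written with masks so it does not rely on signed shifts. */
static uint64_t sext32(uint64_t r)
{
	r &= 0xffffffffULL;
	return (r & 0x80000000ULL) ? (r | 0xffffffff00000000ULL) : r;
}

/* BPF_JSGT with the proposed JMP32 semantics: signed compare of the
 * low 32 bits only, so no sign extension is required. */
static int jsgt32(uint64_t dst, uint64_t src)
{
	return (int32_t)(uint32_t)dst > (int32_t)(uint32_t)src;
}
```

Here jsgt64(alu32_def(-1), 0) is true, which is wrong for 32-bit
semantics; it becomes false only after sext32(), whereas jsgt32() is
false without any extra instructions.
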
Adding JMP32 instructions would therefore complete the eBPF ISA's 32-bit
sub-register support. They also match the 32-bit compare-and-jump
instructions available on most JIT back-ends, for example x86-64 and
AArch64, so each new eBPF JMP32 instruction can map one-to-one onto them.
A few ALU32-related verifier bugs have been fixed recently, and the JMP32
instructions introduced by this set further improve the BPF sub-register
ecosystem. Once this lands, BPF programs using the 32-bit sub-register
ISA should get reasonably good support from the verifier and the JIT
compilers. Users could then compare the runtime efficiency of a BPF
program under both modes and use whichever benchmarks better. A further
benefit is that JMP32 makes 32-bit JITs more efficient: a JMP32
instruction only uses 32-bit sub-registers and defines none, so, unlike
ALU32, there are no high bits to clear. Hence, even without data-flow
analysis, JMP32 yields better code-gen than JMP64. More benchmark
results are listed below in this cover letter.
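
To make the "use, no def" argument concrete, here is a hypothetical C
model of how a JIT for a 32-bit host might hold a 64-bit BPF register in
a pair of 32-bit host registers. The struct and function names are
invented for illustration; each C statement stands in for roughly one
host instruction. ALU32 must also clear the high half, JMP64 has to
compare both halves, and JMP32 only reads the low half.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit BPF register split across two 32-bit
 * host registers, as a JIT for a 32-bit architecture has to do. */
struct reg_pair {
	uint32_t lo;
	uint32_t hi;
};

/* ALU32 add defines dst: besides the add, the JIT must also clear the
 * high half to preserve the zero-extension semantics. */
static void alu32_add(struct reg_pair *dst, const struct reg_pair *src)
{
	dst->lo += src->lo;
	dst->hi = 0;
}

/* JMP64 JEQ reads both halves, so two compares are needed. */
static bool jmp64_jeq(const struct reg_pair *a, const struct reg_pair *b)
{
	return a->lo == b->lo && a->hi == b->hi;
}

/* JMP32 JEQ only reads the low halves and writes nothing, so there is
 * no high half to clear and a single compare suffices. */
static bool jmp32_jeq(const struct reg_pair *a, const struct reg_pair *b)
{
	return a->lo == b->lo;
}
```
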
- Encoding
Ideally, JMP32 would use a new class, BPF_JMP32, alongside BPF_JMP, just
as BPF_ALU (32-bit) sits alongside BPF_ALU64. But only one class number,
0x06, is unused, and I am not sure whether we want to keep it for other
extensions, for example restoring it as BPF_MISC, which could then
redefine the interpretation of all the remaining opcode bits, bits[7:3].
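
The class field referred to above is the low three bits of the 8-bit
opcode. The sketch below restates the eBPF class values from the uapi
headers to show why 0x06 is the only free class number; the helper
class_is_defined() is hypothetical and only for illustration.

```c
#include <stdint.h>

/* Instruction class: low 3 bits of the opcode (bits[2:0]). */
#define BPF_CLASS(code) ((code) & 0x07)

#define BPF_LD    0x00
#define BPF_LDX   0x01
#define BPF_ST    0x02
#define BPF_STX   0x03
#define BPF_ALU   0x04  /* 32-bit ALU */
#define BPF_JMP   0x05  /* 64-bit compare-and-jump */
/* 0x06 is BPF_RET in classic BPF and unused in eBPF. */
#define BPF_ALU64 0x07  /* 64-bit ALU; BPF_MISC in classic BPF */

/* Hypothetical helper: nonzero if the class of `code` is defined in eBPF. */
static int class_is_defined(uint8_t code)
{
	switch (BPF_CLASS(code)) {
	case BPF_LD: case BPF_LDX: case BPF_ST: case BPF_STX:
	case BPF_ALU: case BPF_JMP: case BPF_ALU64:
		return 1;
	default:
		return 0;  /* only class 0x06 reaches here */
	}
}
```
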
So I am following the approach used by BPF_PSEUDO_CALL, that is, using
reserved bits under BPF_JMP. When BPF_SRC(code) == BPF_X, JMP32 is
encoded as 0x1 in insn->imm. When BPF_SRC(code) == BPF_K, it is encoded
as 0x1 in insn->src_reg. All other bits in imm and src_reg are still
reserved and must be zero.
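
A sketch of how a consumer (interpreter, verifier or JIT) might test for
the proposed marker. It assumes a local copy of the struct bpf_insn
layout from include/uapi/linux/bpf.h and the standard opcode macros. The
helper insn_is_jmp32_proposed() is hypothetical and not taken from this
set, and skipping BPF_JA, BPF_CALL and BPF_EXIT reflects the assumption
that only conditional jumps carry the marker (BPF_CALL already uses
src_reg for BPF_PSEUDO_CALL).

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the struct bpf_insn layout. */
struct bpf_insn_sketch {
	uint8_t code;       /* opcode */
	uint8_t dst_reg:4;  /* destination register */
	uint8_t src_reg:4;  /* source register */
	int16_t off;        /* signed offset */
	int32_t imm;        /* signed immediate constant */
};

#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_OP(code)    ((code) & 0xf0)
#define BPF_SRC(code)   ((code) & 0x08)
#define BPF_JMP  0x05
#define BPF_K    0x00
#define BPF_X    0x08
#define BPF_JA   0x00
#define BPF_JEQ  0x10
#define BPF_CALL 0x80
#define BPF_EXIT 0x90

/* Hypothetical helper: true if a BPF_JMP insn carries the proposed
 * JMP32 marker. Any other non-zero value in the marker field is
 * reserved, so only exactly 0x1 counts. */
static bool insn_is_jmp32_proposed(const struct bpf_insn_sketch *insn)
{
	uint8_t op = BPF_OP(insn->code);

	if (BPF_CLASS(insn->code) != BPF_JMP)
		return false;
	if (op == BPF_JA || op == BPF_CALL || op == BPF_EXIT)
		return false;
	if (BPF_SRC(insn->code) == BPF_X)
		return insn->imm == 1;   /* BPF_X: marker lives in imm */
	return insn->src_reg == 1;       /* BPF_K: marker lives in src_reg */
}
```
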
- Testing
A couple of unit tests have been added in this set. LLVM code-gen for
JMP32 has also been added, so if you are compiling on a machine whose
kernel is patched with this set, you can compile any BPF C program with
both -mcpu=probe and -mattr=+alu32 specified, and LLVM will select the
ISA automatically based on the host probe results. Otherwise, specify
-mcpu=v3 and -mattr=+alu32 to force use of the JMP32 ISA and enable
sub-register code-gen.
LLVM support can be found at:
https://github.com/Netronome/llvm/commit/607f088b92ebfb09f026a84a9443a59237cf6628
(I will send out a merge request once the kernel set reaches consensus.
Hopefully it can get into LLVM 8.0, which will be branched on
16-Jan-2019.)
I have compiled the BPF selftests with JMP32 enabled. The methodology:
the BPF selftest Makefile introduces a new variable, "BPF_SELFTEST_32BIT",
which makes the BPF C programs in the test suite compile in sub-register
mode, so that ALU32 and JMP32 instructions are generated when the kernel
installed on the compilation machine supports them. In my tests there
is no regression in this sub-register test mode, except when loading
bpf_flow.o, for which the verifier somehow doesn't reason about the
packet range accurately. test_progs, which contains quite a few BPF C
tests, passed cleanly.
Using an environment variable to control the test mode seems to require
the smallest change to the Makefile; enabling this new test mode then
requires running "make check" with BPF_SELFTEST_32BIT defined in your
test driver script. I would appreciate any better ideas on how to enable
an extra test mode for BPF selftests.
- JIT backends support
JIT back-ends other than SPARC and MIPS are supported in this set; for
those two I need the maintainers' help with the implementation.
@David, @Paul, I would appreciate it if you could help with this.
The back-ends implemented in this set also need review and testing by
their port maintainers; I have only tested x86_64 and NFP.
- Benchmarking
Below are some benchmark results from Cilium BPF programs. With JMP32
enabled, code size is consistently reduced, and the number of processed
instructions is generally reduced as well.
Text size in bytes (generated by "size")
===
LLVM code-gen option   default   alu32  alu32 + jmp32  change (vs. alu32)
bpf_lb-DLB_L3.o:          6456    6280           6160              -1.91%
bpf_lb-DLB_L4.o:          7848    7664           7136              -6.89%
bpf_lb-DUNKNOWN.o:        2680    2664           2568              -3.60%
bpf_lxc.o:              104824  104744          97360              -7.05%
bpf_netdev.o:            23456   23576          21632              -8.25%
bpf_overlay.o:           16184   16304          14648             -10.16%
Processed insn number
===
LLVM code-gen option   default   alu32  alu32 + jmp32  change (vs. alu32)
bpf_lb-DLB_L3.o:          1579    1281           1304              +1.79%
bpf_lb-DLB_L4.o:          2045    1663           1554              -6.55%
bpf_lb-DUNKNOWN.o:         606     513            505              -1.56%
bpf_lxc.o:               85381  103218         102666              -0.53%
bpf_netdev.o:             5246    5809           5376              -7.45%
bpf_overlay.o:            2443    2705           2460              -9.05%
JITed insn count (on NFP; other 32-bit arches may be similar)
===
LLVM code-gen option      default   alu32  alu32 + jmp32  change (vs. alu32)
one ~300-line C program       632     612            597              -2.45%
(The NFP output contains some fixed sequences that are the same in every
mode, so the real improvement is higher.)
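
For reference, the change column in the tables above is the relative
difference between the "alu32 + jmp32" and "alu32" columns. A one-line
C helper (hypothetical, for illustration) reproduces it up to rounding
in the last digit, e.g. (6160 - 6280) / 6280 = -1.91% for
bpf_lb-DLB_L3.o.

```c
/* Relative change of the "alu32 + jmp32" value against the "alu32"
 * value, in percent; negative means JMP32 made it smaller. */
static double pct_change(double alu32, double alu32_jmp32)
{
	return (alu32_jmp32 - alu32) / alu32 * 100.0;
}
```
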
Thanks.
Cc: David S. Miller <davem@...emloft.net>
Cc: Paul Burton <paul.burton@...s.com>
Cc: Wang YanQing <udknight@...il.com>
Cc: Zi Shen Lim <zlim.lnx@...il.com>
Cc: Shubham Bansal <illusionist.neo@...il.com>
Cc: Naveen N. Rao <naveen.n.rao@...ux.ibm.com>
Cc: Sandipan Das <sandipan@...ux.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@...ibm.com>
Cc: Heiko Carstens <heiko.carstens@...ibm.com>
Jiong Wang (13):
bpf: encoding description and macros for JMP32
bpf: interpreter support for JMP32
bpf: JIT blinds support JMP32
x86_64: bpf: implement jitting of JMP32
x32: bpf: implement jitting of JMP32
arm64: bpf: implement jitting of JMP32
arm: bpf: implement jitting of JMP32
ppc: bpf: implement jitting of JMP32
s390: bpf: implement jitting of JMP32
nfp: bpf: implement jitting of JMP32
bpf: verifier support JMP32
bpf: unit tests for JMP32
selftests: bpf: makefile support sub-register code-gen test mode
Documentation/networking/filter.txt | 10 +
arch/arm/net/bpf_jit_32.c | 23 +-
arch/arm64/net/bpf_jit_comp.c | 10 +-
arch/powerpc/net/bpf_jit_comp64.c | 50 ++++-
arch/s390/net/bpf_jit_comp.c | 12 +-
arch/x86/net/bpf_jit_comp.c | 13 +-
arch/x86/net/bpf_jit_comp32.c | 46 ++--
drivers/net/ethernet/netronome/nfp/bpf/jit.c | 69 ++++--
include/linux/filter.h | 19 ++
include/uapi/linux/bpf.h | 4 +
kernel/bpf/core.c | 60 +++--
kernel/bpf/verifier.c | 178 +++++++++++----
lib/test_bpf.c | 321 ++++++++++++++++++++++++++-
tools/include/uapi/linux/bpf.h | 4 +
tools/testing/selftests/bpf/Makefile | 4 +
15 files changed, 696 insertions(+), 127 deletions(-)
--
2.7.4