Message-ID: <cover.1734742802.git.yepeilin@google.com>
Date: Sat, 21 Dec 2024 01:22:04 +0000
From: Peilin Ye <yepeilin@...gle.com>
To: bpf@...r.kernel.org
Cc: Peilin Ye <yepeilin@...gle.com>, Alexei Starovoitov <ast@...nel.org>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
"Paul E. McKenney" <paulmck@...nel.org>, Puranjay Mohan <puranjay@...nel.org>,
Xu Kuohai <xukuohai@...weicloud.com>, Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Quentin Monnet <qmo@...nel.org>, Mykola Lysenko <mykolal@...com>,
Shuah Khan <shuah@...nel.org>, Josh Don <joshdon@...gle.com>, Barret Rhoden <brho@...gle.com>,
Neel Natu <neelnatu@...gle.com>, Benjamin Segall <bsegall@...gle.com>,
David Vernet <dvernet@...a.com>, Dave Marchevsky <davemarchevsky@...a.com>, linux-kernel@...r.kernel.org
Subject: [PATCH RFC bpf-next v1 0/4] Introduce load-acquire and store-release
BPF instructions
Hi all!
This RFC patchset adds kernel support for BPF load-acquire and store-release
instructions (for background, please see [1]). For this RFC, only arm64 is
supported. The corresponding LLVM changes can be found at:
https://github.com/llvm/llvm-project/pull/108636
As discussed on GitHub [2], define both load-acquire and store-release as
BPF_STX | BPF_ATOMIC instructions. The following new flags are introduced:
  BPF_ATOMIC_LOAD         0x10
  BPF_ATOMIC_STORE        0x20
  BPF_RELAXED             0x0
  BPF_ACQUIRE             0x1
  BPF_RELEASE             0x2
  BPF_ACQ_REL             0x3
  BPF_SEQ_CST             0x4
  BPF_LOAD_ACQ            (BPF_ATOMIC_LOAD | BPF_ACQUIRE)
  BPF_STORE_REL           (BPF_ATOMIC_STORE | BPF_RELEASE)
Bits 4-7 of 'imm' encode the new atomic operations (load and store), and bits
0-3 specify the memory order. A load-acquire is a BPF_STX | BPF_ATOMIC
instruction with 'imm' set to BPF_LOAD_ACQ (0x11). Similarly, a store-release
is a BPF_STX | BPF_ATOMIC instruction with 'imm' set to BPF_STORE_REL (0x22).
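
To make the 'imm' layout concrete, here is a minimal sketch that splits an
'imm' value into its operation (bits 4-7) and memory order (bits 0-3). The
flag values are copied from the list above; the helper names atomic_op() and
atomic_order() are made up for illustration and are not taken from the
patches.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as listed in this cover letter. */
#define BPF_ATOMIC_LOAD		0x10
#define BPF_ATOMIC_STORE	0x20

#define BPF_RELAXED		0x0
#define BPF_ACQUIRE		0x1
#define BPF_RELEASE		0x2
#define BPF_ACQ_REL		0x3
#define BPF_SEQ_CST		0x4

#define BPF_LOAD_ACQ		(BPF_ATOMIC_LOAD | BPF_ACQUIRE)
#define BPF_STORE_REL		(BPF_ATOMIC_STORE | BPF_RELEASE)

/* The two combined values quoted in the text. */
static_assert(BPF_LOAD_ACQ == 0x11, "load-acquire imm");
static_assert(BPF_STORE_REL == 0x22, "store-release imm");

/* Hypothetical helper: bits 4-7 of 'imm' select the atomic operation. */
static inline uint32_t atomic_op(int32_t imm)
{
	return (uint32_t)imm & 0xf0;
}

/* Hypothetical helper: bits 0-3 of 'imm' select the memory order. */
static inline uint32_t atomic_order(int32_t imm)
{
	return (uint32_t)imm & 0x0f;
}
```

For example, atomic_op(0x22) yields BPF_ATOMIC_STORE and atomic_order(0x22)
yields BPF_RELEASE.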
For bits 4-7 of 'imm', we need to avoid conflicts with existing
BPF_STX | BPF_ATOMIC instructions. Currently the following values (a subset
of BPFArithOp<>) are in use:
def BPF_ADD : BPFArithOp<0x0>;
def BPF_OR : BPFArithOp<0x4>;
def BPF_AND : BPFArithOp<0x5>;
def BPF_XOR : BPFArithOp<0xa>;
def BPF_XCHG : BPFArithOp<0xe>;
def BPF_CMPXCHG : BPFArithOp<0xf>;
0x1 and 0x2 were chosen for the new instructions because:
* BPFArithOp<0x1> is BPF_SUB. Compilers already handle atomic subtraction
by generating a BPF NEG followed by a BPF ADD instruction.
* BPFArithOp<0x2> is BPF_MUL, and we do not have a plan for adding BPF
atomic multiplication instructions.
We therefore think that choosing 0x1 and 0x2 avoids future conflicts with
BPFArithOp<>. Previously, 0xb was chosen because we will never need BPF_MOV
(BPFArithOp<0xb>) for BPF_ATOMIC. Please let us know if you think different
values should be used.
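
The conflict check described above can be sketched as a small lookup. The
table holds the bits 4-7 values listed above as in use; the helper name
op_nibble_is_free() is hypothetical and only illustrates the reasoning, it is
not code from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Bits 4-7 values already used by BPF_STX | BPF_ATOMIC, per the list above. */
static const unsigned char used_op_nibbles[] = {
	0x0,	/* BPF_ADD */
	0x4,	/* BPF_OR */
	0x5,	/* BPF_AND */
	0xa,	/* BPF_XOR */
	0xe,	/* BPF_XCHG */
	0xf,	/* BPF_CMPXCHG */
};

/* Hypothetical helper: returns 1 if no existing atomic op uses 'nibble'. */
static int op_nibble_is_free(unsigned int nibble)
{
	size_t i;

	assert(nibble <= 0xf);	/* only four bits are available */

	for (i = 0; i < sizeof(used_op_nibbles) / sizeof(used_op_nibbles[0]); i++)
		if (used_op_nibbles[i] == nibble)
			return 0;
	return 1;
}
```

With this table, 0x1, 0x2 and the previously chosen 0xb are all reported as
free, while e.g. 0x0 (BPF_ADD) is not.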
Based on [3], for BPF load-acquire, the arm64 JIT compiler generates LDAR
(RCsc) instead of LDAPR (RCpc). Will Deacon also suggested LDAR over LDAPR in
an off-list conversation, for the following reasons:
a. Not all CPUs support LDAPR, as also pointed out in Paul E. McKenney's
email (search for "older ARM64 hardware" in [3]).
b. The extra ordering provided by RCsc is important in some use cases e.g.
locks.
c. The arm64 ISA does not provide e.g. other atomic memory operations in
RCpc. In other words, it is not worth losing the extra ordering that
LDAR provides, if we would still be using RCsc for all other cases.
Unlike existing atomic operations that only support BPF_W (32-bit) and
BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire
zero-extends the value before writing it to a 32-bit register, just like
LDARH and friends.
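
As a rough C-level illustration of the zero-extension behaviour, the sketch
below widens 8- and 16-bit acquire loads to 32 bits. It uses the GCC/Clang
__atomic_load_n() builtin on unsigned types, where zero-extension follows
from ordinary C conversion rules; the wrapper names are hypothetical, and
this models the semantics rather than showing what the JIT emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper: 8-bit acquire load, widened to 32 bits. */
static inline uint32_t load_acquire_u8(uint8_t *ptr)
{
	return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

/* Hypothetical wrapper: 16-bit acquire load, widened to 32 bits. */
static inline uint32_t load_acquire_u16(uint16_t *ptr)
{
	return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

/* Values with the top bit set are zero-extended, not sign-extended. */
static inline void zero_extension_selfcheck(void)
{
	uint8_t b = 0xff;
	uint16_t h = 0x8000;

	assert(load_acquire_u8(&b) == 0x000000ffu);
	assert(load_acquire_u16(&h) == 0x00008000u);
}
```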
Examples of using the new instructions (assuming little-endian):

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

Using clang -mcpu=v4, foo() can be compiled to:

  db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x00000011): BPF_LOAD_ACQ

For arm64, an LDAR instruction would be generated by the JIT compiler for
the above, e.g.:

  ldar x7, [x0]
Similarly, consider this 16-bit store-release:

  void bar(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  }

bar() can be compiled to (again, using clang -mcpu=v4):

  cb 21 00 00 22 00 00 00  store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00  exit

  opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
  imm (0x00000022): BPF_ATOMIC_STORE | BPF_RELEASE

An STLRH will be generated for it, e.g.:

  stlrh w1, [x0]
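
To show how the byte sequences in the two examples above map onto instruction
fields, here is a minimal decoder sketch. It assumes the little-endian layout
used in the examples (opcode byte, then a register byte with dst_reg in the
low nibble and src_reg in the high nibble, then a 16-bit offset and a 32-bit
immediate). The struct and function names are hypothetical, not kernel
definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked view of one 8-byte BPF instruction. */
struct decoded_insn {
	uint8_t code;		/* opcode */
	uint8_t dst_reg;	/* low nibble of byte 1 */
	uint8_t src_reg;	/* high nibble of byte 1 */
	int16_t off;		/* bytes 2-3, little-endian */
	int32_t imm;		/* bytes 4-7, little-endian */
};

/* Decode one instruction from its little-endian byte encoding. */
static struct decoded_insn decode_insn_le(const uint8_t raw[8])
{
	struct decoded_insn d;

	d.code = raw[0];
	d.dst_reg = raw[1] & 0x0f;
	d.src_reg = raw[1] >> 4;
	d.off = (int16_t)(uint16_t)(raw[2] | (raw[3] << 8));
	d.imm = (int32_t)((uint32_t)raw[4] | (uint32_t)raw[5] << 8 |
			  (uint32_t)raw[6] << 16 | (uint32_t)raw[7] << 24);
	return d;
}

/* Check the two encodings quoted in the examples. */
static void decode_selfcheck(void)
{
	static const uint8_t load[8]  = { 0xdb, 0x10, 0, 0, 0x11, 0, 0, 0 };
	static const uint8_t store[8] = { 0xcb, 0x21, 0, 0, 0x22, 0, 0, 0 };
	struct decoded_insn l = decode_insn_le(load);
	struct decoded_insn s = decode_insn_le(store);

	/* r0 = load_acquire((u64 *)(r1 + 0x0)) */
	assert(l.code == 0xdb && l.dst_reg == 0 && l.src_reg == 1);
	assert(l.off == 0 && l.imm == 0x11);

	/* store_release((u16 *)(r1 + 0x0), w2) */
	assert(s.code == 0xcb && s.dst_reg == 1 && s.src_reg == 2);
	assert(s.off == 0 && s.imm == 0x22);
}
```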
The complete mapping for arm64 is:

  load-acquire     8-bit   LDARB
  (BPF_LOAD_ACQ)   16-bit  LDARH
                   32-bit  LDAR (32-bit)
                   64-bit  LDAR (64-bit)
  store-release    8-bit   STLRB
  (BPF_STORE_REL)  16-bit  STLRH
                   32-bit  STLR (32-bit)
                   64-bit  STLR (64-bit)
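
The same mapping, written as a lookup for readers who prefer code. This is
only a sketch of the table above: arm64_mnemonic() is a hypothetical helper
and does not reflect how the arm64 JIT in the patches is structured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the arm64 mnemonic for a BPF load-acquire
 * (is_store == 0) or store-release (is_store != 0) of the given width.
 * The 32- and 64-bit cases share a mnemonic and differ only in the
 * register width. Returns NULL for unsupported widths.
 */
static const char *arm64_mnemonic(int is_store, unsigned int size_bits)
{
	switch (size_bits) {
	case 8:
		return is_store ? "STLRB" : "LDARB";
	case 16:
		return is_store ? "STLRH" : "LDARH";
	case 32:
	case 64:
		return is_store ? "STLR" : "LDAR";
	default:
		return NULL;
	}
}

/* Spot-check a few rows of the table. */
static void mnemonic_selfcheck(void)
{
	assert(strcmp(arm64_mnemonic(0, 64), "LDAR") == 0);
	assert(strcmp(arm64_mnemonic(1, 16), "STLRH") == 0);
	assert(arm64_mnemonic(0, 24) == NULL);
}
```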
Use in BPF arenas is supported, as is inline assembly. For example:

  asm volatile("%0 = load_acquire((u64 *)(%1 + 0x0))" :
               "=r"(ret) : "r"(ptr) : "memory");
A new pre-defined macro, __BPF_FEATURE_LOAD_ACQ_STORE_REL, can be used to
detect if clang supports BPF load-acquire and store-release.
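
A sketch of how a program might use the macro follows. The wrapper name
load_acquire_long() is hypothetical. The #else branch is an assumption added
so that the sketch also builds where the macro is not defined (for example
with a host compiler); whether an older BPF clang accepts that builtin is not
something this cover letter states.

```c
#include <assert.h>	/* only for the host-side check at the bottom */

/* Hypothetical wrapper, not part of the patchset or of LLVM. */
static inline long load_acquire_long(long *ptr)
{
#ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
	long ret;

	/* Same inline assembly as in the example above. */
	__asm__ __volatile__("%0 = load_acquire((u64 *)(%1 + 0x0))" :
			     "=r"(ret) : "r"(ptr) : "memory");
	return ret;
#else
	/* Assumption: fall back to the generic GCC/Clang builtin. */
	return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
#endif
}

/* Host-side check of the fallback path. */
static inline void load_acquire_selfcheck(void)
{
	long v = 42;

	assert(load_acquire_long(&v) == 42);
}
```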
Please refer to individual kernel patches (and LLVM commits) for details.
Any suggestions or corrections would be much appreciated!
[1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
[2] https://github.com/llvm/llvm-project/pull/108636#issuecomment-2389403477
[3] https://lore.kernel.org/bpf/75d1352e-c05e-4fdf-96bf-b1c3daaf41f0@paulmck-laptop/
Thanks,
Peilin Ye (4):
  bpf/verifier: Factor out check_load()
  bpf: Introduce load-acquire and store-release instructions
  selftests/bpf: Delete duplicate verifier/atomic_invalid tests
  selftests/bpf: Add selftests for load-acquire and store-release
    instructions
arch/arm64/include/asm/insn.h | 8 ++
arch/arm64/lib/insn.c | 34 +++++++
arch/arm64/net/bpf_jit.h | 20 +++++
arch/arm64/net/bpf_jit_comp.c | 85 +++++++++++++++++-
include/linux/filter.h | 2 +
include/uapi/linux/bpf.h | 13 +++
kernel/bpf/core.c | 41 ++++++++-
kernel/bpf/disasm.c | 14 +++
kernel/bpf/verifier.c | 88 ++++++++++++-------
tools/include/uapi/linux/bpf.h | 13 +++
.../selftests/bpf/prog_tests/arena_atomics.c | 61 ++++++++++++-
.../selftests/bpf/prog_tests/atomics.c | 57 +++++++++++-
.../selftests/bpf/progs/arena_atomics.c | 62 ++++++++++++-
tools/testing/selftests/bpf/progs/atomics.c | 62 ++++++++++++-
.../selftests/bpf/verifier/atomic_invalid.c | 28 +++---
.../selftests/bpf/verifier/atomic_load.c | 71 +++++++++++++++
.../selftests/bpf/verifier/atomic_store.c | 70 +++++++++++++++
17 files changed, 672 insertions(+), 57 deletions(-)
create mode 100644 tools/testing/selftests/bpf/verifier/atomic_load.c
create mode 100644 tools/testing/selftests/bpf/verifier/atomic_store.c
--
2.47.1.613.gc27f4b7a9f-goog