Message-ID: <X84R5DttN3WuHDYo@google.com>
Date:   Mon, 7 Dec 2020 11:28:36 +0000
From:   Brendan Jackman <jackmanb@...gle.com>
To:     Yonghong Song <yhs@...com>
Cc:     bpf@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        KP Singh <kpsingh@...omium.org>,
        Florent Revest <revest@...omium.org>,
        linux-kernel@...r.kernel.org, Jann Horn <jannh@...gle.com>
Subject: Re: [PATCH bpf-next v3 10/14] bpf: Add bitwise atomic instructions

On Fri, Dec 04, 2020 at 07:21:22AM -0800, Yonghong Song wrote:
> 
> 
> On 12/4/20 1:36 AM, Brendan Jackman wrote:
> > On Thu, Dec 03, 2020 at 10:42:19PM -0800, Yonghong Song wrote:
> > > 
> > > 
> > > On 12/3/20 8:02 AM, Brendan Jackman wrote:
> > > > This adds instructions for
> > > > 
> > > > atomic[64]_[fetch_]and
> > > > atomic[64]_[fetch_]or
> > > > atomic[64]_[fetch_]xor
> > > > 
> > > > All these operations are isomorphic enough to implement with the same
> > > > verifier, interpreter, and x86 JIT code, hence they are handled in a
> > > > single commit.
> > > > 
> > > > The main interesting thing here is that x86 doesn't directly support
> > > > the fetch_ versions of these operations, so we need to generate a
> > > > CMPXCHG loop in the JIT. This requires two temporary registers;
> > > > IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for this purpose.
> > > > 
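(An aside for anyone following along: in C terms, the loop the JIT
needs to emit for, say, a fetch_or boils down to something like the
sketch below. This is just the shape of the logic, not the actual
emitted x86; the helper name is mine, and "old"/"new" correspond to
the two scratch registers mentioned above:

	static u64 fetch_or(u64 *dst, u64 src)
	{
		u64 old, new;

		do {
			old = READ_ONCE(*dst);  /* load the current value */
			new = old | src;        /* apply the bitwise op   */
		} while (cmpxchg(dst, old, new) != old); /* raced: retry */

		return old;  /* fetch_ variants hand back the old value */
	}

The cmpxchg only succeeds if *dst still holds "old", which is what
makes the read-modify-write atomic.)
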
> > > > Change-Id: I340b10cecebea8cb8a52e3606010cde547a10ed4
> > > > Signed-off-by: Brendan Jackman <jackmanb@...gle.com>
> > > > ---
> > > >    arch/x86/net/bpf_jit_comp.c  | 50 +++++++++++++++++++++++++++++-
> > > >    include/linux/filter.h       | 60 ++++++++++++++++++++++++++++++++++++
> > > >    kernel/bpf/core.c            |  5 ++-
> > > >    kernel/bpf/disasm.c          | 21 ++++++++++---
> > > >    kernel/bpf/verifier.c        |  6 ++++
> > > >    tools/include/linux/filter.h | 60 ++++++++++++++++++++++++++++++++++++
> > > >    6 files changed, 196 insertions(+), 6 deletions(-)
> > > > 
> > [...]
> > > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > > index 6186280715ed..698f82897b0d 100644
> > > > --- a/include/linux/filter.h
> > > > +++ b/include/linux/filter.h
> > > > @@ -280,6 +280,66 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
> > [...]
> > > > +#define BPF_ATOMIC_FETCH_XOR(SIZE, DST, SRC, OFF)		\
> > > > +	((struct bpf_insn) {					\
> > > > +		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC,	\
> > > > +		.dst_reg = DST,					\
> > > > +		.src_reg = SRC,					\
> > > > +		.off   = OFF,					\
> > > > +		.imm   = BPF_XOR | BPF_FETCH })
> > > > +
> > > >    /* Atomic exchange, src_reg = atomic_xchg((dst_reg + off), src_reg) */
> > > 
> > > Looks like BPF_ATOMIC_XOR/OR/AND/... are all similar to each other.
> > > The same goes for BPF_ATOMIC_FETCH_XOR/OR/AND/...
> > > 
> > > I am wondering whether it makes sense to have
> > > BPF_ATOMIC_BOP(BOP, SIZE, DST, SRC, OFF) and
> > > BPF_ATOMIC_FETCH_BOP(BOP, SIZE, DST, SRC, OFF)
> > > so that we have fewer macros?
> > 
> > Hmm yeah, I think that's probably a good idea; it would be consistent
> > with the macros for the non-atomic ALU ops.
> > 
> > I don't think 'BOP' would be very clear though; 'ALU' might be more
> > obvious.
> 
> BPF_ATOMIC_ALU and BPF_ATOMIC_FETCH_ALU are indeed better.

On second thoughts I think it feels right (i.e. it would be roughly
consistent with the level of abstraction of the rest of this macro API)
to go further and just have two macros BPF_ATOMIC64 and BPF_ATOMIC32:

	/*
	 * Atomic ALU ops:
	 *
	 *   BPF_ADD                  *(uint *) (dst_reg + off16) += src_reg
	 *   BPF_AND                  *(uint *) (dst_reg + off16) &= src_reg
	 *   BPF_OR                   *(uint *) (dst_reg + off16) |= src_reg
	 *   BPF_XOR                  *(uint *) (dst_reg + off16) ^= src_reg
	 *   BPF_ADD | BPF_FETCH      src_reg = atomic_fetch_add(dst_reg + off16, src_reg);
	 *   BPF_AND | BPF_FETCH      src_reg = atomic_fetch_and(dst_reg + off16, src_reg);
	 *   BPF_OR | BPF_FETCH       src_reg = atomic_fetch_or(dst_reg + off16, src_reg);
	 *   BPF_XOR | BPF_FETCH      src_reg = atomic_fetch_xor(dst_reg + off16, src_reg);
	 *   BPF_XCHG                 src_reg = atomic_xchg(dst_reg + off16, src_reg)
	 *   BPF_CMPXCHG              r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg)
	 */

	#define BPF_ATOMIC64(OP, DST, SRC, OFF)                         \
		((struct bpf_insn) {                                    \
			.code  = BPF_STX | BPF_DW | BPF_ATOMIC,         \
			.dst_reg = DST,                                 \
			.src_reg = SRC,                                 \
			.off   = OFF,                                   \
			.imm   = OP })

	#define BPF_ATOMIC32(OP, DST, SRC, OFF)                         \
		((struct bpf_insn) {                                    \
			.code  = BPF_STX | BPF_W | BPF_ATOMIC,          \
			.dst_reg = DST,                                 \
			.src_reg = SRC,                                 \
			.off   = OFF,                                   \
			.imm   = OP })
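
For illustration (these calls are mine, not lines from the patchset),
a caller would then spell the operation out at the call site, e.g.
inside an insn array:

	/* *(u64 *)(r2 + 0) += r3 */
	BPF_ATOMIC64(BPF_ADD, BPF_REG_2, BPF_REG_3, 0),

	/* r3 = atomic_fetch_xor((u64 *)(r2 + 0), r3) */
	BPF_ATOMIC64(BPF_XOR | BPF_FETCH, BPF_REG_2, BPF_REG_3, 0),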

The downside compared to what's currently in the patchset is that the
user can write e.g. BPF_ATOMIC64(BPF_SUB, BPF_REG_1, BPF_REG_2, 0) and
it will compile. On the other hand, they'll get a pretty clear
"BPF_ATOMIC uses invalid atomic opcode 10" when they try to load the
prog, and the valid atomic ops are clearly listed in Documentation as
well as in the comments here.
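
(Sketch of that verifier-side gate, simplified from what the patchset
adds to kernel/bpf/verifier.c; the surrounding code is elided here:

	switch (insn->imm) {
	case BPF_ADD:
	case BPF_ADD | BPF_FETCH:
	case BPF_AND:
	case BPF_AND | BPF_FETCH:
	case BPF_OR:
	case BPF_OR | BPF_FETCH:
	case BPF_XOR:
	case BPF_XOR | BPF_FETCH:
	case BPF_XCHG:
	case BPF_CMPXCHG:
		break;
	default:
		verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n",
			insn->imm);
		return -EINVAL;
	}

BPF_SUB is 0x10, hence the "invalid atomic opcode 10" in the message
quoted above.)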
