Message-ID: <Z23hntYzWuZOnScP@google.com>
Date: Thu, 26 Dec 2024 23:07:10 +0000
From: Peilin Ye <yepeilin@...gle.com>
To: Xu Kuohai <xukuohai@...weicloud.com>
Cc: bpf@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
"Paul E. McKenney" <paulmck@...nel.org>,
Puranjay Mohan <puranjay@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Quentin Monnet <qmo@...nel.org>,
Mykola Lysenko <mykolal@...com>, Shuah Khan <shuah@...nel.org>,
Josh Don <joshdon@...gle.com>, Barret Rhoden <brho@...gle.com>,
Neel Natu <neelnatu@...gle.com>,
Benjamin Segall <bsegall@...gle.com>,
David Vernet <dvernet@...a.com>,
Dave Marchevsky <davemarchevsky@...a.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC bpf-next v1 2/4] bpf: Introduce load-acquire and
store-release instructions
Hi Xu,
Thanks for reviewing this!
On Tue, Dec 24, 2024 at 06:07:14PM +0800, Xu Kuohai wrote:
> On 12/21/2024 9:25 AM, Peilin Ye wrote:
> > +__AARCH64_INSN_FUNCS(load_acq, 0x3FC08000, 0x08C08000)
> > +__AARCH64_INSN_FUNCS(store_rel, 0x3FC08000, 0x08808000)
>
> I checked Arm Architecture Reference Manual [1].
>
> Sections C6.2.{168,169,170,371,372,373} state that the Rt2 (bits 10-14)
> and Rs (bits 16-20) fields for LDARB/LDARH/LDAR/STLRB/STLRH and the
> no-offset type of STLR are fixed to (1).
>
> Section C2.2.2 explains that (1) means a Should-Be-One (SBO) bit.
>
> And the Glossary section says "Arm strongly recommends that software writes
> the field as all 1s. If software writes a value that is not all 1s, it must
> expect an UNPREDICTABLE or CONSTRAINED UNPREDICTABLE result."
>
> Although the pre-index type of STLR is an exception, it is not used in
> this series. Therefore, bits 10-14 and 16-20 should be set to 1s in both
> the mask and the value.
>
> [1] https://developer.arm.com/documentation/ddi0487/latest/
<...>
> > + insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT2, insn,
> > + AARCH64_INSN_REG_ZR);
> > +
> > + return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RS, insn,
> > + AARCH64_INSN_REG_ZR);
>
> As explained above, RS and RT2 fields should be fixed to 1s.
I'm already setting Rs and Rt2 to all 1's here, as AARCH64_INSN_REG_ZR
is defined as 31 (0b11111):
AARCH64_INSN_REG_ZR = 31,
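In other words (an illustrative sketch only, using the A64 field
positions, Rt2 at bits [14:10] and Rs at bits [20:16]), those two
aarch64_insn_encode_register() calls amount to:

	insn |= 31u << 10;	/* Rt2 = 0b11111, i.e. insn |= 0x00007c00 */
	insn |= 31u << 16;	/* Rs  = 0b11111, i.e. insn |= 0x001f0000 */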
This is similar to how load- and store-exclusive instructions are
currently handled:
> > __AARCH64_INSN_FUNCS(load_ex, 0x3F400000, 0x08400000)
> > __AARCH64_INSN_FUNCS(store_ex, 0x3F400000, 0x08000000)
For example, in the manual, Rs is all (1)'s for LDXR{,B,H}, and Rt2 is
all (1)'s for both LDXR{,B,H} and STXR{,B,H}. However, neither the Rs
nor the Rt2 bits are included in the mask; instead, the (1) bits are set
manually, see aarch64_insn_gen_load_store_ex():
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT2, insn,
AARCH64_INSN_REG_ZR);
return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RS, insn,
state);
(For LDXR{,B,H}, 'state' is A64_ZR, which is just an alias to
AARCH64_INSN_REG_ZR (0b11111).)
- - -
On a related note, I simply grabbed {load,store}_ex's MASK and VALUE,
then set bits 15 and 23 to turn them into load-acquire and
store-release:
+__AARCH64_INSN_FUNCS(load_acq, 0x3FC08000, 0x08C08000)
+__AARCH64_INSN_FUNCS(store_rel, 0x3FC08000, 0x08808000)
__AARCH64_INSN_FUNCS(load_ex, 0x3F400000, 0x08400000)
__AARCH64_INSN_FUNCS(store_ex, 0x3F400000, 0x08000000)
My question is: should we extend {load,store}_ex's MASK to include
BIT(15) and BIT(23) as well? As-is, aarch64_insn_is_load_ex() also
returns true for a load-acquire.
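To spell out the arithmetic (illustration only):

	/* value: load_ex | BIT(23) | BIT(15) gives load_acq */
	0x08400000 | 0x00800000 | 0x00008000 == 0x08c08000

	/* ...but load_ex's MASK has neither BIT(23) nor BIT(15), so a
	 * load-acquire still matches the load_ex pattern:
	 */
	0x08c08000 & 0x3f400000 == 0x08400000	/* load_ex's VALUE */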
The only user of aarch64_insn_is_load_ex() seems to be this
arm64-specific kprobe code in arch/arm64/kernel/probes/decode-insn.c:
#ifdef CONFIG_KPROBES
static bool __kprobes
is_probed_address_atomic(kprobe_opcode_t *scan_start, kprobe_opcode_t *scan_end)
{
while (scan_start >= scan_end) {
/*
* atomic region starts from exclusive load and ends with
* exclusive store.
*/
if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
return false;
else if (aarch64_insn_is_load_ex(le32_to_cpu(*scan_start)))
return true;
But I'm not sure yet whether changing {load,store}_ex's MASK would
affect the above code. Do you happen to know the context?
> > + if (BPF_ATOMIC_TYPE(insn->imm) == BPF_ATOMIC_LOAD)
> > + ptr = src;
> > + else
> > + ptr = dst;
> > +
> > + if (off) {
> > + emit_a64_mov_i(true, tmp, off, ctx);
> > + emit(A64_ADD(true, tmp, tmp, ptr), ctx);
>
> The mov and add instructions can be optimized to a single A64_ADD_I
> if is_addsub_imm(off) is true.
Thanks! I'll try this.
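Something like the following, I suppose (untested sketch; assuming
is_addsub_imm() and A64_ADD_I() from the arm64 BPF JIT; negative
offsets would presumably need an is_addsub_imm(-off) check plus
A64_SUB_I as well):

	if (is_addsub_imm(off)) {
		/* single ADD (immediate), no scratch mov needed */
		emit(A64_ADD_I(true, tmp, ptr, off), ctx);
	} else {
		emit_a64_mov_i(true, tmp, off, ctx);
		emit(A64_ADD(true, tmp, tmp, ptr), ctx);
	}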
> I think it's better to split the arm64-related changes into two separate
> patches: one adding the arm64 LDAR/STLR instruction encodings, and the
> other adding JIT support.
Got it; in the next version I'll split this patch into (a) core/verifier
changes, (b) arm64 insn.{h,c} changes, and (c) arm64 JIT compiler
support.
Thanks,
Peilin Ye