linux-kernel - Re: [PATCH RFC bpf-next v1 2/4] bpf: Introduce load-acquire and store-release instructions

Message-ID: <4e6641ce-3f1e-4251-8daf-4dd4b77d08c4@huaweicloud.com>
Date: Mon, 30 Dec 2024 16:27:21 +0800
From: Xu Kuohai <xukuohai@...weicloud.com>
To: Peilin Ye <yepeilin@...gle.com>
Cc: bpf@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
 Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
 Yonghong Song <yonghong.song@...ux.dev>,
 Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
 Martin KaFai Lau <martin.lau@...ux.dev>,
 John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
 Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
 Jiri Olsa <jolsa@...nel.org>, "Paul E. McKenney" <paulmck@...nel.org>,
 Puranjay Mohan <puranjay@...nel.org>,
 Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 Quentin Monnet <qmo@...nel.org>, Mykola Lysenko <mykolal@...com>,
 Shuah Khan <shuah@...nel.org>, Josh Don <joshdon@...gle.com>,
 Barret Rhoden <brho@...gle.com>, Neel Natu <neelnatu@...gle.com>,
 Benjamin Segall <bsegall@...gle.com>, David Vernet <dvernet@...a.com>,
 Dave Marchevsky <davemarchevsky@...a.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC bpf-next v1 2/4] bpf: Introduce load-acquire and
 store-release instructions

On 12/27/2024 7:07 AM, Peilin Ye wrote:
> Hi Xu,
> 
> Thanks for reviewing this!
> 
> On Tue, Dec 24, 2024 at 06:07:14PM +0800, Xu Kuohai wrote:
>> On 12/21/2024 9:25 AM, Peilin Ye wrote:
>>> +__AARCH64_INSN_FUNCS(load_acq,  0x3FC08000, 0x08C08000)
>>> +__AARCH64_INSN_FUNCS(store_rel, 0x3FC08000, 0x08808000)
>>
>> I checked Arm Architecture Reference Manual [1].
>>
>> Section C6.2.{168,169,170,371,372,373} state that field Rt2 (bits 10-14) and
>> Rs (bits 16-20) for LDARB/LDARH/LDAR/STLRB/STLRH and no offset type STLR
>> instructions are fixed to (1).
>>
>> Section C2.2.2 explains that (1) means a Should-Be-One (SBO) bit.
>>
>> And the Glossary section says "Arm strongly recommends that software writes
>> the field as all 1s. If software writes a value that is not all 1s, it must
>> expect an UNPREDICTABLE or CONSTRAINED UNPREDICTABLE result."
>>
>> Although the pre-index type of STLR is an exception, it is not used in this
>> series. Therefore, both bits 10-14 and 16-20 in mask and value should be set
>> to 1s.
>>
>> [1] https://developer.arm.com/documentation/ddi0487/latest/
> 
> <...>
> 
>>> +	insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT2, insn,
>>> +					    AARCH64_INSN_REG_ZR);
>>> +
>>> +	return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RS, insn,
>>> +					    AARCH64_INSN_REG_ZR);
>>
>> As explained above, RS and RT2 fields should be fixed to 1s.
> 
> I'm already setting Rs and Rt2 to all 1's here, as AARCH64_INSN_REG_ZR
> is defined as 31 (0b11111):
> 
> 	AARCH64_INSN_REG_ZR = 31,
> 

I see, but setting the fixed bits at runtime is somewhat of a waste of JIT time.
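
A minimal user-space sketch of what folding the fixed bits into the value
constant could look like. The helper names (encode_field, gen_load_acq_runtime,
gen_load_acq_folded) and LOAD_ACQ_VALUE_SBO are hypothetical and not part of
the series; encode_field is only a stand-in for aarch64_insn_encode_register().
It shows that both ways produce the same instruction word:

```c
#include <assert.h>
#include <stdint.h>

#define REG_ZR     31u  /* same value as AARCH64_INSN_REG_ZR */
#define RT_SHIFT   0
#define RN_SHIFT   5
#define RT2_SHIFT  10
#define RS_SHIFT   16
#define SIZE_SHIFT 30

/* VALUE from the thread: opcode bits only, Rs and Rt2 left as zero. */
#define LOAD_ACQ_VALUE     0x08C08000u
/* Hypothetical variant with the SBO fields Rs and Rt2 already set to 1s. */
#define LOAD_ACQ_VALUE_SBO \
	(LOAD_ACQ_VALUE | (REG_ZR << RS_SHIFT) | (REG_ZR << RT2_SHIFT))

/* Stand-in for aarch64_insn_encode_register(): write a 5-bit field. */
static uint32_t encode_field(uint32_t insn, unsigned int shift, uint32_t reg)
{
	insn &= ~(0x1Fu << shift);
	return insn | ((reg & 0x1Fu) << shift);
}

/* Rs and Rt2 are filled in with ZR each time an instruction is generated. */
static uint32_t gen_load_acq_runtime(uint32_t size, uint32_t rn, uint32_t rt)
{
	uint32_t insn = LOAD_ACQ_VALUE | ((size & 0x3u) << SIZE_SHIFT);

	insn = encode_field(insn, RT_SHIFT, rt);
	insn = encode_field(insn, RN_SHIFT, rn);
	insn = encode_field(insn, RT2_SHIFT, REG_ZR);
	return encode_field(insn, RS_SHIFT, REG_ZR);
}

/* Rs and Rt2 come from the constant, so two field writes are saved. */
static uint32_t gen_load_acq_folded(uint32_t size, uint32_t rn, uint32_t rt)
{
	uint32_t insn = LOAD_ACQ_VALUE_SBO | ((size & 0x3u) << SIZE_SHIFT);

	insn = encode_field(insn, RT_SHIFT, rt);
	return encode_field(insn, RN_SHIFT, rn);
}

/* Both generators should agree; size 2 selects the 32-bit LDAR form. */
static void check_load_acq_encoding(void)
{
	assert(LOAD_ACQ_VALUE_SBO == 0x08DFFC00u);
	assert(gen_load_acq_runtime(2, 0, 0) == 0x88DFFC00u); /* ldar w0, [x0] */
	assert(gen_load_acq_folded(2, 0, 0) == gen_load_acq_runtime(2, 0, 0));
}
```

Calling check_load_acq_encoding() from a small main() is enough to try it out.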

> Similar to how load- and store-exclusive instructions are handled
> currently:
> 
>>>    __AARCH64_INSN_FUNCS(load_ex,	0x3F400000, 0x08400000)
>>>    __AARCH64_INSN_FUNCS(store_ex,	0x3F400000, 0x08000000)
> 
> For example, in the manual, Rs is all (1)'s for LDXR{,B,H}, and Rt2 is
> all (1)'s for both LDXR{,B,H} and STXR{,B,H}.  However, neither Rs nor
> Rt2 bits are in the mask, and (1) bits are set manually, see
> aarch64_insn_gen_load_store_ex():
> 
>    insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT2, insn,
>                                        AARCH64_INSN_REG_ZR);
> 
>    return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RS, insn,
>                                        state);
> 
> (For LDXR{,B,H}, 'state' is A64_ZR, which is just an alias to
> AARCH64_INSN_REG_ZR (0b11111).)
>
> - - -
> 
> On a related note, I simply grabbed {load,store}_ex's MASK and VALUE,
> then set their 15th and 23rd bits to make them load-acquire and
> store-release:
> 
>    +__AARCH64_INSN_FUNCS(load_acq,  0x3FC08000, 0x08C08000)
>    +__AARCH64_INSN_FUNCS(store_rel, 0x3FC08000, 0x08808000)
>     __AARCH64_INSN_FUNCS(load_ex,   0x3F400000, 0x08400000)
>     __AARCH64_INSN_FUNCS(store_ex,  0x3F400000, 0x08000000)
> 
> My question is, should we extend {load,store}_ex's MASK to make them
> contain BIT(15) and BIT(23) as well?  As-is, aarch64_insn_is_load_ex()
> would return true for a load-acquire.
> 
> The only user of aarch64_insn_is_load_ex() seems to be this
> arm64-specific kprobe code in arch/arm64/kernel/probes/decode-insn.c:
> 
>    #ifdef CONFIG_KPROBES
>    static bool __kprobes
>    is_probed_address_atomic(kprobe_opcode_t *scan_start, kprobe_opcode_t *scan_end)
>    {
>            while (scan_start >= scan_end) {
>                    /*
>                     * atomic region starts from exclusive load and ends with
>                     * exclusive store.
>                     */
>                    if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
>                            return false;
>                    else if (aarch64_insn_is_load_ex(le32_to_cpu(*scan_start)))
>                            return true;
> 
> But I'm not sure yet if changing {load,store}_ex's MASK would affect the
> above code.  Do you happen to know the context?
> 

IIUC, this code prevents a kprobe from being placed inside an LL-SC loop built
from an LDXR/STXR pair, because the kprobe trap causes an unexpected memory
access that prevents the exclusive memory access loop from exiting.

Since load-acquire/store-release instructions are not used to construct an
LL-SC loop, I think it is safe to exclude them from {load,store}_ex.
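
The MASK/VALUE matching discussed above can be checked with a small user-space
sketch. The function names and the two "extended" masks are hypothetical and
only illustrate the options; the example encodings use w0/x0 (w1 as the STXR
status register). One thing the sketch suggests: LDAXR also has bit 15 set, so
a load_ex mask that includes BIT(15) would stop matching it, while adding only
BIT(23) still excludes LDAR and keeps LDAXR:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Example encodings, 32-bit variants with base x0 and data register w0. */
#define INSN_LDXR  0x885F7C00u /* ldxr  w0, [x0]     */
#define INSN_LDAXR 0x885FFC00u /* ldaxr w0, [x0]     */
#define INSN_LDAR  0x88DFFC00u /* ldar  w0, [x0]     */
#define INSN_STXR  0x88017C00u /* stxr  w1, w0, [x0] */
#define INSN_STLR  0x889FFC00u /* stlr  w0, [x0]     */

/* Same test that __AARCH64_INSN_FUNCS() generates for each instruction class. */
static bool insn_matches(uint32_t insn, uint32_t mask, uint32_t value)
{
	return (insn & mask) == value;
}

/* MASK/VALUE pairs quoted in the thread. */
static bool is_load_ex_current(uint32_t insn)
{
	return insn_matches(insn, 0x3F400000u, 0x08400000u);
}

static bool is_store_ex_current(uint32_t insn)
{
	return insn_matches(insn, 0x3F400000u, 0x08000000u);
}

/* Hypothetical: load_ex mask extended with BIT(23) only. */
static bool is_load_ex_bit23(uint32_t insn)
{
	return insn_matches(insn, 0x3FC00000u, 0x08400000u);
}

/* Hypothetical: load_ex mask extended with BIT(15) and BIT(23). */
static bool is_load_ex_bit15_23(uint32_t insn)
{
	return insn_matches(insn, 0x3FC08000u, 0x08400000u);
}

static void check_ex_masks(void)
{
	/* As-is, load-acquire and store-release match the exclusive masks. */
	assert(is_load_ex_current(INSN_LDXR));
	assert(is_load_ex_current(INSN_LDAR));
	assert(is_store_ex_current(INSN_STXR));
	assert(is_store_ex_current(INSN_STLR));

	/* BIT(23) alone excludes LDAR and still matches LDXR and LDAXR. */
	assert(is_load_ex_bit23(INSN_LDXR));
	assert(is_load_ex_bit23(INSN_LDAXR));
	assert(!is_load_ex_bit23(INSN_LDAR));

	/* Adding BIT(15) as well also drops LDAXR from the match. */
	assert(is_load_ex_bit15_23(INSN_LDXR));
	assert(!is_load_ex_bit15_23(INSN_LDAXR));
	assert(!is_load_ex_bit15_23(INSN_LDAR));
}
```

Calling check_ex_masks() from a small main() runs all of the checks.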

>>> +	if (BPF_ATOMIC_TYPE(insn->imm) == BPF_ATOMIC_LOAD)
>>> +		ptr = src;
>>> +	else
>>> +		ptr = dst;
>>> +
>>> +	if (off) {
>>> +		emit_a64_mov_i(true, tmp, off, ctx);
>>> +		emit(A64_ADD(true, tmp, tmp, ptr), ctx);
>>
>> The mov and add instructions can be optimized to a single A64_ADD_I
>> if is_addsub_imm(off) is true.
> 
> Thanks!  I'll try this.
> 
>> I think it's better to split the arm64 related changes into two separate
>> patches: one for adding the arm64 LDAR/STLR instruction encodings, and
>> the other for adding jit support.
> 
> Got it, in the next version I'll split this patch into (a) core/verifier
> changes, (b) arm64 insn.{h,c} changes, and (c) arm64 JIT compiler
> support.
>
> Thanks,
> Peilin Ye