Message-ID: <YoH6yAtmzPQtWiFM@FVFF77S0Q05N>
Date: Mon, 16 May 2022 08:18:32 +0100
From: Mark Rutland <mark.rutland@....com>
To: Xu Kuohai <xukuohai@...wei.com>
Cc: bpf@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
linux-kselftest@...r.kernel.org,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...hat.com>,
Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>,
Zi Shen Lim <zlim.lnx@...il.com>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
"David S . Miller" <davem@...emloft.net>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
David Ahern <dsahern@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
hpa@...or.com, Shuah Khan <shuah@...nel.org>,
Jakub Kicinski <kuba@...nel.org>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Pasha Tatashin <pasha.tatashin@...een.com>,
Ard Biesheuvel <ardb@...nel.org>,
Daniel Kiss <daniel.kiss@....com>,
Steven Price <steven.price@....com>,
Sudeep Holla <sudeep.holla@....com>,
Marc Zyngier <maz@...nel.org>,
Peter Collingbourne <pcc@...gle.com>,
Mark Brown <broonie@...nel.org>,
Delyan Kratunov <delyank@...com>,
Kumar Kartikeya Dwivedi <memxor@...il.com>
Subject: Re: [PATCH bpf-next v3 4/7] bpf, arm64: Implement
bpf_arch_text_poke() for arm64
On Mon, May 16, 2022 at 02:55:46PM +0800, Xu Kuohai wrote:
> On 5/13/2022 10:59 PM, Mark Rutland wrote:
> > On Sun, Apr 24, 2022 at 11:40:25AM -0400, Xu Kuohai wrote:
> >> Implement bpf_arch_text_poke() for arm64, so bpf trampoline code can use
> >> it to replace nop with jump, or replace jump with nop.
> >>
> >> Signed-off-by: Xu Kuohai <xukuohai@...wei.com>
> >> Acked-by: Song Liu <songliubraving@...com>
> >> ---
> >> arch/arm64/net/bpf_jit_comp.c | 63 +++++++++++++++++++++++++++++++++++
> >> 1 file changed, 63 insertions(+)
> >>
> >> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> >> index 8ab4035dea27..3f9bdfec54c4 100644
> >> --- a/arch/arm64/net/bpf_jit_comp.c
> >> +++ b/arch/arm64/net/bpf_jit_comp.c
> >> @@ -9,6 +9,7 @@
> >>
> >> #include <linux/bitfield.h>
> >> #include <linux/bpf.h>
> >> +#include <linux/memory.h>
> >> #include <linux/filter.h>
> >> #include <linux/printk.h>
> >> #include <linux/slab.h>
> >> @@ -18,6 +19,7 @@
> >> #include <asm/cacheflush.h>
> >> #include <asm/debug-monitors.h>
> >> #include <asm/insn.h>
> >> +#include <asm/patching.h>
> >> #include <asm/set_memory.h>
> >>
> >> #include "bpf_jit.h"
> >> @@ -1529,3 +1531,64 @@ void bpf_jit_free_exec(void *addr)
> >> {
> >> return vfree(addr);
> >> }
> >> +
> >> +static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
> >> + void *addr, u32 *insn)
> >> +{
> >> + if (!addr)
> >> + *insn = aarch64_insn_gen_nop();
> >> + else
> >> + *insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
> >> + (unsigned long)addr,
> >> + type);
> >> +
> >> + return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
> >> +}
> >> +
> >> +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
> >> + void *old_addr, void *new_addr)
> >> +{
> >> + int ret;
> >> + u32 old_insn;
> >> + u32 new_insn;
> >> + u32 replaced;
> >> + enum aarch64_insn_branch_type branch_type;
> >> +
> >> + if (!is_bpf_text_address((long)ip))
> >> + /* Only poking bpf text is supported. Since kernel function
> >> + * entry is set up by ftrace, we rely on ftrace to poke kernel
> >> + * functions. For kernel functions, bpf_arch_text_poke() is only
> >> + * called after a failed poke with ftrace. In this case, there
> >> + * is probably something wrong with fentry, so there is nothing
> >> + * we can do here. See register_fentry, unregister_fentry and
> >> + * modify_fentry for details.
> >> + */
> >> + return -EINVAL;
> >
> > If you rely on ftrace to poke functions, why do you need to patch text
> > at all? Why does the rest of this function exist?
> >
> > I really don't like having another piece of code outside of ftrace
> > patching the ftrace patch-site; this needs a much better explanation.
> >
>
> Sorry for the incorrect explanation in the comment. I agree that it isn't
> reasonable to patch the ftrace patch-site outside of ftrace code.
>
> The patching logic in register_fentry, unregister_fentry and
> modify_fentry is as follows:
>
> if (tr->func.ftrace_managed)
> ret = register_ftrace_direct((long)ip, (long)new_addr);
> else
> ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr,
> true);
>
> The ftrace patch-site is patched by ftrace code. bpf_arch_text_poke() is
> only used to patch bpf progs and bpf trampolines, which are not managed
> by ftrace.

Sorry, I had misunderstood. Thanks for the correction!
I'll have another look with that in mind.

> >> +
> >> + if (poke_type == BPF_MOD_CALL)
> >> + branch_type = AARCH64_INSN_BRANCH_LINK;
> >> + else
> >> + branch_type = AARCH64_INSN_BRANCH_NOLINK;
> >> +
> >> + if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
> >> + return -EFAULT;
> >> +
> >> + if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
> >> + return -EFAULT;
> >> +
> >> + mutex_lock(&text_mutex);
> >> + if (aarch64_insn_read(ip, &replaced)) {
> >> + ret = -EFAULT;
> >> + goto out;
> >> + }
> >> +
> >> + if (replaced != old_insn) {
> >> + ret = -EFAULT;
> >> + goto out;
> >> + }
> >> +
> >> + ret = aarch64_insn_patch_text_nosync((void *)ip, new_insn);
> >
> > ... and where does the actual synchronization come from in this case?
>
> aarch64_insn_patch_text_nosync() replaces an instruction atomically, so
> no other CPU will fetch a half-new, half-old instruction.
>
> The remaining issue is that another CPU may still fetch the old
> instruction after bpf_arch_text_poke() returns; that is, different CPUs
> may execute different versions of the instruction concurrently.
>
> 1. When a new trampoline is attached, it doesn't seem to be a problem for
> different CPUs to temporarily jump to different trampolines.
>
> 2. When an old trampoline is freed, we must wait for all other CPUs to
> exit it and make sure it is no longer reachable. IIUC,
> bpf_tramp_image_put() already uses percpu_ref and RCU tasks to do
> this.

It would be good to have a comment for these points.

Thanks,
Mark.