Message-ID: <CAEf4Bza3s4u5kX3AFDWd6-JGjfkhwfakc8_AKH52L7517Q8QGQ@mail.gmail.com>
Date: Fri, 5 Apr 2024 11:10:41 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Puranjay Mohan <puranjay12@...il.com>, Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, Zi Shen Lim <zlim.lnx@...il.com>, Xu Kuohai <xukuohai@...wei.com>,
Florent Revest <revest@...omium.org>,
linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>, LKML <linux-kernel@...r.kernel.org>,
bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next] arm64, bpf: add internal-only MOV instruction to
resolve per-CPU addrs
On Fri, Apr 5, 2024 at 8:48 AM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Fri, Apr 5, 2024 at 2:17 AM Puranjay Mohan <puranjay12@...il.com> wrote:
> >
> > Support an instruction for resolving absolute addresses of per-CPU
> > data from their per-CPU offsets. This instruction is internal-only and
> > users are not allowed to use it directly. It will only be used for
> > internal inlining optimizations for now between BPF verifier and BPF
> > JITs.
> >
> > Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
> > access using tpidr_el1"), the per-cpu offset for the CPU is stored in
> > the tpidr_el1/2 register of that CPU.
> >
> > To support this BPF instruction in the ARM64 JIT, the following ARM64
> > instructions are emitted:
> >
> >   mov dst, src            // Move src to dst, if src != dst
> >   mrs tmp, tpidr_el1/2    // Move per-cpu offset of the current cpu in tmp.
> >   add dst, dst, tmp       // Add the per cpu offset to the dst.
> >
> > If CONFIG_SMP is not defined, then nothing is emitted if src == dst, and
> > mov dst, src is emitted if dst != src.
> >
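For anyone following along, the address computation described above can be modelled in plain user-space C. This is only a rough sketch, not the JIT code from the patch: NR_CPUS, the per_cpu_offset[] table and its values, the smp flag, and the function name resolve_percpu_addr() are all made up for illustration, with the table standing in for whatever each CPU would hold in tpidr_el1/2.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4 /* hypothetical CPU count for this model */

/* Hypothetical stand-in for the per-CPU offset each CPU keeps in tpidr_el1/2. */
static const uint64_t per_cpu_offset[NR_CPUS] = {
	0x0000, 0x1000, 0x2000, 0x3000,
};

/*
 * Model of the sequence from the commit message:
 *   mov dst, src
 *   mrs tmp, tpidr_el1/2
 *   add dst, dst, tmp
 * With smp == 0 only the mov remains, mirroring the !CONFIG_SMP case.
 */
static uint64_t resolve_percpu_addr(uint64_t src, unsigned int cpu, int smp)
{
	uint64_t dst = src;			/* mov dst, src */

	if (smp) {
		uint64_t tmp;

		assert(cpu < NR_CPUS);
		tmp = per_cpu_offset[cpu];	/* mrs tmp, tpidr_el1/2 */
		dst += tmp;			/* add dst, dst, tmp */
	}
	return dst;
}
```

In this model, resolve_percpu_addr(0x40, 2, 1) adds CPU 2's offset to the per-CPU offset 0x40, while passing smp == 0 returns the input unchanged.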
> > To measure the performance improvement provided by this change, the
> > benchmark in [1] was used:
> >
> > Before:
> > glob-arr-inc : 23.597 ± 0.012M/s
> > arr-inc : 23.173 ± 0.019M/s
> > hash-inc : 12.186 ± 0.028M/s
> >
> > After:
> > glob-arr-inc : 23.819 ± 0.034M/s
> > arr-inc : 23.285 ± 0.017M/s
> > hash-inc : 12.419 ± 0.011M/s
> >
> > [1] https://github.com/anakryiko/linux/commit/8dec900975ef
>
> You don't see as big of a gain, because bpf_get_smp_processor_id()
> is not inlined yet on arm64.
>
Yep, it would be nice to add ARM64 and RISC-V support there as well.
Though it feels like supporting this directly in the BPF JIT might
actually be easier for RISC-V/ARM64, not sure?
> But even without it I expected bigger gains.
> Could you do 'perf report' before/after ?
> Just want to see what's on top.
I also ran `bpftool p d x id <progid>` and `bpftool p d j id <progid>`
(short for `prog dump xlated` and `prog dump jited`) to validate the
expected inlined BPF instructions and the jitted code, so it might be a
good idea to do that as well.

Either way, thanks for working on this!
>
> >
> > Signed-off-by: Puranjay Mohan <puranjay12@...il.com>
> > ---
> > arch/arm64/include/asm/insn.h | 7 +++++++
> > arch/arm64/lib/insn.c | 11 +++++++++++
> > arch/arm64/net/bpf_jit.h | 6 ++++++
> > arch/arm64/net/bpf_jit_comp.c | 16 ++++++++++++++++
> > 4 files changed, 40 insertions(+)
> >
[...]