Message-ID: <CAJ+HfNh2csyH2xZtGFXW1zwBEW4+bo_E60PWPydJkB6zZTVx3A@mail.gmail.com>
Date: Tue, 4 Feb 2020 20:30:23 +0100
From: Björn Töpel <bjorn.topel@...il.com>
To: Palmer Dabbelt <palmerdabbelt@...gle.com>
Cc: Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>, zlim.lnx@...il.com,
catalin.marinas@....com, will@...nel.org,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
Andrii Nakryiko <andriin@...com>,
Shuah Khan <shuah@...nel.org>, Netdev <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, linux-arm-kernel@...ts.infradead.org,
LKML <linux-kernel@...r.kernel.org>,
linux-kselftest@...r.kernel.org,
clang-built-linux@...glegroups.com, kernel-team@...roid.com
Subject: Re: arm64: bpf: Elide some moves to a0 after calls
On Tue, 28 Jan 2020 at 03:14, Palmer Dabbelt <palmerdabbelt@...gle.com> wrote:
>
> There are four patches here, but only one of them actually does anything. The
> first patch fixes a BPF selftests build failure on my machine and has already
> been sent to the list separately. The next three are staged so that the
> refactorings that don't change any functionality are split out from the
> actual change: two cleanups, and then the idea itself.
>
> Maybe this is an odd thing to say in a cover letter, but I'm not actually sure
> this patch set is a good idea. The issue of extra moves after calls came up as
> I was reviewing some unrelated performance optimizations to the RISC-V BPF JIT.
> I figured I'd take a whack at performing the optimization in the context of the
> arm64 port just to get a breath of fresh air, and I'm not convinced I like the
> results.
>
> That said, I think I would accept something like this for the RISC-V port
> because we're already doing a multi-pass optimization for shrinking function
> addresses so it's not as much extra complexity over there. If we do that we
> should probably start pulling some of this code into the shared BPF compiler,
> but we're also opening the doors to more complicated BPF JIT optimizations.
> Given that the BPF JIT appears to have been designed explicitly to be
> simple/fast as opposed to performing complex optimizations, I'm not sure this is a
> sane way to move forward.
>
Obviously I can only speak for myself and the RISC-V JIT, but given
that we have already opened the door to more advanced translations
(e.g. branch relaxation), I think that this makes sense. At the same
time, we don't want to go all JVM on the JITs. :-P
> I figured I'd send the patch set out as more of a question than anything else.
> Specifically:
>
> * How should I go about measuring the performance of this sort of
> optimization? I'd like to balance the time it takes to run the JIT with the
> time spent executing the program, but I don't have any feel for what real BPF
> programs look like or have any benchmark suite to run. Is there something
> out there this should be benchmarked against? (I'd also like to know that to
> run those benchmarks on the RISC-V port.)
If you run the selftests' 'test_progs' with -v, it will measure and
print the execution time of the programs. I'd say *most* BPF programs
invoke a helper (via call). It would be interesting to see, for say
the selftests, how often the optimization can be performed.
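A minimal sketch of how such a count could be made, assuming a simplified instruction model: the `toy_insn` struct, the `toy_op` values and `count_candidates` are hypothetical names for illustration, not the kernel's `struct bpf_insn` or anything from the patch set. It treats a call as a candidate when the very next instruction copies R0 into another register, which is only one plausible criterion and may differ from what the patches actually check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction model (NOT the kernel's struct bpf_insn). */
enum toy_op { TOY_CALL, TOY_MOV, TOY_OTHER, TOY_EXIT };

struct toy_insn {
    enum toy_op op;
    int dst; /* destination register number, -1 if unused */
    int src; /* source register number, -1 if unused */
};

struct call_stats {
    size_t calls;      /* helper calls seen */
    size_t candidates; /* calls directly followed by "mov rX, r0" with X != 0 */
};

/*
 * Linear scan over the program. Assumption: a call is a candidate for
 * eliding the extra move when the next instruction copies R0 elsewhere.
 */
static struct call_stats count_candidates(const struct toy_insn *prog, size_t len)
{
    struct call_stats st = { 0, 0 };

    assert(prog != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (prog[i].op != TOY_CALL)
            continue;
        st.calls++;
        if (i + 1 < len &&
            prog[i + 1].op == TOY_MOV &&
            prog[i + 1].src == 0 &&
            prog[i + 1].dst != 0)
            st.candidates++;
    }
    return st;
}
```

Running something like this over the instruction streams of the selftest programs would give a rough ratio of candidates to calls. A real version would need to decode actual BPF instructions and account for branch targets that land between the call and the move.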
> * Is this the sort of thing that makes sense in a BPF JIT? I guess I've just
> realized I turned "review this patch" into a way bigger rabbit hole than I
> really want to go down...
>
I'd say 'yes'. My hunch, based on the workloads I've seen, is that BPF
programs are usually loaded and then stay resident for a long time, so
the JIT time is not super critical. The FB/Cilium folks can definitely
provide a better sample point than my hunch. ;-)
Björn