Message-ID: <CAM1=_QSKa7W9SL7oXWGEHLtWqCeFWp-jtGoqPp9=MxQwUGOjaQ@mail.gmail.com>
Date: Sun, 1 Aug 2021 10:37:55 +0200
From: Johan Almbladh <johan.almbladh@...finetworks.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Yonghong Song <yhs@...com>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Tony Ambardar <Tony.Ambardar@...il.com>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH] bpf: Fix off-by-one in tail call count limiting
On Fri, Jul 30, 2021 at 12:48 AM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> On Thu, Jul 29, 2021 at 3:29 PM Andrii Nakryiko
> <andrii.nakryiko@...il.com> wrote:
> >
> > On Thu, Jul 29, 2021 at 2:38 PM Johan Almbladh
> > <johan.almbladh@...finetworks.com> wrote:
> > >
> > > On Wed, Jul 28, 2021 at 9:13 PM Yonghong Song <yhs@...com> wrote:
> > > > I also checked arm/arm64 jit. I saw the following comments:
> > > >
> > > > /* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
> > > > * goto out;
> > > > * tail_call_cnt++;
> > > > */
> > > >
> > > > Maybe we have this MAX_TAIL_CALL_CNT + 1 issue
> > > > for arm/arm64 jit?
> > >
> > > That wouldn't be unreasonable. I don't have an arm or arm64 setup
> > > available right now, but I can try to test it in qemu.
> >
> > On a brief check, there seems to be quite a mess in terms of the code
> > and comments.
> >
> > E.g., in arch/x86/net/bpf_jit_comp32.c:
> >
> > /*
> > * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
> > * goto out;
> > */
> >
> > ^^^^ here comment is wrong
> >
> > [...]
> >
> > /* cmp edx,hi */
> > EMIT3(0x83, add_1reg(0xF8, IA32_EBX), hi);
> > EMIT2(IA32_JNE, 3);
> > /* cmp ecx,lo */
> > EMIT3(0x83, add_1reg(0xF8, IA32_ECX), lo);
> >
> > /* ja out */
> > EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
> >
> > ^^^ JAE is >=, right? But the comment says JA.
> >
> >
> > As for arch/x86/net/bpf_jit_comp.c, both comment and the code seem to
> > do > MAX_TAIL_CALL_CNT, but you are saying JIT is correct. What am I
> > missing?
> >
> > Can you please check all the places where MAX_TAIL_CALL_CNT is used
> > throughout the code? Let's clean this up in one go.
> >
> > Also, given it's so easy to do this off-by-one error, can you please
> > add a negative test validating that 33 tail calls are not allowed? I
> > assume we have a positive test that allows exactly MAX_TAIL_CALL_CNT,
> > but please double-check that as well.
>
> Ok, I see that you've added this in your bpf tests patch set. Please
> consider, additionally, implementing a similar test as part of
> selftests/bpf (specifically in test_progs). We run test_progs
> continuously in CI for every incoming patch/patchset, so it has much
> higher chances of capturing any regressions.
>
> I'm also thinking that this MAX_TAIL_CALL_CNT change should probably
> go into the bpf-next tree. First, this off-by-one behavior was around
> for a while and it doesn't cause serious issues, even if abused. But
> on the other hand, it will make your tail call tests fail, when
> applied into bpf-next without your change. So I think we should apply
> both into bpf-next.
I can confirm that the off-by-one behaviour is present on arm. Below
is the test output from a run on qemu. Test #4 tail-calls itself and
increments a counter on each run, so the correct result is
1 + MAX_TAIL_CALL_CNT = 33. The arm JIT returns 34.
test_bpf: #0 Tail call leaf jited:1 71 PASS
test_bpf: #1 Tail call 2 jited:1 134 PASS
test_bpf: #2 Tail call 3 jited:1 164 PASS
test_bpf: #3 Tail call 4 jited:1 257 PASS
test_bpf: #4 Tail call error path, max count reached jited:1 ret 34 != 33 FAIL
test_bpf: #5 Tail call error path, NULL target jited:1 114 PASS
test_bpf: #6 Tail call error path, index out of range jited:1 112 PASS
test_bpf: test_tail_calls: Summary: 6 PASSED, 1 FAILED, [7/7 JIT'ed]
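To make the expected value concrete, here is a minimal user-space sketch of
what test #4 measures. It is only a model, not code from any JIT: the function
name count_program_runs() is made up for illustration, and
MAX_TAIL_CALL_CNT = 32 is an assumption inferred from the expected result of 33
in the log above.

```c
#include <stdbool.h>

/* Assumed value, inferred from the expected test result of 33. */
#define MAX_TAIL_CALL_CNT 32

/*
 * Hypothetical model of a program that increments a counter and then
 * tail-calls itself until the limit check refuses the call.
 * off_by_one selects the comparison used by the limit check:
 *   true:  if (tail_call_cnt >  MAX_TAIL_CALL_CNT) goto out;
 *   false: if (tail_call_cnt >= MAX_TAIL_CALL_CNT) goto out;
 * Returns the number of times the program body ran.
 */
static int count_program_runs(bool off_by_one)
{
	int tail_call_cnt = 0;
	int runs = 0;

	for (;;) {
		bool limit_hit;

		runs++; /* the program body increments its counter */

		limit_hit = off_by_one ? tail_call_cnt > MAX_TAIL_CALL_CNT
				       : tail_call_cnt >= MAX_TAIL_CALL_CNT;
		if (limit_hit)
			break; /* tail call refused, program returns */

		tail_call_cnt++; /* tail call taken */
	}

	return runs;
}
```

Under these assumptions, count_program_runs(true) yields 34, matching the
failing arm result, and count_program_runs(false) yields 33, the value the test
expects.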
The MAX_TAIL_CALL_CNT constant is referenced in the following JITs.
arch/arm64/net/bpf_jit_comp.c
arch/arm/net/bpf_jit_32.c
arch/mips/net/ebpf_jit.c
arch/powerpc/net/bpf_jit_comp32.c
arch/powerpc/net/bpf_jit_comp64.c
arch/riscv/net/bpf_jit_comp32.c
arch/riscv/net/bpf_jit_comp64.c
arch/s390/net/bpf_jit_comp.c
arch/sparc/net/bpf_jit_comp_64.c
arch/x86/net/bpf_jit_comp32.c
arch/x86/net/bpf_jit_comp.c
The x86 JITs all pass the test, even though their comments are wrong.
The comments can easily be fixed, of course. For JITs that have the
off-by-one behaviour, an easy fix would be to change all occurrences
of MAX_TAIL_CALL_CNT to MAX_TAIL_CALL_CNT - 1. We must first know
which JITs are affected, though.
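As a rough sketch of why the "- 1" substitution would be enough, consider a
limit check of the form quoted from the arm/arm64 comments, with the limit
passed in as a parameter. The helper tail_calls_permitted() is hypothetical,
and MAX_TAIL_CALL_CNT = 32 is again an assumption; a real JIT emits the
comparison as machine code rather than C.

```c
/* Assumed value, inferred from the expected test result of 33. */
#define MAX_TAIL_CALL_CNT 32

/*
 * Hypothetical model of the check
 *     if (tail_call_cnt > limit)
 *             goto out;
 *     tail_call_cnt++;
 * Returns how many tail calls are taken before the check refuses one.
 */
static int tail_calls_permitted(int limit)
{
	int tail_call_cnt = 0;
	int taken = 0;

	for (;;) {
		if (tail_call_cnt > limit)
			break; /* goto out */
		tail_call_cnt++;
		taken++;
	}

	return taken;
}
```

With limit = MAX_TAIL_CALL_CNT this model permits 33 tail calls. With
limit = MAX_TAIL_CALL_CNT - 1 it permits 32, which is the same as using ">="
against MAX_TAIL_CALL_CNT.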
The fix is easy, but setting up the test is hard. It took me quite some
time to get the qemu/arm setup up and running. If the same has to be
done for arm64, mips64, powerpc, powerpc64, riscv32, riscv64, sparc and
s390, I will need some help with this. If someone already has a
working setup for any of these systems, the test can be run on
that.
Or perhaps there is a better way to do this? If I implement a similar
test in selftests/bpf, that would trigger the CI when the patch is
submitted, and we would see which JITs we need to fix.
> On a related topic, please don't forget to include the target kernel
> tree for your patches: [PATCH bpf] or [PATCH bpf-next].
I'll add that! All patches I sent related to this are for the bpf-next tree.
Johan