Message-ID: <20201203181431.t2l63nifzprxqc26@ast-mbp>
Date: Thu, 3 Dec 2020 10:14:31 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Gary Lin <glin@...e.com>, netdev@...r.kernel.org,
bpf@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
andreas.taschner@...e.com
Subject: Re: [PATCH] bpf, x64: bump the number of passes to 64

On Thu, Dec 03, 2020 at 12:20:38PM +0100, Eric Dumazet wrote:
>
>
> On 12/3/20 10:12 AM, Gary Lin wrote:
> > The x64 bpf jit expects bpf images to converge within the given passes, but
> > it could fail to do so in some corner cases. For example:
> >
> > l0: ldh [4]
> > l1: jeq #0x537d, l2, l40
> > l2: ld [0]
> > l3: jeq #0xfa163e0d, l4, l40
> > l4: ldh [12]
> > l5: ldx #0xe
> > l6: jeq #0x86dd, l41, l7
> > l8: ld [x+16]
> > l9: ja 41
> >
> > [... repeated ja 41 ]
> >
> > l40: ja 41
> > l41: ret #0
> > l42: ld #len
> > l43: ret a
> >
> > This bpf program contains 32 "ja 41" instructions which are effectively
> > NOPs and designed to be replaced with valid code dynamically. Ideally,
> > bpf jit should optimize those "ja 41" instructions out when translating
> > the bpf instructions into x86_64 machine code. However, do_jit() can
> > only remove one "ja 41" for offset==0 on each pass, so it requires at
> > least 32 runs to eliminate those JMPs and exceeds the current limit of
> > passes (20). In the end, the program got rejected when BPF_JIT_ALWAYS_ON
> > is set even though it's legit as a classic socket filter.
> >
> > Since programs of this kind are usually handcrafted rather than
> > generated by LLVM, they tend to be small. To avoid increasing
> > the complexity of BPF JIT, this commit just bumps the number of passes
> > to 64 as suggested by Daniel to make it less likely to fail on such cases.
> >
>
> Another idea would be to stop trying to reduce size of generated
> code after a given number of passes have been attempted.
>
> Because even a limit of 64 won't ensure all 'valid' programs can be JITed.
+1.
Bumping the limit does not solve anything.
It only lets bad actors force the kernel to spend more time in the JIT.
If we're holding locks, the longer looping may cause issues.
I think the JIT is parallel enough, but it's still a concern.
I wonder how assemblers deal with it?
They probably face the same issue.
Instead of going back to 32-bit jumps and suddenly increasing the image size,
I think we can do nop padding instead.
After a few loops every insn is more or less optimal.
I think the fix could be something like:
    if (is_imm8(jmp_offset)) {
        EMIT2(jmp_cond, jmp_offset);
        if (loop_cnt > 5) {
            EMIT N nops
            where N = addrs[i] - addrs[i - 1]; // not sure about this math.
            N can be 0 or 4 here.
            // or maybe NOPs should be emitted before EMIT2.
            // need to think it through
        }
    }
Will something like this work?
I think that's what you're suggesting, right?
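
To make the pass-count arithmetic in this thread concrete, here is a toy model in C. It is only a sketch, not the kernel's do_jit(): it sizes a run of n back-to-back "ja" instructions that all target the instruction right after the last one, using the stale addresses from the previous pass. The names passes_to_converge, MAX_JUMPS and pad_after are hypothetical. The 64-byte initial estimate per instruction and the 0/2/5 byte jump encodings are assumptions of the model. Padding is modeled simply as never letting a jump get smaller than it was on the previous pass once pad_after passes have run, which is one possible reading of the "N nops" idea and may not match the final math.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_JUMPS 64   /* hypothetical cap for this toy model */

/* True when the offset fits a signed 8-bit immediate (short jump). */
static bool is_imm8(int value)
{
    return value >= -128 && value <= 127;
}

/*
 * Size n consecutive "ja end" instructions pass by pass.
 * addrs[k] holds the end offset of jump k from the previous pass.
 * Returns the pass on which the total length stops changing,
 * or -1 if that does not happen within max_passes.
 * pad_after == 0 disables padding; otherwise, on passes after
 * pad_after, a jump that would shrink is padded back to its old size.
 */
static int passes_to_converge(int n, int max_passes, int pad_after)
{
    int addrs[MAX_JUMPS + 1];
    int oldlen;

    assert(n >= 1 && n <= MAX_JUMPS);
    for (int k = 0; k <= n; k++)
        addrs[k] = 64 * k;              /* pessimistic initial estimate */
    oldlen = addrs[n];

    for (int pass = 1; pass <= max_passes; pass++) {
        int target = addrs[n];          /* stale address of the jump target */
        int old_prev_end = 0;           /* stale end of the previous jump */
        int len = 0;

        for (int k = 1; k <= n; k++) {
            int old_end = addrs[k];
            int old_size = old_end - old_prev_end;
            int off = target - old_end;
            /* offset 0: jump dropped; imm8: 2 bytes; otherwise 5 bytes */
            int size = (off == 0) ? 0 : (is_imm8(off) ? 2 : 5);

            if (pad_after > 0 && pass > pad_after && size < old_size)
                size = old_size;        /* model NOP padding */
            len += size;
            addrs[k] = len;
            old_prev_end = old_end;
        }
        if (len == oldlen)
            return pass;
        oldlen = len;
    }
    return -1;
}
```

In this model, passes_to_converge(32, 20, 0) gives up within the 20-pass limit, passes_to_converge(32, 64, 0) settles on pass 33, and passes_to_converge(32, 20, 5) settles on pass 6, at the cost of leaving the remaining padding in the image.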