[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a9314ae9-eab8-5761-a547-5bfb5022dac3@fb.com>
Date: Tue, 19 Oct 2021 09:02:33 -0700
From: Yonghong Song <yhs@...com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Toke Høiland-Jørgensen <toke@...hat.com>
CC: Martin KaFai Lau <kafai@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Joanne Koong <joannekoong@...com>, <bpf@...r.kernel.org>,
<netdev@...r.kernel.org>, <Kernel-team@...com>
Subject: Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
On 10/18/21 5:00 PM, Alexei Starovoitov wrote:
> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>>
>> So if we can't fix the verifier, maybe we could come up with a more
>> general helper for packet parsing? Something like:
>>
>> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
>> {
>> ptr = ctx->data + offset;
>> while (ptr < ctx->data_end) {
>> offset = callback_fn(ptr, ctx->data_end, callback_arg);
>> if (offset == 0)
>> return 0;
>> ptr += offset;
>> }
>>
>> // out of bounds before callback was done
>> return -EINVAL;
>> }
>
> We're starting to work on this since it will be useful not only for
> packet parsing, TLV parsing, but potentially any kind of 'for' loop iteration.
>
>> This would work for parsing any kind of packet header or TLV-style data
>> without having to teach the kernel about each header type. It'll have
>> quite a bit of overhead if all the callbacks happen via indirect calls,
>> but maybe the verifier can inline the calls (or at least turn them into
>> direct CALL instructions)?
>
> Right. That's the main downside.
> If the bpf_for_each*() helper is simple enough the verifier can inline it
> similar to map_gen_lookup. In such case the indirect call will be a direct call,
> so the overhead won't be that bad, but it's still a function call and
> static function will have full prologue+epilogue.
> Converting static function into direct jump would be really challenging
> for the verifier and won't provide much benefit, since r6-r9 save/restore
> would need to happen anyway even for such 'inlined' static func, since
> llvm will be freely using r6-r9 for insns inside function body
> assuming that it's a normal function.
>
> May be there is a way to avoid call overhead with with clang extensions.
> If we want to do:
> int mem_eq(char *p1, char *p2, int size)
> {
> int i;
> for (i = 0; i < size; i++)
> if (p1[i] != p2[i])
> return 0;
> return 1;
> }
>
> With clang extension we might write it as:
> int bpf_mem_eq(char *p1, char *p2, int size)
> {
> int i = 0;
> int iter;
>
> iter = __builtin_for_until(i, size, ({
> if (p1[i] != p2[i])
> goto out;
> }));
> out:
> if (iter != size)
> return 0;
> return 1;
> }
>
> The llvm will generate pseudo insns for this __builtin_for.
> The verifier will analyze the loop body for the range [0, size)
> and replace pseudo insns with normal branches after the verification.
> We might even keep the normal C syntax for loops and use
> llvm HardwareLoops pass to add pseudo insns.
> It's more or less the same ideas for loops we discussed before
> bounded loops were introduced.
> The main problem with bounded loops is that the loop body will
> typically be verified the number of times equal to the number of iterations.
> So for any non-trivial loop such iteration count is not much more
> than 100. The verifier can do scalar evolution analysis, but
> it's likely won't work for many cases and user experience
> will suffer. With __builtin_for the scalar evolution is not necessary,
> since induction variable is one and explicit and its range is explicit too.
> That enables single pass over loop body.
> One might argue that for (i = 0; i < 10000; i += 10) loops are
> necessary too, but instead of complicating the verifier with sparse
> ranges it's better to put that on users that can do:
> iter = __builtin_for_until(i, 10000 / 10, ({
> j = i * 10;
> use j;
> }));
> Single explicit induction variable makes the verification practical.
> The loop body won't be as heavily optimized as normal loop,
> but it's a good thing.
We have discussed how to verify *well-formed* loops back in 2018.
(BPF control flow, supporting loops and other patterns:
https://www.linuxplumbersconf.org/event/2/contributions/116/)
Now probably the time to revisit it again!
I think Alexei's proposal is the right direction to have
compiler preserving the loop structure with some pseudo instructions
and verifier is able to range-based verification
instead of iterating all loop iterations.
LLVM already has some IR loop intrinsics like below:
def int_set_loop_iterations :
def int_start_loop_iterations :
def int_test_set_loop_iterations :
def int_test_start_loop_iterations :
def int_loop_decrement :
def int_loop_decrement_reg :
to facilitate hardware loop. BPF target can define its own
intrinsics to help define a well structured loop.
I will look into this.
Powered by blists - more mailing lists