netdev - Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a9314ae9-eab8-5761-a547-5bfb5022dac3@fb.com>
Date:   Tue, 19 Oct 2021 09:02:33 -0700
From:   Yonghong Song <yhs@...com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Toke Høiland-Jørgensen <toke@...hat.com>
CC:     Martin KaFai Lau <kafai@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Joanne Koong <joannekoong@...com>, <bpf@...r.kernel.org>,
        <netdev@...r.kernel.org>, <Kernel-team@...com>
Subject: Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt



On 10/18/21 5:00 PM, Alexei Starovoitov wrote:
> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>>
>> So if we can't fix the verifier, maybe we could come up with a more
>> general helper for packet parsing? Something like:
>>
>> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
>> {
>>    ptr = ctx->data + offset;
>>    while (ptr < ctx->data_end) {
>>      offset = callback_fn(ptr, ctx->data_end, callback_arg);
>>      if (offset == 0)
>>        return 0;
>>      ptr += offset;
>>    }
>>    
>>    // out of bounds before callback was done
>>    return -EINVAL;
>> }
> 
> We're starting to work on this since it will be useful not only for
> packet parsing, TLV parsing, but potentially any kind of 'for' loop iteration.
> 
>> This would work for parsing any kind of packet header or TLV-style data
>> without having to teach the kernel about each header type. It'll have
>> quite a bit of overhead if all the callbacks happen via indirect calls,
>> but maybe the verifier can inline the calls (or at least turn them into
>> direct CALL instructions)?
> 
> Right. That's the main downside.
> If the bpf_for_each*() helper is simple enough the verifier can inline it
> similar to map_gen_lookup. In such case the indirect call will be a direct call,
> so the overhead won't be that bad, but it's still a function call and
> static function will have full prologue+epilogue.
> Converting static function into direct jump would be really challenging
> for the verifier and won't provide much benefit, since r6-r9 save/restore
> would need to happen anyway even for such 'inlined' static func, since
> llvm will be freely using r6-r9 for insns inside function body
> assuming that it's a normal function.
> 
> May be there is a way to avoid call overhead with with clang extensions.
> If we want to do:
> int mem_eq(char *p1, char *p2, int size)
> {
>    int i;
>    for (i = 0; i < size; i++)
>      if (p1[i] != p2[i])
>        return 0;
>    return 1;
> }
> 
> With clang extension we might write it as:
> int bpf_mem_eq(char *p1, char *p2, int size)
> {
>    int i = 0;
>    int iter;
> 
>    iter = __builtin_for_until(i, size, ({
>        if (p1[i] != p2[i])
>          goto out;
>    }));
>    out:
>    if (iter != size)
>      return 0;
>    return 1;
> }
> 
> The llvm will generate pseudo insns for this __builtin_for.
> The verifier will analyze the loop body for the range [0, size)
> and replace pseudo insns with normal branches after the verification.
> We might even keep the normal C syntax for loops and use
> llvm HardwareLoops pass to add pseudo insns.
> It's more or less the same ideas for loops we discussed before
> bounded loops were introduced.
> The main problem with bounded loops is that the loop body will
> typically be verified the number of times equal to the number of iterations.
> So for any non-trivial loop such iteration count is not much more
> than 100. The verifier can do scalar evolution analysis, but
> it's likely won't work for many cases and user experience
> will suffer. With __builtin_for the scalar evolution is not necessary,
> since induction variable is one and explicit and its range is explicit too.
> That enables single pass over loop body.
> One might argue that for (i = 0; i < 10000; i += 10) loops are
> necessary too, but instead of complicating the verifier with sparse
> ranges it's better to put that on users that can do:
>    iter = __builtin_for_until(i, 10000 / 10, ({
>        j = i * 10;
>        use j;
>    }));
> Single explicit induction variable makes the verification practical.
> The loop body won't be as heavily optimized as normal loop,
> but it's a good thing.

We have discussed how to verify *well-formed* loops back in 2018.
(BPF control flow, supporting loops and other patterns:
  https://www.linuxplumbersconf.org/event/2/contributions/116/)
Now probably the time to revisit it again!

I think Alexei's proposal is the right direction to have
compiler preserving the loop structure with some pseudo instructions
and verifier is able to range-based verification
instead of iterating all loop iterations.

LLVM already has some IR loop intrinsics like below:
   def int_set_loop_iterations :
   def int_start_loop_iterations :
   def int_test_set_loop_iterations :
   def int_test_start_loop_iterations :
   def int_loop_decrement :
   def int_loop_decrement_reg :
to facilitate hardware loop. BPF target can define its own
intrinsics to help define a well structured loop.

I will look into this.