[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG4SDVVPc=uRev7kpo7bAJ2OU5AJHGHdkpJnN+VG4c_XCD4jJQ@mail.gmail.com>
Date: Tue, 21 Aug 2018 17:19:26 -0700
From: Petar Penkov <ppenkov@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Petar Penkov <peterpenkov96@...il.com>,
Networking <netdev@...r.kernel.org>,
"David S . Miller" <davem@...emloft.net>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
simon.horman@...ronome.com, Willem de Bruijn <willemb@...gle.com>
Subject: Re: [bpf-next RFC 0/3] Introduce eBPF flow dissector
On Mon, Aug 20, 2018 at 1:52 PM, Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
> On Thu, Aug 16, 2018 at 09:44:20AM -0700, Petar Penkov wrote:
>> From: Petar Penkov <ppenkov@...gle.com>
>>
>> This patch series hardens the RX stack by allowing flow dissection in BPF,
>> as previously discussed [1]. Because of the rigorous checks of the BPF
>> verifier, this provides significant security guarantees. In particular, the
>> BPF flow dissector cannot get inside of an infinite loop, as with
>> CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot
>> read outside of packet bounds, because all memory accesses are checked.
>> Also, with BPF the administrator can decide which protocols to support,
>> reducing potential attack surface. Rarely encountered protocols can be
>> excluded from dissection and the program can be updated without kernel
>> recompile or reboot if a bug is discovered.
>>
>> Patch 1 adds infrastructure to execute a BPF program in __skb_flow_dissect.
>> This includes a new BPF program and attach type.
>>
>> Patch 2 adds a flow dissector program in BPF. This parses most protocols in
>> __skb_flow_dissect in BPF for a subset of flow keys (basic, control, ports,
>> and address types).
>>
>> Patch 3 adds a selftest that attaches the BPF program to the flow dissector
>> and sends traffic with different levels of encapsulation.
>
> Overall I fully support the direction. Few things to consider:
>
>> This RFC patchset exposes a few design considerations:
>>
>> 1/ Because the flow dissector key definitions live in
>> include/linux/net/flow_dissector.h, they are not visible from userspace,
>> and the flow keys definitions need to be copied in the BPF program.
>
> I don't think copy-paste avoids the issue of uapi.
> Anything used by BPF program is uapi.
> The only exception is offsets of kernel internal structures
> passed into bpf_probe_read().
> So we have several options:
> 1. be honest and say 'struct flow_dissect_key*' is now uapi
> 2. wrap all of them into 'struct bpf_flow_dissect_key*' and do rewrites
> when/if 'struct flow_dissect_key*' changes
> 3. wait for BTF to solve it for tracing use case and for this one two.
> The idea is that kernel internal structs can be defined in bpf prog
> and since they will be described precisely in BTF that comes with the prog
> the kernel can validate that prog's BTF matches what kernel thinks it has.
> imo that's the most flexible, but BTF for all of vmlinux won't be ready
> tomorrow and looks like this patch set is ready to go, so I would go with 1 or 2.
I will pursue #2 then as both you and David are in support of it.
>
>> 2/ An alternative to adding a new hook would have been to attach flow
>> dissection programs at the XDP hook. Because this hook is executed before
>> GRO, it would have to execute on every MSS, which would be more
>> computationally expensive. Furthermore, the XDP hook is executed before an
>> SKB has been allocated and there is no clear way to move the dissected keys
>> into the SKB after it has been allocated. Eventually, perhaps a single pass
>> can implement both GRO and flow dissection -- but napi_gro_cb shows that a
>> lot more flow state would need to be parsed for this.
>
> global flow_dissect bpf hook semantics are problematic for testing.
> like patch 3 test affects the whole system including all containers.
> should the hook be per-netns or may be per netdevice?
>
Having the hook be per-netns would definitely be cleaner for testing.
I will look into
refactoring the hook from global to per-netns.
>> 3/ The BPF program cannot use direct packet access everywhere because it
>> uses an offset, initially supplied by the flow dissector. Because the
>> initial value of this non-constant offset comes from outside of the
>> program, the verifier does not know what its value is, and it cannot verify
>> that it is within packet bounds. Therefore, direct packet access programs
>> get rejected.
>
> this part doesn't seem to match the code.
> direct packet access is allowed and usable even for fragmented skbs.
> in such case only linear part of skb is in "direct access".
I am not sure I understand. What I meant was that I use bpf_skb_load_bytes
rather than direct packet access because the offset at which I read headers,
nhoff, depends on an initial value that cannot be statically verified - namely
what __skb_flow_dissect provides. Is there an alternative approach I should
be taking here, and/or am I misunderstanding direct access?
>
> Last bit, I'm curious about, how the 'demo' flow dissector program
> from patch 2 fairs vs in-kernel dissector when performance tested?
>
I used the test in patch 3 to compare the two implementations and the
difference seemed to be small, with the BPF flow dissector being slightly
slower.
Thank you so much for your feedback, Alexei and David!
Powered by blists - more mailing lists