netdev - Re: [bpf-next RFC 0/3] Introduce eBPF flow dissector

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG4SDVVPc=uRev7kpo7bAJ2OU5AJHGHdkpJnN+VG4c_XCD4jJQ@mail.gmail.com>
Date:   Tue, 21 Aug 2018 17:19:26 -0700
From:   Petar Penkov <ppenkov@...gle.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Petar Penkov <peterpenkov96@...il.com>,
        Networking <netdev@...r.kernel.org>,
        "David S . Miller" <davem@...emloft.net>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        simon.horman@...ronome.com, Willem de Bruijn <willemb@...gle.com>
Subject: Re: [bpf-next RFC 0/3] Introduce eBPF flow dissector

On Mon, Aug 20, 2018 at 1:52 PM, Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
> On Thu, Aug 16, 2018 at 09:44:20AM -0700, Petar Penkov wrote:
>> From: Petar Penkov <ppenkov@...gle.com>
>>
>> This patch series hardens the RX stack by allowing flow dissection in BPF,
>> as previously discussed [1]. Because of the rigorous checks of the BPF
>> verifier, this provides significant security guarantees. In particular, the
>> BPF flow dissector cannot get inside of an infinite loop, as with
>> CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot
>> read outside of packet bounds, because all memory accesses are checked.
>> Also, with BPF the administrator can decide which protocols to support,
>> reducing potential attack surface. Rarely encountered protocols can be
>> excluded from dissection and the program can be updated without kernel
>> recompile or reboot if a bug is discovered.
>>
>> Patch 1 adds infrastructure to execute a BPF program in __skb_flow_dissect.
>> This includes a new BPF program and attach type.
>>
>> Patch 2 adds a flow dissector program in BPF. This parses most protocols in
>> __skb_flow_dissect in BPF for a subset of flow keys (basic, control, ports,
>> and address types).
>>
>> Patch 3 adds a selftest that attaches the BPF program to the flow dissector
>> and sends traffic with different levels of encapsulation.
>
> Overall I fully support the direction. Few things to consider:
>
>> This RFC patchset exposes a few design considerations:
>>
>> 1/ Because the flow dissector key definitions live in
>> include/linux/net/flow_dissector.h, they are not visible from userspace,
>> and the flow keys definitions need to be copied in the BPF program.
>
> I don't think copy-paste avoids the issue of uapi.
> Anything used by BPF program is uapi.
> The only exception is offsets of kernel internal structures
> passed into bpf_probe_read().
> So we have several options:
> 1. be honest and say 'struct flow_dissect_key*' is now uapi
> 2. wrap all of them into 'struct bpf_flow_dissect_key*' and do rewrites
>   when/if 'struct flow_dissect_key*' changes
> 3. wait for BTF to solve it for tracing use case and for this one two.
> The idea is that kernel internal structs can be defined in bpf prog
> and since they will be described precisely in BTF that comes with the prog
> the kernel can validate that prog's BTF matches what kernel thinks it has.
> imo that's the most flexible, but BTF for all of vmlinux won't be ready
> tomorrow and looks like this patch set is ready to go, so I would go with 1 or 2.

I will pursue #2 then as both you and David are in support of it.

>
>> 2/ An alternative to adding a new hook would have been to attach flow
>> dissection programs at the XDP hook. Because this hook is executed before
>> GRO, it would have to execute on every MSS, which would be more
>> computationally expensive. Furthermore, the XDP hook is executed before an
>> SKB has been allocated and there is no clear way to move the dissected keys
>> into the SKB after it has been allocated. Eventually, perhaps a single pass
>> can implement both GRO and flow dissection -- but napi_gro_cb shows that a
>> lot more flow state would need to be parsed for this.
>
> global flow_dissect bpf hook semantics are problematic for testing.
> like patch 3 test affects the whole system including all containers.
> should the hook be per-netns or may be per netdevice?
>

Having the hook be per-netns would definitely be cleaner for testing.
I will look into
refactoring the hook from global to per-netns.

>> 3/ The BPF program cannot use direct packet access everywhere because it
>> uses an offset, initially supplied by the flow dissector.  Because the
>> initial value of this non-constant offset comes from outside of the
>> program, the verifier does not know what its value is, and it cannot verify
>> that it is within packet bounds. Therefore, direct packet access programs
>> get rejected.
>
> this part doesn't seem to match the code.
> direct packet access is allowed and usable even for fragmented skbs.
> in such case only linear part of skb is in "direct access".

I am not sure I understand. What I meant was that I use bpf_skb_load_bytes
rather than direct packet access because the offset at which I read headers,
nhoff, depends on an initial value that cannot be statically verified - namely
what __skb_flow_dissect provides. Is there an alternative approach I should
be taking here, and/or am I misunderstanding direct access?

>
> Last bit, I'm curious about, how the 'demo' flow dissector program
> from patch 2 fairs vs in-kernel dissector when performance tested?
>

I used the test in patch 3 to compare the two implementations and the
difference seemed to be small, with the BPF flow dissector being  slightly
slower.

Thank you so much for your feedback, Alexei and David!