netdev - Re: [bpf-next RFC 0/3] Introduce eBPF flow dissector

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180820205205.jj7e75pulwqkorpr@ast-mbp>
Date:   Mon, 20 Aug 2018 13:52:07 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Petar Penkov <peterpenkov96@...il.com>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, ast@...nel.org,
        daniel@...earbox.net, simon.horman@...ronome.com,
        Petar Penkov <ppenkov@...gle.com>, willemb@...gle.com
Subject: Re: [bpf-next RFC 0/3] Introduce eBPF flow dissector

On Thu, Aug 16, 2018 at 09:44:20AM -0700, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@...gle.com>
> 
> This patch series hardens the RX stack by allowing flow dissection in BPF,
> as previously discussed [1]. Because of the rigorous checks of the BPF
> verifier, this provides significant security guarantees. In particular, the
> BPF flow dissector cannot get inside of an infinite loop, as with
> CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot
> read outside of packet bounds, because all memory accesses are checked.
> Also, with BPF the administrator can decide which protocols to support,
> reducing potential attack surface. Rarely encountered protocols can be
> excluded from dissection and the program can be updated without kernel
> recompile or reboot if a bug is discovered.
> 
> Patch 1 adds infrastructure to execute a BPF program in __skb_flow_dissect.
> This includes a new BPF program and attach type.
> 
> Patch 2 adds a flow dissector program in BPF. This parses most protocols in
> __skb_flow_dissect in BPF for a subset of flow keys (basic, control, ports,
> and address types).
> 
> Patch 3 adds a selftest that attaches the BPF program to the flow dissector
> and sends traffic with different levels of encapsulation.

Overall I fully support the direction. Few things to consider:

> This RFC patchset exposes a few design considerations:
> 
> 1/ Because the flow dissector key definitions live in
> include/linux/net/flow_dissector.h, they are not visible from userspace,
> and the flow keys definitions need to be copied in the BPF program.

I don't think copy-paste avoids the issue of uapi.
Anything used by BPF program is uapi.
The only exception is offsets of kernel internal structures
passed into bpf_probe_read().
So we have several options:
1. be honest and say 'struct flow_dissect_key*' is now uapi
2. wrap all of them into 'struct bpf_flow_dissect_key*' and do rewrites
  when/if 'struct flow_dissect_key*' changes
3. wait for BTF to solve it for tracing use case and for this one two.
The idea is that kernel internal structs can be defined in bpf prog
and since they will be described precisely in BTF that comes with the prog
the kernel can validate that prog's BTF matches what kernel thinks it has.
imo that's the most flexible, but BTF for all of vmlinux won't be ready
tomorrow and looks like this patch set is ready to go, so I would go with 1 or 2.

> 2/ An alternative to adding a new hook would have been to attach flow
> dissection programs at the XDP hook. Because this hook is executed before
> GRO, it would have to execute on every MSS, which would be more
> computationally expensive. Furthermore, the XDP hook is executed before an
> SKB has been allocated and there is no clear way to move the dissected keys
> into the SKB after it has been allocated. Eventually, perhaps a single pass
> can implement both GRO and flow dissection -- but napi_gro_cb shows that a
> lot more flow state would need to be parsed for this.

global flow_dissect bpf hook semantics are problematic for testing.
like patch 3 test affects the whole system including all containers.
should the hook be per-netns or may be per netdevice?

> 3/ The BPF program cannot use direct packet access everywhere because it
> uses an offset, initially supplied by the flow dissector.  Because the
> initial value of this non-constant offset comes from outside of the
> program, the verifier does not know what its value is, and it cannot verify
> that it is within packet bounds. Therefore, direct packet access programs
> get rejected.

this part doesn't seem to match the code.
direct packet access is allowed and usable even for fragmented skbs.
in such case only linear part of skb is in "direct access".

Last bit, I'm curious about, how the 'demo' flow dissector program
from patch 2 fairs vs in-kernel dissector when performance tested?