Message-ID: <CA+DcSEgPWDBW1nnweb8pCeOjeLGj7LRgTei+YO-+JKY2VZDz1w@mail.gmail.com>
Date: Tue, 2 Apr 2019 13:54:21 -0700
From: Petar Penkov <peterpenkov96@...il.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, David Miller <davem@...emloft.net>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Simon Horman <simon.horman@...ronome.com>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH bpf 5/5] flow_dissector: document BPF flow dissector environment
On Mon, Apr 1, 2019 at 1:57 PM Stanislav Fomichev <sdf@...gle.com> wrote:
>
> Short doc on what BPF flow dissector should expect in the input
> __sk_buff and flow_keys.
>
> Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
> ---
> .../networking/bpf_flow_dissector.txt | 115 ++++++++++++++++++
> 1 file changed, 115 insertions(+)
> create mode 100644 Documentation/networking/bpf_flow_dissector.txt
>
> diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt
> new file mode 100644
> index 000000000000..513be8e20afb
> --- /dev/null
> +++ b/Documentation/networking/bpf_flow_dissector.txt
> @@ -0,0 +1,115 @@
> +==================
> +BPF Flow Dissector
> +==================
> +
> +Overview
> +========
> +
> +The flow dissector is a routine that parses metadata out of packets. It is
> +used in various places in the networking subsystem (RFS, flow hash, etc.).
> +
> +The BPF flow dissector is an attempt to reimplement the C-based flow dissector
> +logic in BPF to gain all the benefits of the BPF verifier (namely, limits on
> +the number of instructions and tail calls).
> +
> +API
> +===
> +
> +BPF flow dissector programs operate on an __sk_buff. However, only a
> +limited set of fields is allowed: data, data_end and flow_keys. flow_keys
> +is a 'struct bpf_flow_keys' and contains the flow dissector input and
> +output arguments.
> +
> +The inputs are:
> + * nhoff - initial offset of the network header
> + * thoff - initial offset of the transport header, initialized to nhoff
> + * n_proto - L3 protocol type, parsed out of the L2 header
> +
> +The BPF flow dissector program should fill out the rest of the 'struct
> +bpf_flow_keys' fields. The input arguments nhoff/thoff/n_proto should also
> +be adjusted accordingly.
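To make the input/output contract concrete, here is a rough userspace C model (illustration only, not the kernel or selftest code). struct model_flow_keys, model_dissect_ipv4() and the MODEL_* constants are hypothetical names. The model keeps n_proto and the ports in host byte order, assumes nhoff already points at an IPv4 header, and ignores fragments. Since there is no further encapsulation, it leaves nhoff and n_proto alone and only moves thoff past the IP header (including options).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a few 'struct bpf_flow_keys' fields.
 * Unlike the real struct, n_proto and the ports are in host byte order. */
struct model_flow_keys {
    uint16_t nhoff;    /* in: offset of the network header */
    uint16_t thoff;    /* in: equal to nhoff; out: transport header offset */
    uint16_t n_proto;  /* in: L3 protocol (EtherType) */
    uint8_t  ip_proto; /* out: L4 protocol number */
    uint16_t sport;    /* out: source port (TCP/UDP only) */
    uint16_t dport;    /* out: destination port (TCP/UDP only) */
};

/* Same numeric values as BPF_OK and BPF_DROP. */
enum { MODEL_OK = 0, MODEL_DROP = 2 };

#define MODEL_ETH_P_IP      0x0800
#define MODEL_IPPROTO_TCP   6
#define MODEL_IPPROTO_UDP   17
#define MODEL_IPV4_MIN_HLEN 20

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }

/* Parse an IPv4 header at data + nhoff, fill in ip_proto, advance thoff past
 * the header, and read the ports for TCP/UDP. Every access is checked
 * against len, the way a real program must check against data_end. */
static int model_dissect_ipv4(const uint8_t *data, size_t len,
                              struct model_flow_keys *keys)
{
    size_t off = keys->nhoff;
    size_t ihl_bytes;

    if (keys->n_proto != MODEL_ETH_P_IP)
        return MODEL_DROP;
    if (off + MODEL_IPV4_MIN_HLEN > len || (data[off] >> 4) != 4)
        return MODEL_DROP;

    ihl_bytes = (size_t)(data[off] & 0x0f) * 4;
    if (ihl_bytes < MODEL_IPV4_MIN_HLEN || off + ihl_bytes > len)
        return MODEL_DROP;

    keys->ip_proto = data[off + 9];
    keys->thoff = (uint16_t)(off + ihl_bytes);

    if (keys->ip_proto == MODEL_IPPROTO_TCP ||
        keys->ip_proto == MODEL_IPPROTO_UDP) {
        if ((size_t)keys->thoff + 4 > len)
            return MODEL_DROP;
        keys->sport = rd16(data + keys->thoff);
        keys->dport = rd16(data + keys->thoff + 2);
    }
    return MODEL_OK;
}
```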
> +
> +The return code of the BPF program is either BPF_OK to indicate successful
> +dissection, or BPF_DROP to indicate a parsing error.
I don't think this is actually enforced. I believe the current code just
checks whether the status is BPF_OK, rather than distinguishing among
BPF_OK, BPF_DROP, and any other value.
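If that reading is right, the caller's check amounts to something like the sketch below. model_dissect_succeeded() is a hypothetical name, not the actual kernel function; the two constants carry the numeric values of BPF_OK and BPF_DROP from enum bpf_ret_code.

```c
#include <stdbool.h>

/* Same numeric values as BPF_OK and BPF_DROP in enum bpf_ret_code. */
enum { MODEL_BPF_OK = 0, MODEL_BPF_DROP = 2 };

/* Hypothetical model of the caller's check: any return value other than
 * BPF_OK, whether BPF_DROP or something outside the documented pair,
 * is treated the same way, as a failed dissection. */
static bool model_dissect_succeeded(int prog_ret)
{
    return prog_ret == MODEL_BPF_OK;
}
```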
> +
> +__sk_buff->data
> +===============
> +
> +In the VLAN-less case, this is what the initial state of the BPF flow
> +dissector looks like:
> ++------+------+------------+-----------+
> +| DMAC | SMAC | ETHER_TYPE | L3_HEADER |
> ++------+------+------------+-----------+
> +                             ^
> +                             |
> +                             +-- flow dissector starts here
> +
> +skb->data + flow_keys->nhoff points to the first byte of L3_HEADER.
> +flow_keys->thoff = nhoff
> +flow_keys->n_proto = ETHER_TYPE
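A sketch of how this initial state maps onto the frame bytes, assuming (for illustration only) that the buffer starts at DMAC and carries no VLAN tag. Where skb->data actually points depends on the caller; only the relation "data + nhoff == first byte of L3_HEADER" comes from the text above. model_initial_state() and struct model_flow_keys are hypothetical, and n_proto is kept in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* DMAC (6) + SMAC (6) + ETHER_TYPE (2) */
#define MODEL_ETH_HLEN 14

/* Hypothetical stand-in for the input fields of 'struct bpf_flow_keys'. */
struct model_flow_keys {
    uint16_t nhoff;
    uint16_t thoff;
    uint16_t n_proto; /* host byte order in this model */
};

/* Fill in the VLAN-less initial state for a frame that starts at DMAC.
 * Returns -1 if the frame is too short to contain an Ethernet header. */
static int model_initial_state(const uint8_t *frame, size_t len,
                               struct model_flow_keys *keys)
{
    if (len < MODEL_ETH_HLEN)
        return -1;
    keys->n_proto = (uint16_t)(frame[12] << 8 | frame[13]); /* ETHER_TYPE */
    keys->nhoff = MODEL_ETH_HLEN; /* first byte of L3_HEADER */
    keys->thoff = keys->nhoff;    /* thoff starts out equal to nhoff */
    return 0;
}
```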
> +
> +
> +In the VLAN case, the flow dissector can be called in two different states.
> +
> +Pre-VLAN parsing:
> ++------+------+------+-----+-----------+-----------+
> +| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
> ++------+------+------+-----+-----------+-----------+
> +                       ^
> +                       |
> +                       +-- flow dissector starts here
> +
> +skb->data + flow_keys->nhoff points to the first byte of TCI.
> +flow_keys->thoff = nhoff
> +flow_keys->n_proto = TPID
> +
> +Please note that TPID can be 802.1AD and, hence, the BPF program would
> +have to parse VLAN information twice for double-tagged packets.
> +
> +
> +Post-VLAN parsing:
> ++------+------+------+-----+-----------+-----------+
> +| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
> ++------+------+------+-----+-----------+-----------+
> +                                         ^
> +                                         |
> +                                         +-- flow dissector starts here
> +
> +skb->data + flow_keys->nhoff points to the first byte of L3_HEADER.
> +flow_keys->thoff = nhoff
> +flow_keys->n_proto = ETHER_TYPE
> +
> +In this case, VLAN information has been processed before the flow dissector
> +is called, and the BPF flow dissector is not required to handle it.
> +
> +
> +The takeaway is that a BPF flow dissector program can be called with an
> +optional VLAN header and should gracefully handle both cases: when a single
> +or double VLAN tag is present, and when none is. The same program can be
> +invoked in either state, so it has to be written carefully to handle both.
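A sketch of logic that tolerates all three entry states (untagged, pre-VLAN with one or two tags, post-VLAN), written as plain userspace C rather than BPF. model_skip_vlan(), struct model_flow_keys and the MODEL_* constants are hypothetical; the buffer is assumed to start at DMAC and n_proto is in host byte order. The loop is capped at two tags; a BPF program that cannot loop would have to unroll it.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_P_8021Q   0x8100
#define MODEL_ETH_P_8021AD  0x88A8
#define MODEL_VLAN_HLEN     4 /* TCI (2) + encapsulated ETHER_TYPE (2) */
#define MODEL_MAX_VLAN_TAGS 2

/* Hypothetical stand-in for the input fields of 'struct bpf_flow_keys'. */
struct model_flow_keys {
    uint16_t nhoff;
    uint16_t thoff;
    uint16_t n_proto; /* host byte order in this model */
};

static int model_is_vlan(uint16_t proto)
{
    return proto == MODEL_ETH_P_8021Q || proto == MODEL_ETH_P_8021AD;
}

/* Consume zero, one or two VLAN tags starting at data + nhoff.
 * Pre-VLAN entry: n_proto holds the TPID and nhoff points at the TCI, so
 * the encapsulated ETHER_TYPE sits 2 bytes further on.
 * Untagged or post-VLAN entry: n_proto is already an L3 type; nothing to do.
 * Returns 0 on success, -1 on truncation or more than two tags. */
static int model_skip_vlan(const uint8_t *data, size_t len,
                           struct model_flow_keys *keys)
{
    for (int i = 0; i < MODEL_MAX_VLAN_TAGS && model_is_vlan(keys->n_proto); i++) {
        size_t off = keys->nhoff;

        if (off + MODEL_VLAN_HLEN > len)
            return -1;
        keys->n_proto = (uint16_t)(data[off + 2] << 8 | data[off + 3]);
        keys->nhoff = (uint16_t)(off + MODEL_VLAN_HLEN);
    }
    if (model_is_vlan(keys->n_proto))
        return -1; /* a third tag: give up */
    keys->thoff = keys->nhoff;
    return 0;
}
```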
> +
> +
> +Reference Implementation
> +========================
> +
> +See tools/testing/selftests/bpf/progs/bpf_flow.c for the reference
> +implementation and tools/testing/selftests/bpf/flow_dissector_load.[hc] for
> +the loader. bpftool can also be used to load a BPF flow dissector program.
> +
> +The reference implementation is organized as follows:
> +* jmp_table - a map that contains sub-programs for each supported L3 protocol
> +* _dissect routine - the entry point; it parses the input n_proto and does a
> + bpf_tail_call to the appropriate L3 handler
> +
> +Since BPF at this point doesn't support looping (or any jumping back),
> +jmp_table is used instead to handle multiple levels of encapsulation (and
> +IPv6 options).
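A userspace model of the jmp_table idea: each handler returns the index of the next slot instead of looping, and a dispatcher chains the calls under a fixed budget, similar to how bpf_tail_call() chains are bounded in the kernel. The slot names, handlers and the toy context are hypothetical; the budget of 32 mirrors MAX_TAIL_CALL_CNT in kernels of this era, and in this model an empty slot or an exhausted budget ends dissection as a failure.

```c
#include <stddef.h>

/* Hypothetical slot indices standing in for jmp_table entries. */
enum model_slot { MODEL_SLOT_IP, MODEL_SLOT_IPV6, MODEL_SLOT_IPV6OP, MODEL_SLOT_MAX };

/* Handler results other than a slot index. */
enum { MODEL_DONE = -1, MODEL_FAIL = -2 };

/* Mirrors MAX_TAIL_CALL_CNT (32) in kernels of this era. */
#define MODEL_MAX_TAIL_CALLS 32

struct model_ctx {
    int remaining_ip_encap; /* toy stand-in for nested IP-in-IP headers */
    int ipv6_opts;          /* toy stand-in for chained IPv6 options */
    int calls;              /* total handler invocations, for inspection */
};

typedef int (*model_handler)(struct model_ctx *ctx);

static int model_handle_ip(struct model_ctx *ctx)
{
    ctx->calls++;
    if (ctx->remaining_ip_encap > 0) {
        ctx->remaining_ip_encap--;
        return MODEL_SLOT_IP; /* inner header is IP again: re-enter this slot */
    }
    return MODEL_DONE;
}

static int model_handle_ipv6(struct model_ctx *ctx)
{
    ctx->calls++;
    return ctx->ipv6_opts > 0 ? MODEL_SLOT_IPV6OP : MODEL_DONE;
}

static int model_handle_ipv6op(struct model_ctx *ctx)
{
    ctx->calls++;
    ctx->ipv6_opts--;
    return ctx->ipv6_opts > 0 ? MODEL_SLOT_IPV6OP : MODEL_DONE;
}

/* Run the entry handler, then follow at most MODEL_MAX_TAIL_CALLS
 * "tail calls" to the slots the handlers name. */
static int model_dispatch(const model_handler table[MODEL_SLOT_MAX],
                          int slot, struct model_ctx *ctx)
{
    for (int n = 0; n <= MODEL_MAX_TAIL_CALLS; n++) {
        if (slot < 0 || slot >= MODEL_SLOT_MAX || table[slot] == NULL)
            return MODEL_FAIL;
        slot = table[slot](ctx);
        if (slot == MODEL_DONE || slot == MODEL_FAIL)
            return slot;
    }
    return MODEL_FAIL; /* budget exhausted */
}
```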
> +
> +
> +Current Limitations
> +===================
> +The BPF flow dissector doesn't support exporting all the metadata that the
> +in-kernel C-based implementation can export. A notable example is single VLAN
> +(802.1Q) and double VLAN (802.1AD) tags. Please refer to 'struct bpf_flow_keys'
> +for the set of information that can currently be exported from the BPF context.
> --
> 2.21.0.392.gf8f6787159e-goog
>