Message-ID: <CACAyw9-y-Hsz1nGTqK278N9A8VQNwDhZ462c_qKE_ziG9g=OSA@mail.gmail.com>
Date: Thu, 2 Jul 2020 12:05:00 +0100
From: Lorenz Bauer <lmb@...udflare.com>
To: Jakub Sitnicki <jakub@...udflare.com>
Cc: bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
kernel-team <kernel-team@...udflare.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Andrii Nakryiko <andriin@...com>,
Marek Majkowski <marek@...udflare.com>,
Martin KaFai Lau <kafai@...com>
Subject: Re: [PATCH bpf-next v3 00/16] Run a BPF program on socket lookup
On Thu, 2 Jul 2020 at 10:24, Jakub Sitnicki <jakub@...udflare.com> wrote:
>
> Overview
> ========
>
> (Same as in v2. Please skip to next section if you've read it.)
>
> This series proposes a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP,
> or BPF sk_lookup for short.
>
> A BPF sk_lookup program runs when the transport layer is looking up a
> listening socket for a new connection request (TCP), or an unconnected
> socket for a packet (UDP).
>
> This serves as a mechanism to overcome the limits of what the bind() API
> can express. Two use-cases driving this work are:
>
> (1) steer packets destined to an IP range, on a fixed port, to a single socket
>
>     192.0.2.0/24, port 80 -> NGINX socket
>
> (2) steer packets destined to an IP address, on any port, to a single socket
>
>     198.51.100.1, any port -> L7 proxy socket
>
> In its context, the program receives information about the packet that
> triggered the socket lookup: namely the IP version, L4 protocol identifier,
> and address 4-tuple.
>
> To select a socket, the BPF program fetches it from a map holding socket
> references, like SOCKMAP or SOCKHASH, calls the bpf_sk_assign(ctx, sk, ...)
> helper to record the selection, and returns the BPF_REDIRECT code. The
> transport layer then uses the selected socket as the result of the socket
> lookup.
>
> Alternatively, the program can fail the lookup (BPF_DROP), or let the
> lookup continue as usual (BPF_OK).
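
For illustration only: a minimal user-space C sketch of the matching logic
behind the two use-cases and the three verdicts described above. It is not
the BPF program from this series and does not use the kernel API. All names
(select_socket, in_prefix, the verdict and slot enums) are hypothetical. A
real sk_lookup program would instead read the packet fields from its
context, fetch the socket from a SOCKMAP/SOCKHASH, and record it with
bpf_sk_assign().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical verdicts mirroring BPF_OK / BPF_REDIRECT / BPF_DROP. */
enum verdict { VERDICT_OK, VERDICT_REDIRECT, VERDICT_DROP };

/* Hypothetical slots standing in for keys into a socket map. */
enum sock_slot { SLOT_NONE = -1, SLOT_NGINX = 0, SLOT_PROXY = 1 };

struct lookup_result {
	enum verdict verdict;
	enum sock_slot slot;
};

/* Build an IPv4 address in host byte order from dotted-quad parts. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* Return 1 if addr falls inside net/prefix_len, 0 otherwise. */
static int in_prefix(uint32_t addr, uint32_t net, unsigned int prefix_len)
{
	assert(prefix_len <= 32);
	/* A /0 prefix matches everything; avoid shifting by 32 bits. */
	uint32_t mask = prefix_len ? ~UINT32_C(0) << (32 - prefix_len) : 0;

	return (addr & mask) == (net & mask);
}

/* Decide which socket, if any, should receive the packet. */
static struct lookup_result select_socket(uint32_t dst_ip, uint16_t dst_port)
{
	/* Default: let the regular socket lookup continue. */
	struct lookup_result res = { VERDICT_OK, SLOT_NONE };

	if (in_prefix(dst_ip, ipv4(192, 0, 2, 0), 24) && dst_port == 80) {
		/* Use-case (1): IP range, fixed port. */
		res.verdict = VERDICT_REDIRECT;
		res.slot = SLOT_NGINX;
	} else if (dst_ip == ipv4(198, 51, 100, 1)) {
		/* Use-case (2): single IP address, any port. */
		res.verdict = VERDICT_REDIRECT;
		res.slot = SLOT_PROXY;
	}
	return res;
}
```

In this sketch, select_socket(ipv4(192, 0, 2, 77), 80) selects the NGINX
slot, while the same address on port 443 falls through to the regular
lookup. VERDICT_DROP is defined only to mirror the third verdict; the
sketch never returns it.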
>
> This lets the user match packets with listening (TCP) or receiving (UDP)
> sockets freely at the last possible point on the receive path, where we
> know that packets are destined for local delivery after undergoing
> policing, filtering, and routing.
>
> The program is attached to a network namespace, similar to BPF
> flow_dissector. We add a new attach type, BPF_SK_LOOKUP, for this.
>
> Series structure
> ================
>
> Patches are organized as follows:
>
> 1: enables multiple link-based prog attachments for bpf-netns
> 2: introduces sk_lookup program type
> 3-4: hook up the program to run on ipv4/tcp socket lookup
> 5-6: hook up the program to run on ipv6/tcp socket lookup
> 7-8: hook up the program to run on ipv4/udp socket lookup
> 9-10: hook up the program to run on ipv6/udp socket lookup
> 11-13: libbpf & bpftool support for sk_lookup
> 14-16: verifier and selftests for sk_lookup
>
> Patches are also available on GH:
>
> https://github.com/jsitnicki/linux/commits/bpf-inet-lookup-v3
>
> Performance considerations
> ==========================
>
> I'm re-running the udp6 small-packet flood test, the scenario for which we
> had performance concerns in [v2], to measure the pps hit after the changes
> called out in the changelog below.
>
> Will follow up with results. But I'm posting the patches early for review
> since there are a fair number of code changes.
>
> Further work
> ============
>
> - user docs for new prog type, Documentation/bpf/prog_sk_lookup.rst
>   I'm looking for consensus on multi-prog semantics outlined in patch #4
>   description before drafting the document.
>
> - timeout on accept() in tests
>   I need to extract a helper for it into network_helpers in
>   selftests/bpf/. Didn't want to make this series any longer.
>
> Note to maintainers
> ===================
>
> This patch series depends on bpf-netns multi-prog changes that went
> recently into 'bpf' [0]. It won't apply onto 'bpf-next' until 'bpf' gets
> merged into 'bpf-next'.
>
> Changelog
> =========
>
> v3 brings the following changes based on feedback:
>
> 1. switch to link-based program attachment,
> 2. support for multi-prog attachment,
> 3. ability to skip reuseport socket selection,
> 4. code on RX path is guarded by a static key,
> 5. struct in6_addr's are no longer copied into BPF prog context,
> 6. BPF prog context is initialized as late as possible.
>
> v2 -> v3:
> - Changes called out in patches 1-2, 4, 6, 8, 10-14, 16
> - Patches dropped:
> 01/17 flow_dissector: Extract attach/detach/query helpers
> 03/17 inet: Store layer 4 protocol in inet_hashinfo
> 08/17 udp: Store layer 4 protocol in udp_table
>
> v1 -> v2:
> - Changes called out in patches 2, 13-15, 17
> - Rebase to recent bpf-next (b4563facdcae)
>
> RFCv2 -> v1:
>
> - Switch to fetching a socket from a map and selecting a socket with
> bpf_sk_assign, instead of having a dedicated helper that does both.
> - Run reuseport logic on sockets selected by BPF sk_lookup.
> - Allow BPF sk_lookup to fail the lookup with no match.
> - Go back to having just 2 hash table lookups in UDP.
>
> RFCv1 -> RFCv2:
>
> - Make socket lookup redirection map-based. BPF program now uses a
> dedicated helper and a SOCKARRAY map to select the socket to redirect to.
> A consequence of this change is that bpf_inet_lookup context is now
> read-only.
> - Look for connected UDP sockets before allowing redirection from BPF.
> This makes connected UDP socket work as expected in the presence of
> inet_lookup prog.
> - Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
> the only other per-netns BPF prog type.
>
> [RFCv1] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@cloudflare.com/
> [RFCv2] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
> [v1] https://lore.kernel.org/bpf/20200506125514.1020829-1-jakub@cloudflare.com/
> [v2] https://lore.kernel.org/bpf/20200511185218.1422406-18-jakub@cloudflare.com/
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=951f38cf08350884e72e0936adf147a8d764cc5d
>
> Cc: Alexei Starovoitov <ast@...nel.org>
> Cc: Andrii Nakryiko <andriin@...com>
> Cc: Lorenz Bauer <lmb@...udflare.com>
> Cc: Marek Majkowski <marek@...udflare.com>
> Cc: Martin KaFai Lau <kafai@...com>
>
> Jakub Sitnicki (16):
> bpf, netns: Handle multiple link attachments
> bpf: Introduce SK_LOOKUP program type with a dedicated attach point
> inet: Extract helper for selecting socket from reuseport group
> inet: Run SK_LOOKUP BPF program on socket lookup
> inet6: Extract helper for selecting socket from reuseport group
> inet6: Run SK_LOOKUP BPF program on socket lookup
> udp: Extract helper for selecting socket from reuseport group
> udp: Run SK_LOOKUP BPF program on socket lookup
> udp6: Extract helper for selecting socket from reuseport group
> udp6: Run SK_LOOKUP BPF program on socket lookup
> bpf: Sync linux/bpf.h to tools/
> libbpf: Add support for SK_LOOKUP program type
> tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type
> selftests/bpf: Add verifier tests for bpf_sk_lookup context access
> selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c
> selftests/bpf: Tests for BPF_SK_LOOKUP attach point
For the series:
Reviewed-by: Lorenz Bauer <lmb@...udflare.com>
>
> include/linux/bpf-netns.h | 3 +
> include/linux/bpf.h | 33 +
> include/linux/bpf_types.h | 2 +
> include/linux/filter.h | 99 ++
> include/uapi/linux/bpf.h | 74 +
> kernel/bpf/core.c | 22 +
> kernel/bpf/net_namespace.c | 125 +-
> kernel/bpf/syscall.c | 9 +
> net/core/filter.c | 188 +++
> net/ipv4/inet_hashtables.c | 60 +-
> net/ipv4/udp.c | 93 +-
> net/ipv6/inet6_hashtables.c | 66 +-
> net/ipv6/udp.c | 97 +-
> scripts/bpf_helpers_doc.py | 9 +-
> tools/bpf/bpftool/common.c | 1 +
> tools/bpf/bpftool/prog.c | 3 +-
> tools/include/uapi/linux/bpf.h | 74 +
> tools/lib/bpf/libbpf.c | 3 +
> tools/lib/bpf/libbpf.h | 2 +
> tools/lib/bpf/libbpf.map | 2 +
> tools/lib/bpf/libbpf_probes.c | 3 +
> .../bpf/prog_tests/reference_tracking.c | 2 +-
> .../selftests/bpf/prog_tests/sk_lookup.c | 1353 +++++++++++++++++
> .../selftests/bpf/progs/test_ref_track_kern.c | 181 +++
> .../selftests/bpf/progs/test_sk_lookup_kern.c | 462 ++++--
> .../selftests/bpf/verifier/ctx_sk_lookup.c | 219 +++
> 26 files changed, 2995 insertions(+), 190 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
> create mode 100644 tools/testing/selftests/bpf/progs/test_ref_track_kern.c
> create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
>
> --
> 2.25.4
>
--
Lorenz Bauer | Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
www.cloudflare.com