Message-ID: <CAF2d9jimTh3LnVGwDQ-MsK7nVY=g5bVSk+=RTp0Qwz4-ZF0-jg@mail.gmail.com>
Date: Wed, 14 Mar 2018 10:22:03 -0700
From: Mahesh Bandewar (महेश बंडेवार) <maheshb@...gle.com>
To: Alexei Starovoitov <ast@...nel.org>
Cc: David Miller <davem@...emloft.net>, daniel@...earbox.net,
linux-netdev <netdev@...r.kernel.org>, kernel-team@...com
Subject: Re: [PATCH RFC bpf-next 0/6] bpf: introduce cgroup-bpf bind, connect, post-bind hooks

On Tue, Mar 13, 2018 at 8:39 PM, Alexei Starovoitov <ast@...nel.org> wrote:
> For our container management we've been using a complicated and fragile setup
> consisting of an LD_PRELOAD wrapper that intercepts bind and connect calls
> from all containerized applications.
> The setup involves per-container IPs, policy, etc., so traditional
> network-only solutions that involve VRFs, netns, or ACLs are not applicable.
You can keep the policies per cgroup but move the IP from the cgroup to
a net-ns. Then none of these eBPF hacks are required: cgroups and
namespaces are orthogonal, so you can use cgroups in conjunction with
namespaces.
> Changing the apps is not possible, and LD_PRELOAD doesn't work
> for apps that don't use glibc, such as Java and Go programs.
> BPF+cgroup looks to be the best solution for this problem.
> Hence we introduce 3 hooks:
> - at entry into sys_bind and sys_connect,
>   to let a bpf prog inspect and modify the 'struct sockaddr' provided
>   by user space and fail the bind/connect when appropriate
> - post sys_bind, after the port is allocated
>
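The entry hooks listed above can be modelled in plain userspace C. This is only a sketch under stated assumptions: in the patch set the logic would live in a BPF program operating on 'struct bpf_sock_addr', not on a 'struct sockaddr_in' directly. The names container_policy, hook_verdict and model_bind4_hook are hypothetical, and the policy itself (pin wildcard binds to the container IP, restrict the port range) is an illustrative guess, not something taken from the patches.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts; a real hook would signal this through the
 * BPF program's return value. */
enum hook_verdict { HOOK_DENY = 0, HOOK_ALLOW = 1 };

/* Hypothetical per-container policy (not part of the patch set). */
struct container_policy {
    struct in_addr container_ip; /* IP assigned to the container */
    uint16_t min_port;           /* allowed port range, host byte order */
    uint16_t max_port;
};

/* Userspace model of a bind-entry hook for IPv4: inspect the address the
 * application passed to bind(), rewrite it if needed, or reject it. */
enum hook_verdict model_bind4_hook(const struct container_policy *policy,
                                   struct sockaddr_in *addr)
{
    uint16_t port;

    assert(policy != NULL && addr != NULL);

    if (addr->sin_family != AF_INET)
        return HOOK_DENY;

    /* Port 0 means "kernel picks a port"; this model lets it through.
     * Checking the allocated port would be a job for a post-bind hook. */
    port = ntohs(addr->sin_port);
    if (port != 0 && (port < policy->min_port || port > policy->max_port))
        return HOOK_DENY;

    /* Rewrite a wildcard bind to the container's own IP; reject any
     * other address that does not belong to the container. */
    if (addr->sin_addr.s_addr == htonl(INADDR_ANY))
        addr->sin_addr = policy->container_ip;
    else if (addr->sin_addr.s_addr != policy->container_ip.s_addr)
        return HOOK_DENY;

    return HOOK_ALLOW;
}
```

With a policy of 10.0.0.5 and ports 8000-9000, a bind to 0.0.0.0:8080 would be allowed and rewritten to 10.0.0.5:8080, while a bind to port 22 would be denied. The application itself is unchanged, which is the point of doing this at the syscall boundary rather than via LD_PRELOAD.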
> The approach works great and has zero overhead for anyone who doesn't
> use it and very low overhead when deployed.
>
> The main question for Daniel and Dave is what approach to take
> with prog types...
>
> In this patch set we introduce 6 new program types to make user
> experience easier:
> BPF_PROG_TYPE_CGROUP_INET4_BIND,
> BPF_PROG_TYPE_CGROUP_INET6_BIND,
> BPF_PROG_TYPE_CGROUP_INET4_CONNECT,
> BPF_PROG_TYPE_CGROUP_INET6_CONNECT,
> BPF_PROG_TYPE_CGROUP_INET4_POST_BIND,
> BPF_PROG_TYPE_CGROUP_INET6_POST_BIND,
>
> since v4 programs should not be using the 'struct bpf_sock_addr'->user_ip6
> fields, and a different prog type for v4 and v6 helps the verifier reject
> such access at load time.
> Similarly, bind and connect are two different prog types,
> since only sys_connect programs can call the new bpf_bind() helper.
>
> This approach is very different from tcp-bpf, where a single
> 'struct bpf_sock_ops' and a single prog type are used for different hooks.
> There, the field checks are done at run time instead of load time.
>
> I think the approach taken by this patch set is justified,
> but we may do better if we extend the BPF_PROG_ATTACH cmd
> with log_buf + log_size; then we should be able to combine
> bind+connect+v4+v6 into a single program type.
> The idea is that at load time the verifier will remember a bitmask
> of the fields in bpf_sock_addr used by the program and the helpers
> that the program uses; then at attach time we can check that the
> hook is compatible with the features used by the program and
> report a human-readable error message back via log_buf.
> We cannot do this right now with just EINVAL, since the combinations
> of errors like 'using user_ip6 field but attaching to v4 hook'
> are too numerous to express as errno values.
> This would be a bigger change. If you folks think it's worth it,
> we can go with this approach, or if you think 6 new prog types
> are not too bad, we can leave the patch as-is.
> Comments?
> Other comments on patches are welcome.
>
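The alternative proposed above (record a feature bitmask at load time, check it against the hook at attach time, explain failures via log_buf) could look roughly like the sketch below. It is a userspace illustration only: the FEAT_* bits, struct hook_desc and check_attach() are hypothetical names, and the assumption that v6 hooks permit only user_ip6 is mine, not something the cover letter states.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical feature bits a verifier could record at load time. */
enum prog_feature {
    FEAT_USER_IP4    = 1u << 0, /* program accesses user_ip4 */
    FEAT_USER_IP6    = 1u << 1, /* program accesses user_ip6 */
    FEAT_HELPER_BIND = 1u << 2, /* program calls bpf_bind() */
};

/* Hypothetical description of an attach point. */
struct hook_desc {
    const char *name;
    uint32_t allowed; /* features this hook can support */
};

static const struct {
    uint32_t bit;
    const char *what;
} feature_names[] = {
    { FEAT_USER_IP4,    "the user_ip4 field" },
    { FEAT_USER_IP6,    "the user_ip6 field" },
    { FEAT_HELPER_BIND, "the bpf_bind() helper" },
};

/* Attach-time check: returns 0 if every feature the program uses is
 * allowed on the hook, otherwise -EINVAL with one line per offending
 * feature written to log_buf (truncated if the buffer is too small). */
int check_attach(uint32_t used, const struct hook_desc *hook,
                 char *log_buf, size_t log_size)
{
    uint32_t bad;
    size_t off = 0;
    size_t i;

    assert(hook != NULL);

    bad = used & ~hook->allowed;
    if (log_buf != NULL && log_size > 0)
        log_buf[0] = '\0';
    if (bad == 0)
        return 0;

    for (i = 0; i < sizeof(feature_names) / sizeof(feature_names[0]); i++) {
        int n;

        /* Skip features that are fine, and stop writing once full. */
        if (!(bad & feature_names[i].bit) || log_buf == NULL || off >= log_size)
            continue;
        n = snprintf(log_buf + off, log_size - off,
                     "program uses %s, which hook '%s' does not support\n",
                     feature_names[i].what, hook->name);
        if (n < 0)
            break;
        off += (size_t)n; /* may pass log_size on truncation; guard above handles it */
    }
    return -EINVAL;
}
```

For example, a program whose recorded mask is FEAT_USER_IP6 attached to a hook described as { "inet4 bind", FEAT_USER_IP4 } would fail with -EINVAL and a log line naming both the field and the hook, which is the kind of detail a bare errno cannot carry.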
> Andrey Ignatov (6):
> bpf: Hooks for sys_bind
> selftests/bpf: Selftest for sys_bind hooks
> net: Introduce __inet_bind() and __inet6_bind
> bpf: Hooks for sys_connect
> selftests/bpf: Selftest for sys_connect hooks
> bpf: Post-hooks for sys_bind
>
> include/linux/bpf-cgroup.h                    |  68 +++-
> include/linux/bpf_types.h                     |   6 +
> include/linux/filter.h                        |  10 +
> include/net/inet_common.h                     |   2 +
> include/net/ipv6.h                            |   2 +
> include/net/sock.h                            |   3 +
> include/net/udp.h                             |   1 +
> include/uapi/linux/bpf.h                      |  52 ++-
> kernel/bpf/cgroup.c                           |  36 ++
> kernel/bpf/syscall.c                          |  42 ++
> kernel/bpf/verifier.c                         |   6 +
> net/core/filter.c                             | 479 ++++++++++++++++++++++-
> net/ipv4/af_inet.c                            |  60 ++-
> net/ipv4/tcp_ipv4.c                           |  16 +
> net/ipv4/udp.c                                |  14 +
> net/ipv6/af_inet6.c                           |  47 ++-
> net/ipv6/tcp_ipv6.c                           |  16 +
> net/ipv6/udp.c                                |  20 +
> tools/include/uapi/linux/bpf.h                |  39 +-
> tools/testing/selftests/bpf/Makefile          |   8 +-
> tools/testing/selftests/bpf/bpf_helpers.h     |   2 +
> tools/testing/selftests/bpf/connect4_prog.c   |  45 +++
> tools/testing/selftests/bpf/connect6_prog.c   |  61 +++
> tools/testing/selftests/bpf/test_sock_addr.c  | 541 ++++++++++++++++++++++++++
> tools/testing/selftests/bpf/test_sock_addr.sh |  57 +++
> 25 files changed, 1580 insertions(+), 53 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/connect4_prog.c
> create mode 100644 tools/testing/selftests/bpf/connect6_prog.c
> create mode 100644 tools/testing/selftests/bpf/test_sock_addr.c
> create mode 100755 tools/testing/selftests/bpf/test_sock_addr.sh
>
> --
> 2.9.5
>