Message-ID: <77f77631-f8ad-dc0c-94ce-ec561d4c10f9@gmail.com>
Date: Tue, 13 Mar 2018 23:21:08 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Alexei Starovoitov <ast@...nel.org>, davem@...emloft.net
Cc: daniel@...earbox.net, netdev@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH RFC bpf-next 1/6] bpf: Hooks for sys_bind
On 03/13/2018 08:39 PM, Alexei Starovoitov wrote:
> From: Andrey Ignatov <rdna@...com>
>
> == The problem ==
>
> There is a use-case when all processes inside a cgroup should use one
> single IP address on a host that has multiple IPs configured. Those
> processes should use the IP for both ingress and egress, for TCP and UDP
> traffic. So TCP/UDP servers should be bound to that IP to accept
> incoming connections on it, and TCP/UDP clients should make outgoing
> connections from that IP. It should not require changing application
> code since it's often not possible.
>
> Currently it's solved by intercepting glibc wrappers around syscalls
> such as `bind(2)` and `connect(2)`. It's done by a shared library that
> is preloaded for every process in a cgroup so that whenever TCP/UDP
> server calls `bind(2)`, the library replaces IP in sockaddr before
> passing arguments to syscall. When application calls `connect(2)` the
> library transparently binds the local end of connection to that IP
> (`bind(2)` with `IP_BIND_ADDRESS_NO_PORT` to avoid performance penalty).
>
> Shared library approach is fragile though, e.g.:
> * some applications clear env vars (incl. `LD_PRELOAD`);
> * `/etc/ld.so.preload` doesn't help since some applications are linked
> with option `-z nodefaultlib`;
> * other applications don't use glibc and there is nothing to intercept.
>
> == The solution ==
>
> The patch provides much more reliable in-kernel solution for the 1st
> part of the problem: binding TCP/UDP servers on desired IP. It does not
> depend on application environment and implementation details (whether
> glibc is used or not).
>
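For reference, the connect(2) half of the preload workaround quoted above amounts to roughly the following. This is only a minimal userspace sketch, not the library the patch description refers to: the helper name connect_from is hypothetical, and the fallback value 24 is the Linux definition of IP_BIND_ADDRESS_NO_PORT from <linux/in.h>, used in case the Python socket module does not export the constant.

```python
import socket

# Linux-specific option (kernel 4.2+). 24 is its value in <linux/in.h>;
# the getattr fallback covers Python builds that do not export it.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)


def connect_from(source_ip, dest):
    """Connect to dest (host, port) with the local end pinned to source_ip.

    Hypothetical helper: binds to (source_ip, 0) with
    IP_BIND_ADDRESS_NO_PORT set, so the kernel defers picking the
    source port until connect(2).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
        s.bind((source_ip, 0))   # fixes the source IP only
        s.connect(dest)          # source port is chosen here
    except OSError:
        s.close()
        raise
    return s
```

A caller would use it as connect_from("127.0.0.1", ("127.0.0.1", port)); after the call, getsockname() on the returned socket reports the pinned IP together with the port the kernel picked at connect time.
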
If I understand correctly, strace(1) will not show the real IP/port
(the one in effect after eBPF modifies it)?

What about SELinux and other LSMs?

We now have network namespaces for full isolation. ILA will come soon.
The argument that it is inconvenient (or even impossible) to change the
application or to use modern isolation is quite strange, considering the
burden/complexity/bloat this adds to the kernel.

The post hook for sys_bind is clearly a failure of the model: by the
time it releases the port it might already be too late, since another
thread might have failed to get that port during the nonzero time window.
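The window can be illustrated from userspace. The sketch below is only an analogy under stated assumptions, not the eBPF post hook itself: one socket holds a port the way a not-yet-rejected bind would, a second bind to the same port fails with EADDRINUSE in the meantime, and the port only becomes usable again once the first socket lets go of it. The names bind_tcp and demo_window are hypothetical.

```python
import errno
import socket


def bind_tcp(port):
    """Bind a fresh TCP socket to 127.0.0.1:port (0 = any free port)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
    except OSError:
        s.close()
        raise
    return s


def demo_window():
    """Return (errno seen while the port was held, whether a later bind worked)."""
    first = bind_tcp(0)                 # port is now allocated
    port = first.getsockname()[1]

    # Inside the window: a second bind to the same port is refused.
    try:
        bind_tcp(port).close()
        during = None
    except OSError as exc:
        during = exc.errno

    first.close()                       # "release" the port, possibly too late

    # After the window the port is normally free again (unless some other
    # process on the host grabbed it in between).
    try:
        bind_tcp(port).close()
        after = True
    except OSError:
        after = False
    return during, after


if __name__ == "__main__":
    during, after = demo_window()
    print("errno while held:", errno.errorcode.get(during), "rebind ok:", after)
```

The caller that failed inside the window has no way of knowing that the port was about to be released, which is the failure described above.
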

It seems this is exactly the case where a netns would be the correct answer.

If you want to provide an alternate port allocation strategy, it would
be better to provide a correct eBPF hook for that.