Message-ID: <57B1A081.9030209@digikod.net>
Date: Mon, 15 Aug 2016 12:59:13 +0200
From: Mickaël Salaün <mic@...ikod.net>
To: Sargun Dhillon <sargun@...gun.me>
Cc: Kees Cook <keescook@...omium.org>,
LKML <linux-kernel@...r.kernel.org>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
linux-security-module <linux-security-module@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
"Reshetova, Elena" <elena.reshetova@...el.com>
Subject: Re: [RFC 0/4] RFC: Add Checmate, BPF-driven minor LSM
On 15/08/2016 05:09, Sargun Dhillon wrote:
> On Mon, Aug 15, 2016 at 12:57:44AM +0200, Mickaël Salaün wrote:
>> Our approaches have some common points (i.e. use eBPF in an LSM, stacked
>> filters like seccomp) but I'm focused on a kind of unprivileged LSM (i.e. no
>> CAP_SYS_ADMIN), to make standalone sandboxes, which brings more constraints
>> (e.g. no use of unsafe functions like bpf_probe_read(), take care of privacy,
>> SUID exec, stable ABI…). However, I don't want to handle resource limits,
>> which should be the job of cgroups.
>>
> Kind of. Sometimes describing these resource limits is difficult. For example, I
> have a customer who is trying to restrict containers from burning up all the
> ephemeral ports on the machine. In this, they have an incredibly elaborate chain
> of wiring to prevent a given container from connecting to the same (proto,
> destip, destport) more than 1000 times.
>
> I'm unsure of how you'd model that in a cgroup.
This looks like a Netfilter rule. Have you tried applying this limitation with the connlimit module?
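For reference, the policy described above boils down to a counter keyed by the (proto, destip, destport) tuple. Here is a minimal Python sketch of that bookkeeping. It is only an illustration of the rule, not how connlimit is implemented in the kernel, and the class and method names are hypothetical:

```python
from collections import Counter


class DestinationLimiter:
    """Toy model of a per-(proto, dest_ip, dest_port) connection cap.

    Hypothetical helper for illustration only; not a kernel or Netfilter API.
    """

    def __init__(self, limit=1000):
        self.limit = limit
        self.active = Counter()  # live connections per destination tuple

    def connect(self, proto, dest_ip, dest_port):
        """Return True if the connection is allowed, False if the cap is hit."""
        key = (proto, dest_ip, dest_port)
        if self.active[key] >= self.limit:
            return False
        self.active[key] += 1
        return True

    def close(self, proto, dest_ip, dest_port):
        """Release one connection slot for the destination, if any is held."""
        key = (proto, dest_ip, dest_port)
        if self.active[key] > 0:
            self.active[key] -= 1
```

The cap is per destination tuple, so reaching the limit for one (proto, destip, destport) does not affect connections to any other destination.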
>
>> For now, I'm focusing on file-system access control which is one of the more
>> complex systems to filter properly. I also plan to support basic network access
>> control.
>>
>> What you are trying to accomplish seems more related to a Netfilter extension
>> (something like ipset but with eBPF maybe?).
>>
> I don't only want to do network access control, I also want to write to the
> value once it's copied into kernel space. There are a lot of benefits to doing
> this at the syscall level, but the two primary ones are performance, and
> capability.
>
> One of the biggest complaints with our current approach to filtering & load
> balancing (iptables) is that it hides information. When people connect through
> the load balancer, they want to find out who they connected to, and without some
> high application-level mechanism, this isn't possible. On the other hand, if we
> just rewrite the destination address in the connect hook, we can pretty easily
> allow them to do getpeername.
What exactly is not doable with Netfilter (e.g. REDIRECT or TPROXY)?
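As an example of what is already available on the proxy side: with REDIRECT, a userspace proxy can recover the pre-NAT destination of an accepted IPv4 connection through the SO_ORIGINAL_DST socket option. A minimal Python sketch follows. The function names are hypothetical, the constant is copied from linux/netfilter_ipv4.h because Python's socket module does not export it, and this only covers the proxy's view of the connection, not the client's getpeername():

```python
import socket
import struct

# Value from <linux/netfilter_ipv4.h>; not exported by Python's socket module.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw):
    """Decode a 16-byte struct sockaddr_in into (ip, port).

    Layout: 2 bytes family, 2 bytes port (network order),
    4 bytes IPv4 address, 8 bytes padding.
    """
    port, addr = struct.unpack("!2xH4s8x", raw)
    return socket.inet_ntoa(addr), port


def original_destination(conn):
    """Return the pre-NAT (ip, port) of an accepted, REDIRECTed TCP socket.

    Linux-only; it needs a connection that Netfilter actually NATed.
    """
    raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)
```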
>
> I'm curious about your filesystem access limiter. Do you have a way to make it so
> that a given container can only write, say, 100mb of data to disk?
It's a filesystem access control mechanism. It doesn't deal with quotas, and it is focused not on containers but on process hierarchies (which are more generic).
What is not doable with a quota mount option? It may be more appropriate to enhance the VFS (or overlayfs) to apply this kind of limitation, if needed.