Message-ID: <b885663b-eaa2-f212-7b9b-2496943a30e1@oracle.com>
Date: Tue, 13 Feb 2018 13:38:53 -0700
From: Tom Hromatka <tom.hromatka@...cle.com>
To: Kees Cook <keescook@...omium.org>
Cc: Network Development <netdev@...r.kernel.org>,
Sargun Dhillon <sargun@...gun.me>,
Will Drewry <wad@...omium.org>,
Daniel Borkmann <daniel@...earbox.net>,
Linux Containers <containers@...ts.linux-foundation.org>,
Alexei Starovoitov <ast@...nel.org>,
Andy Lutomirski <luto@...capital.net>
Subject: Re: [PATCH net-next 0/3] eBPF Seccomp filters
On 02/13/2018 01:35 PM, Kees Cook wrote:
> On Tue, Feb 13, 2018 at 12:33 PM, Tom Hromatka <tom.hromatka@...cle.com> wrote:
>> On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sargun@...gun.me> wrote:
>>> This patchset enables seccomp filters to be written in eBPF. Although
>>> this patchset doesn't introduce much of the functionality enabled by
>>> eBPF, it lays the groundwork for it.
>>>
>>> It also introduces the capability to dump eBPF filters via the PTRACE
>>> API so that CHECKPOINT_RESTORE will be satisfied.
>>> In the attached samples, there's an example of this. One can then use
>>> BPF_OBJ_GET_INFO_BY_FD in order to get the actual code of the program,
>>> and use that at reload time.
>>>
>>> The primary reason for not adding maps support in this patchset is
>>> to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
>>> If we have a map that the BPF program can read, it can potentially
>>> "change" privileges after running. It seems like doing only writes
>>> is safe, because the program can be pure and side-effect free, and
>>> therefore does not negatively affect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come
>>> to an agreement, this can be in a follow-up patchset.
>>
>>
>> Coincidentally, I also sent an RFC for adding eBPF hash maps to the seccomp
>> userspace mailing list just last week:
>> https://groups.google.com/forum/#!topic/libseccomp/pX6QkVF0F74
>>
>> The kernel changes I proposed are in this email:
>> https://groups.google.com/d/msg/libseccomp/pX6QkVF0F74/ZUJlwI5qAwAJ
>>
>> In that email thread, Kees requested that I try out a binary tree in cBPF
>> and evaluate its performance. I just got a rough prototype working, and
>> while not as fast as an eBPF hash map, the cBPF binary tree was a
>> significant improvement over the linear list of ifs that is currently
>> generated. Also, it only required changing a single function within the
>> libseccomp library itself.
>>
>> https://github.com/drakenclimber/libseccomp/commit/87b36369f17385f5a7a4d95101185577fbf6203b
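To make the "binary tree vs. linear list of ifs" comparison concrete, here is a minimal C sketch of one way to emit a classic BPF (cBPF) seccomp filter as a binary decision tree over sorted syscall numbers. It is not the libseccomp change linked above: the helper names (`emit`, `emit_tree`, `build_tree_filter`, `run_filter`) are hypothetical, the allowlist is assumed to be sorted ascending with no duplicates, and the `seccomp_data.arch` check that every real filter needs is deliberately omitted. Because the `jt`/`jf` offsets of a cBPF conditional jump are only 8 bits wide, each internal node uses an unconditional `BPF_JA` (32-bit offset) to skip its left subtree. A tiny interpreter is included so the filter can be exercised without installing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_* opcodes, BPF_MAXINSNS */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */

#define MAX_PROG BPF_MAXINSNS

/* Append one cBPF instruction to prog[]. */
static void emit(struct sock_filter *prog, size_t *len, size_t cap,
                 uint16_t code, uint8_t jt, uint8_t jf, uint32_t k)
{
    assert(*len < cap);
    prog[(*len)++] = (struct sock_filter){ .code = code, .jt = jt, .jf = jf, .k = k };
}

/* Emit a decision tree over allowed[lo..hi) (sorted, unique).
 * The accumulator A already holds the syscall number. */
static void emit_tree(const uint32_t *allowed, size_t lo, size_t hi,
                      struct sock_filter *prog, size_t *len, size_t cap)
{
    if (hi - lo == 1) {
        /* Leaf: allow on an exact match, otherwise kill. */
        emit(prog, len, cap, BPF_JMP | BPF_JEQ | BPF_K, 0, 1, allowed[lo]);
        emit(prog, len, cap, BPF_RET | BPF_K, 0, 0, SECCOMP_RET_ALLOW);
        emit(prog, len, cap, BPF_RET | BPF_K, 0, 0, SECCOMP_RET_KILL);
        return;
    }
    size_t mid = lo + (hi - lo) / 2;
    /* A >= allowed[mid]: fall through to the JA, which jumps to the right half.
     * A <  allowed[mid]: jf = 1 skips the JA and enters the left half. */
    emit(prog, len, cap, BPF_JMP | BPF_JGE | BPF_K, 0, 1, allowed[mid]);
    size_t ja = *len;
    emit(prog, len, cap, BPF_JMP | BPF_JA, 0, 0, 0);   /* k patched below */
    emit_tree(allowed, lo, mid, prog, len, cap);
    prog[ja].k = (uint32_t)(*len - (ja + 1));          /* skip the left half */
    emit_tree(allowed, mid, hi, prog, len, cap);
}

/* Hypothetical helper: load seccomp_data.nr, then branch through the tree.
 * NOTE: omits the seccomp_data.arch check a real filter must perform. */
static size_t build_tree_filter(const uint32_t *allowed, size_t n,
                                struct sock_filter *prog, size_t cap)
{
    size_t len = 0;
    assert(n > 0);
    emit(prog, &len, cap, BPF_LD | BPF_W | BPF_ABS, 0, 0,
         (uint32_t)offsetof(struct seccomp_data, nr));
    emit_tree(allowed, 0, n, prog, &len, cap);
    return len;
}

/* Minimal interpreter for the opcodes above; counts executed instructions. */
static uint32_t run_filter(const struct sock_filter *prog, size_t len,
                           int nr, unsigned *steps)
{
    struct seccomp_data data;
    uint32_t a = 0;
    memset(&data, 0, sizeof data);
    data.nr = nr;
    *steps = 0;
    for (size_t pc = 0; pc < len; ) {
        const struct sock_filter *insn = &prog[pc++];  /* pc now names the next insn */
        ++*steps;
        switch (insn->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            assert(insn->k + sizeof a <= sizeof data);
            memcpy(&a, (const unsigned char *)&data + insn->k, sizeof a);
            break;
        case BPF_JMP | BPF_JA:
            pc += insn->k;
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += (a == insn->k) ? insn->jt : insn->jf;
            break;
        case BPF_JMP | BPF_JGE | BPF_K:
            pc += (a >= insn->k) ? insn->jt : insn->jf;
            break;
        case BPF_RET | BPF_K:
            return insn->k;
        default:
            assert(0 && "opcode not handled by this sketch");
            return SECCOMP_RET_KILL;
        }
    }
    return SECCOMP_RET_KILL;  /* ran off the end */
}
```

With 306 sorted entries this tree is 9 levels deep, so any syscall number is decided after at most 21 instructions (one load, at most two per level, then a compare and a return), whereas a linear chain may need up to 306 comparisons. The emitted program is 5n - 1 instructions long, well under `BPF_MAXINSNS` for n = 306.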
>>
>> Here are the results I am currently seeing using an in-house customer's
>> seccomp filter and a simplistic test program that runs getppid() thousands
>> of times.
>>
>> Test case                                          Minimum TSC ticks per syscall
>> --------------------------------------------------------------------------------
>> seccomp disabled                                   620
>> getppid() at the front of the 306-syscall filter   722
>> getppid() in the middle of the 306-syscall filter  1392
>> getppid() at the end of the 306-syscall filter     2452
>> seccomp using a 306-syscall-sized eBPF hash map    800
>> cBPF filter using a binary tree                    922
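The measurement described above (the minimum cost of a cheap syscall over thousands of runs) can be approximated with a sketch like the following. It is an illustrative stand-in, not the in-house test program: it uses `clock_gettime(CLOCK_MONOTONIC)` nanoseconds instead of raw TSC ticks, so its numbers include timer overhead and are not directly comparable to the table, and the function name `min_getppid_ns` is hypothetical. Taking the minimum rather than the mean discards samples inflated by interrupts or cache misses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: smallest observed cost, in nanoseconds, of one
 * getppid() call across `iterations` tries (timer overhead included). */
static int64_t min_getppid_ns(int iterations)
{
    int64_t best = INT64_MAX;
    assert(iterations > 0);
    for (int i = 0; i < iterations; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        (void)getppid();   /* cheap syscall; passes through any installed filter */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        int64_t ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000
                   + (int64_t)(t1.tv_nsec - t0.tv_nsec);
        if (ns < best)
            best = ns;     /* keep the fastest sample */
    }
    return best;
}
```

To estimate a filter's overhead, one could call `min_getppid_ns(10000)` once with seccomp disabled and again after installing a candidate filter (for example with `prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)` followed by `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog)`), then compare the two minima.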
> I still think that's a crazy filter. :) It should be inverted to just
> check the 26 syscalls and a final "greater than" test. I would expect
> it to be faster still. :)
>
> -Kees
I completely agree it's a crazy filter, but it seems to be a
common "mistake" our users are making. It would be nice to
help them out if we can.
Tom
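Kees's suggestion can be read as follows (an interpretation, not his code): if the 306-entry allowlist covers every syscall number up to some maximum except 26 of them, the filter can instead kill on those 26 numbers and on anything above the highest known number, then allow the rest. The sketch below builds such a cBPF program; `deny`, `max_known_nr`, `build_inverted_filter` and `run_filter` are hypothetical names, and the `seccomp_data.arch` check a real filter needs is omitted for brevity. A small interpreter lets the program be checked without installing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */

/* Layout: LD nr; JEQ deny[0..n-1]; JGT max_known_nr; RET ALLOW; RET KILL.
 * NOTE: a real filter must check seccomp_data.arch before this. */
static size_t build_inverted_filter(const uint32_t *deny, size_t n,
                                    uint32_t max_known_nr,
                                    struct sock_filter *prog, size_t cap)
{
    size_t len = 0;
    size_t kill = n + 3;            /* index of the final RET KILL */
    assert(n + 4 <= cap);
    assert(n + 1 <= 255);           /* jt is only 8 bits wide */
    prog[len++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                               offsetof(struct seccomp_data, nr));
    for (size_t i = 0; i < n; i++, len++)   /* one equality test per denied nr */
        prog[len] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                                 deny[i], (uint8_t)(kill - len - 1), 0);
    /* The final "greater than" test rejects numbers the list never knew about. */
    prog[len] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JGT | BPF_K,
                                             max_known_nr, (uint8_t)(kill - len - 1), 0);
    len++;
    prog[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    prog[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    return len;
}

/* Minimal interpreter for the opcodes above; counts executed instructions. */
static uint32_t run_filter(const struct sock_filter *prog, size_t len,
                           int nr, unsigned *steps)
{
    struct seccomp_data data;
    uint32_t a = 0;
    memset(&data, 0, sizeof data);
    data.nr = nr;
    *steps = 0;
    for (size_t pc = 0; pc < len; ) {
        const struct sock_filter *insn = &prog[pc++];  /* pc now names the next insn */
        ++*steps;
        switch (insn->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            assert(insn->k + sizeof a <= sizeof data);
            memcpy(&a, (const unsigned char *)&data + insn->k, sizeof a);
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += (a == insn->k) ? insn->jt : insn->jf;
            break;
        case BPF_JMP | BPF_JGT | BPF_K:
            pc += (a > insn->k) ? insn->jt : insn->jf;
            break;
        case BPF_RET | BPF_K:
            return insn->k;
        default:
            assert(0 && "opcode not handled by this sketch");
            return SECCOMP_RET_KILL;
        }
    }
    return SECCOMP_RET_KILL;  /* ran off the end */
}
```

With 26 denied numbers, an allowed syscall costs one load, 27 comparisons and a return (29 instructions) no matter where it sits in the original list, which is why the inverted form would be expected to beat the 306-entry linear chain for syscalls near its end.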