netdev - Re: [PATCH net-next 0/3] eBPF Seccomp filters

Message-ID: <b885663b-eaa2-f212-7b9b-2496943a30e1@oracle.com>
Date:   Tue, 13 Feb 2018 13:38:53 -0700
From:   Tom Hromatka <tom.hromatka@...cle.com>
To:     Kees Cook <keescook@...omium.org>
Cc:     Network Development <netdev@...r.kernel.org>,
        Sargun Dhillon <sargun@...gun.me>,
        Will Drewry <wad@...omium.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Andy Lutomirski <luto@...capital.net>
Subject: Re: [PATCH net-next 0/3] eBPF Seccomp filters



On 02/13/2018 01:35 PM, Kees Cook wrote:
> On Tue, Feb 13, 2018 at 12:33 PM, Tom Hromatka <tom.hromatka@...cle.com> wrote:
>> On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sargun@...gun.me> wrote:
>>> This patchset enables seccomp filters to be written in eBPF. Although
>>> this patchset doesn't introduce much of the functionality enabled by
>>> eBPF, it lays the groundwork for it.
>>>
>>> It also introduces the capability to dump eBPF filters via the PTRACE
>>> API in order to make it so that CHECKPOINT_RESTORE will be satisfied.
>>> In the attached samples, there's an example of this. One can then use
>>> BPF_OBJ_GET_INFO_BY_FD in order to get the actual code of the program,
>>> and use that at reload time.
>>>
>>> The primary reason for not adding maps support in this patchset is
>>> to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
>>> If we have a map that the BPF program can read, it can potentially
>>> "change" privileges after running. It seems like doing writes only
>>> is safe, because it can be pure, and side-effect free, and therefore
>>> not negatively affect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come
>>> to an agreement, this can be in a follow-up patchset.
>>
>>
>> Coincidentally I also sent an RFC for adding eBPF hash maps to the seccomp
>> userspace mailing list just last week:
>> https://groups.google.com/forum/#!topic/libseccomp/pX6QkVF0F74
>>
>> The kernel changes I proposed are in this email:
>> https://groups.google.com/d/msg/libseccomp/pX6QkVF0F74/ZUJlwI5qAwAJ
>>
>> In that email thread, Kees requested that I try out a binary tree in cBPF
>> and evaluate its performance.  I just got a rough prototype working, and
>> while not as fast as an eBPF hash map, the cBPF binary tree was a
>> significant improvement over the linear list of ifs that are currently
>> generated.  Also, it only required changing a single function within the
>> libseccomp library itself.
>>
>> https://github.com/drakenclimber/libseccomp/commit/87b36369f17385f5a7a4d95101185577fbf6203b
>>
>> Here are the results I am currently seeing using an in-house customer's
>> seccomp filter and a simplistic test program that runs getppid() thousands
>> of times.
>>
>> Test Case                      minimum TSC ticks to make syscall
>> ----------------------------------------------------------------
>> seccomp disabled                                             620
>> getppid() at the front of 306-syscall seccomp filter         722
>> getppid() in middle of 306-syscall seccomp filter           1392
>> getppid() at the end of the 306-syscall filter              2452
>> seccomp using a 306-syscall-sized eBPF hash map              800
>> cBPF filter using a binary tree                              922
> I still think that's a crazy filter. :) It should be inverted to just
> check the 26 syscalls and a final "greater than" test. I would expect
> it to be faster still. :)
>
> -Kees

I completely agree it's a crazy filter, but it seems to be a
common "mistake" our users are making.  It would be nice to
help them out if we can.

Tom
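
To illustrate why the position of getppid() matters so much in the linear
filter and much less in the binary tree, here is a rough model that counts
how many tests each layout performs per syscall. It is only a sketch under
stated assumptions: the 306 allowed syscalls are taken to be numbered 0..305
(hypothetical, not the real filter), the function names are made up for
illustration, and it does not reproduce the code libseccomp generates or the
TSC numbers in the table above.

```python
def linear_comparisons(allowed, nr):
    """Tests executed by a chain of 'if nr == X' checks, in list order."""
    for count, candidate in enumerate(allowed, start=1):
        if candidate == nr:
            return count
    return len(allowed)  # no match: every test in the chain was executed


def tree_comparisons(sorted_allowed, nr):
    """Nodes visited by a balanced binary search over sorted syscall numbers."""
    lo, hi = 0, len(sorted_allowed) - 1
    count = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1  # one tree node visited
        if sorted_allowed[mid] == nr:
            return count
        if sorted_allowed[mid] < nr:
            lo = mid + 1
        else:
            hi = mid - 1
    return count


# Assumption: a hypothetical filter allowing syscall numbers 0..305.
allowed = list(range(306))

worst_linear = max(linear_comparisons(allowed, nr) for nr in allowed)
worst_tree = max(tree_comparisons(allowed, nr) for nr in allowed)

print("worst case, linear chain:", worst_linear)
print("worst case, binary tree: ", worst_tree)
```

In this model the linear chain costs between 1 and 306 tests depending on
where the syscall sits in the list, while the binary tree never visits more
than 9 nodes.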