Message-ID: <CAEk6tEw3ty0kBH+06TYt4=Ywt-4_cHBa9f8p3ajMghtjRkHmMg@mail.gmail.com>
Date:   Tue, 13 Feb 2018 12:02:03 -0500
From:   Jessie Frazelle <me@...sfraz.com>
To:     Sargun Dhillon <sargun@...gun.me>
Cc:     Kees Cook <keescook@...omium.org>,
        Network Development <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        Andy Lutomirski <luto@...capital.net>,
        Will Drewry <wad@...omium.org>
Subject: Re: [PATCH net-next 0/3] eBPF Seccomp filters

On Tue, Feb 13, 2018 at 11:29 AM, Sargun Dhillon <sargun@...gun.me> wrote:
> On Tue, Feb 13, 2018 at 7:47 AM, Kees Cook <keescook@...omium.org> wrote:
>> On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sargun@...gun.me> wrote:
>>> This patchset enables seccomp filters to be written in eBPF. Although
>>> this patchset doesn't introduce much of the functionality enabled by
>>> eBPF, it lays the groundwork for it.
>>>
>>> It also introduces the capability to dump eBPF filters via the PTRACE
>>> API in order to make it so that CHECKPOINT_RESTORE will be satisfied.
>>> In the attached samples, there's an example of this. One can then use
>>> BPF_OBJ_GET_INFO_BY_FD in order to get the actual code of the program,
>>> and use that at reload time.
>>>
>>> The primary reason for not adding maps support in this patchset is
>>> to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
>>> If we have a map that the BPF program can read, it can potentially
>>> "change" privileges after running. It seems like doing writes only
>>> is safe, because it can be pure, and side effect free, and therefore
>>> not negatively affect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come
>>> to an agreement, this can be in a follow-up patchset.
>>
>> What's the reason for adding eBPF support? seccomp shouldn't need it,
>> and it only makes the code more complex. I'd rather stick with cBPF
>> until we have an overwhelmingly good reason to use eBPF as a "native"
>> seccomp filter language.
>>
>> -Kees
>>
> Three reasons:
> 1) The userspace tooling for eBPF is much better than the user space
> tooling for cBPF. Our use case is specifically to optimize Docker
> policies. This is roughly what their seccomp policy looks like:
> https://github.com/moby/moby/blob/master/profiles/seccomp/default.json.
> It would be much nicer to be able to leverage eBPF to write this in C,
> or any of the other languages targeting eBPF. In addition, if we
> have write-only maps, we can exfiltrate information from seccomp, like
> arguments, and errors in a relatively cheap way compared to cBPF, and
> then extract this via the bcc stack. Writing cBPF via C macros is a
> pain, and the off-the-shelf cBPF libraries are getting no love. The
> eBPF community is *exploding* with contributions.

Is stage two of this getting runc to support eBPF and Docker to change
the default to be written as eBPF? I foresee that being a problem,
mainly because of the kernel versions people use. The point of that
patch was to help the most people, and since your point in (2) is
about performance, that is a trade-off I would be willing to make in
order to have this functionality on more kernel versions.

The other alternative would be to have docker translate to use eBPF if
the kernel supported it, but that amount of complexity seems a bit
unnecessary for a feature that was trying to also be "simple".
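
To make the "translate if the kernel supports it" alternative concrete, here is a minimal Python sketch of the selection step only. It is an illustration under assumptions, not anything runc or Docker does: the function names and the version threshold are hypothetical, and the threshold in particular is a placeholder because the patchset under discussion is not in any released kernel.

```python
import re

# Hypothetical threshold. The eBPF seccomp patchset is only a proposal,
# so this number is a placeholder and not a real kernel requirement.
HYPOTHETICAL_MIN_EBPF_SECCOMP = (4, 17)


def parse_kernel_release(release):
    """Extract (major, minor) from a string such as '4.15.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def choose_filter_format(release, minimum=HYPOTHETICAL_MIN_EBPF_SECCOMP):
    """Return 'ebpf' when the kernel looks new enough, else fall back to 'cbpf'."""
    if parse_kernel_release(release) >= minimum:
        return "ebpf"
    return "cbpf"
```

A caller could pass the output of `platform.release()` to `choose_filter_format`. Even this small check means the runtime has to carry and test two filter generators, which is the complexity concern raised above.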

Or do you plan on wrapping filters onto processes tangentially from
the runtime? In which case, that should be totally fine :)

Anyways this is kinda a tangent from the main point of getting it in
the kernel, just I would hate to see someone having to maintain this
without there being a path to getting it upstream elsewhere.

>
> 2) In my testing, which so far has been very rudimentary, rewriting
> the policy that libseccomp generates from the Docker policy to use
> eBPF and eBPF maps performs much better than cBPF. The specific case
> tested was to use a bpf array to look up rules for a particular
> syscall. In a super trivial test, this had about 5% lower latency
> than using traditional branches. If you need more evidence of
> this, I can work a little bit more on the maps related patches, and
> see if I can get some more benchmarking. From my understanding, we
> would need to add "sealing" support for maps, in which they can be
> marked as read-only, and only at that point should an eBPF seccomp
> program be able to read from them.
>
> 3) Eventually, I'd like to use some more advanced capabilities of
> eBPF, like being able to rewrite arguments safely (not things referred
> to by pointers, but just plain old arguments).
>
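
The array-versus-branches comparison in point (2) of the quoted message can be sketched in plain Python. This is a model of the two lookup shapes only, not kernel code and not the benchmark from the thread: the policy contents, the table size, and all function names are hypothetical, and the linear chain is just one way a branch-based filter can be laid out.

```python
ALLOW = "allow"
DENY = "deny"

# Hypothetical policy: syscall number -> action. Numbers are illustrative only.
POLICY = {0: ALLOW, 1: ALLOW, 3: ALLOW, 60: ALLOW, 231: ALLOW}
MAX_SYSCALL = 512  # assumed table size for this sketch


def build_branch_chain(policy):
    """Branch style: an ordered list of (syscall_nr, action) comparisons."""
    return sorted(policy.items())


def eval_branch_chain(chain, nr):
    """Compare one entry at a time; return (action, comparisons performed)."""
    steps = 0
    for candidate, action in chain:
        steps += 1
        if candidate == nr:
            return action, steps
    return DENY, steps


def build_array(policy, size=MAX_SYSCALL):
    """Array style: one slot per syscall number, defaulting to DENY."""
    table = [DENY] * size
    for nr, action in policy.items():
        table[nr] = action
    return table


def eval_array(table, nr):
    """One bounds check and one indexed load, regardless of policy size."""
    if 0 <= nr < len(table):
        return table[nr]
    return DENY
```

In this model the chain does work proportional to the number of rules before the match, while the array does the same amount of work for every syscall. Whether that translates into the roughly 5% figure quoted above depends on the real filter and kernel, which this sketch does not measure.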
>>>
>>>
>>> Sargun Dhillon (3):
>>>   bpf, seccomp: Add eBPF filter capabilities
>>>   seccomp, ptrace: Add a mechanism to retrieve attached eBPF seccomp
>>>     filters
>>>   bpf: Add eBPF seccomp sample programs
>>>
>>>  arch/Kconfig                 |   7 ++
>>>  include/linux/bpf_types.h    |   3 +
>>>  include/linux/seccomp.h      |  12 +++
>>>  include/uapi/linux/bpf.h     |   2 +
>>>  include/uapi/linux/ptrace.h  |   5 +-
>>>  include/uapi/linux/seccomp.h |  15 ++--
>>>  kernel/bpf/syscall.c         |   1 +
>>>  kernel/ptrace.c              |   3 +
>>>  kernel/seccomp.c             | 185 ++++++++++++++++++++++++++++++++++++++-----
>>>  samples/bpf/Makefile         |   9 +++
>>>  samples/bpf/bpf_load.c       |   9 ++-
>>>  samples/bpf/seccomp1_kern.c  |  17 ++++
>>>  samples/bpf/seccomp1_user.c  |  34 ++++++++
>>>  samples/bpf/seccomp2_kern.c  |  24 ++++++
>>>  samples/bpf/seccomp2_user.c  |  66 +++++++++++++++
>>>  15 files changed, 362 insertions(+), 30 deletions(-)
>>>  create mode 100644 samples/bpf/seccomp1_kern.c
>>>  create mode 100644 samples/bpf/seccomp1_user.c
>>>  create mode 100644 samples/bpf/seccomp2_kern.c
>>>  create mode 100644 samples/bpf/seccomp2_user.c
>>>
>>> --
>>> 2.14.1
>>>
>>
>>
>>
>> --
>> Kees Cook
>> Pixel Security



-- 


Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC  511E 18F3 685C 0022 BFF3
pgp.mit.edu
