netdev - Re: [net-next v3 0/2] eBPF seccomp filters

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ba60fa0f-e0d7-d996-6836-846adf16cc10@iogearbox.net>
Date:   Tue, 27 Feb 2018 10:28:30 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Sargun Dhillon <sargun@...gun.me>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     netdev <netdev@...r.kernel.org>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Andy Lutomirski <luto@...capital.net>,
        Will Drewry <wad@...omium.org>,
        Jessie Frazelle <me@...sfraz.com>,
        Brian Goff <cpuguy83@...il.com>, tom.hromatka@...cle.com,
        James Morris <jmorris@...ei.org>
Subject: Re: [net-next v3 0/2] eBPF seccomp filters

On 02/27/2018 01:01 AM, Sargun Dhillon wrote:
> On Mon, Feb 26, 2018 at 3:04 PM, Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
>> On Mon, Feb 26, 2018 at 07:26:54AM +0000, Sargun Dhillon wrote:
>>> This patchset enables seccomp filters to be written in eBPF. Although, this
>>> patchset doesn't introduce much of the functionality enabled by eBPF, it lays
>>> the ground work for it. Currently, you have to disable CHECKPOINT_RESTORE
>>> support in order to utilize eBPF seccomp filters, as eBPF filters cannot be
>>> retrieved via the ptrace GET_FILTER API.
>>
>> this was discussed multiple times in the past.
>> In eBPF land it's practically impossible to do checkpoint/restore
>> of the whole bpf program/map graph.
>>
>>> Any user can load a bpf seccomp filter program, and it can be pinned and
>>> reused without requiring access to the bpf syscalls. A user only requires
>>> the traditional permissions of either being cap_sys_admin, or have
>>> no_new_privs set in order to install their rule.
>>>
>>> The primary reason for not adding maps support in this patchset is
>>> to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
>>> If we have a map that the BPF program can read, it can potentially
>>> "change" privileges after running. It seems like doing writes only
>>> is safe, because it can be pure, and side effect free, and therefore
>>> not negatively effect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come
>>> to an agreement, this can be in a follow-up patchset.
>>
>> readonly maps already exist. See BPF_F_RDONLY.
>> Is that not enough?
>>
> With BPF_F_RDONLY, is there a mechanism to populate a prog_array, and
> then mark it rd_only?

This would still need to be extended for this purpose. Right now this is
either set on map creation (e.g. such that only prog itself can update the
entries) or obj_get. So you'd need a mechanism that sets flags into rdonly
mode where once set it cannot be undone anymore for the remaining lifetime
of the map.

>>> A benchmark of this patchset is as follows for a very standard eBPF filter:
>>>
>>> Given this test program:
>>> for (i = 10; i < 99999999; i++) syscall(__NR_getpid);
>>>
>>> If I implement an eBPF filter with PROG_ARRAYs with a program per syscall,
>>> and tail call, the numbers are such:
>>> ebpf JIT 12.3% slower than native
>>> ebpf no JIT 13.6% slower than native
>>> seccomp JIT 17.6% slower than native
>>> seccomp no JIT 37% slower than native
>>
>> the perf gains are misleading, since patches don't enable bpf_tail_call.
>>
>> The main statement I want to hear from seccomp maintainers before
>> proceeding any further on this that enabling eBPF in seccomp won't lead
>> to seccomp folks arguing against changes in bpf core (like verifier)
>> just because it's used by seccomp.
>> It must be spelled out in the commit log with explicit Ack.

Fully agree.