netdev - Re: [net-next v3 0/2] eBPF seccomp filters

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMp4zn9g06jTAAycw6hNXF+KsfOM2SXvr1aYywnXyXkEiSO0rA@mail.gmail.com>
Date:   Thu, 1 Mar 2018 13:51:19 -0800
From:   Sargun Dhillon <sargun@...gun.me>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        chris hyser <chris.hyser@...cle.com>,
        Kees Cook <keescook@...omium.org>,
        Will Drewry <wad@...omium.org>,
        Netdev <netdev@...r.kernel.org>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>
Subject: Re: [net-next v3 0/2] eBPF seccomp filters

On Thu, Mar 1, 2018 at 9:44 AM, Andy Lutomirski <luto@...capital.net> wrote:
> On Wed, Feb 28, 2018 at 7:56 PM, Daniel Borkmann <daniel@...earbox.net> wrote:
>> On 02/28/2018 12:55 AM, chris hyser wrote:
>>>> On 02/27/2018 04:58 PM, Daniel Borkmann wrote: >> On 02/27/2018 05:59 PM, chris hyser wrote:
>>>>>> On 02/27/2018 11:00 AM, Kees Cook wrote:
>>>>>>> On Tue, Feb 27, 2018 at 6:53 AM, chris hyser <chris.hyser@...cle.com> wrote:
>>>>>>>> On 02/26/2018 11:38 PM, Kees Cook wrote:
>>>>>>>>> On Mon, Feb 26, 2018 at 8:19 PM, Andy Lutomirski <luto@...capital.net>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> 3. Straight-up bugs.  Those are exactly as problematic as verifier
>>>>>>>>>> bugs in any other unprivileged eBPF program type, right?  I don't see
>>>>>>>>>> why seccomp is special here.
>>>>>>>>>
>>>>>>>>> My concern is more about unintended design mistakes or other feature
>>>>>>>>> creep with side-effects, especially when it comes to privileges and
>>>>>>>>> synchronization. Getting no-new-privs done correctly, for example,
>>>>>>>>> took some careful thought and discussion, and I'm shy from how painful
>>>>>>>>> TSYNC was on the process locking side, and eBPF has had some rather
>>>>>>>>> ugly flaws in the past (and recently: it was nice to be able to say
>>>>>>>>> for Spectre that seccomp filters couldn't be constructed to make
>>>>>>>>> attacks but eBPF could). Adding the complexity needs to be worth the
>>>>>
>>>>> Well, not really. One part of all the Spectre mitigations that went upstream
>>>>> from BPF side was to have an option to remove interpreter entirely and that
>>>>> also relates to seccomp eventually. But other than that an attacker might
>>>>> potentially find as well useful gadgets inside seccomp or any other code
>>>>> that is inside the kernel, so it's not a strict necessity either.
>>>>>
>>>>>>>>> gain. I'm on board for doing it, I just want to be careful. :)
>>>>>>>>
>>>>>>>> Another option might be to remove c/eBPF from the equation all together.
>>>>>>>> c/eBPF allows flexibility and that almost always comes at the cost of
>>>>>>>> additional security risk. Seccomp is for enhanced security yes? How about a
>>>>>>>> new seccomp mode that passes in something like a bit vector or hashmap for
>>>>>>>> "simple" white/black list checks validated by kernel code, versus user
>>>>>>>> provided interpreted code? Of course this removes a fair number of things
>>>>>>>> you can currently do or would be able to do with eBPF. Of course, restated
>>>>>>>> from a security point of view, this removes a fair number of things an
>>>>>>>> _attacker_ can do. Presumably the performance improvement would also be
>>>>>>>> significant.
>>>>>
>>>>> Good luck with not breaking existing applications relying on seccomp out
>>>>> there.
>>>>
>>>> This wasn't in the context of an implementation proposal, but the assumption would be to add this in addition to the old way. Now, does that make sense to do? That is the discussion.
>>
>> I see; didn't read that out from the above when you also mentioned removing
>> cBPF, but fair enough.
>>
>>>>>>>> Is this an idea worth prototyping?
>>>>>>>
>>>>>>> That was the original prototype for seccomp-filter. :) The discussion
>>>>>>> around that from years ago basically boiled down to it being
>>>>>>> inflexible. Given all the things people want to do at syscall time,
>>>>>>> that continues to be true. So true, in fact, that here we are now,
>>>>>>> trying to move to eBPF from cBPF. ;)
>>>>>
>>>>> Right, agree. cBPF is also pretty much frozen these days and aside from
>>>>> that, seccomp/BPF also just uses a proper subset of it. I wouldn't mind
>>>>> doing something similar for eBPF side as long as this is reasonably
>>>>> maintainable and not making BPF core more complex, but most of it can
>>>>> already be set in the verifier anyway based on prog type. Note, that
>>>>> performance of seccomp/BPF is definitely a demand as well which is why
>>>>> people still extend the old remaining cBPF JITs today such that it can
>>>>> be JITed also from there.
>>>>>
>>>>>> I will try to find that discussion. As someone pointed out here though, eBPF is being used by more and more people in areas where security is not the primary concern. Differing objectives will make this a long term continuing issue. We ourselves were looking at eBPF simply as a means to use a hashmap for a white/blacklist, i.e. performance not flexibility.
>>>>>
>>>>> Not really, security of verifier and BPF infra in general is on the top
>>>>> of the list, it's fundamental to the underlying concept and just because
>>>>> it is heavily used also in tracing and networking, it only shows that the
>>>>> concept is highly flexible that it can be applied in multiple areas.
>>>
>>> If you're implying that because seccomp would have it's own verifier and could therefore restrict itself to a subset of eBPF, therefore any future additions/features to eBPF would not necessarily make seccomp less secure, I mainly agree. Is that the argument?
>>
>> Ok, in addition to the current unpriv restrictions imposed by the verifier,
>> what additional requirements would you have from your side in order to get
>> to semantics that make sense for you wrt seccomp/eBPF? Just trying to
>> understand how far we are away from that. Note that not every new feature,
>> map or helper is enabled for every program type of course.
>>
>
> I haven't looked at the exact unpriv restrictions lately, but I think
> I remember them.  Regardless, from my perspective, here's what I would
> care about as a seccomp reviewer:
>
> 1. No extra information should become available to a seccomp filter
> program without seccomp's explicit opt-in.  In other words, a seccomp
> filter program should be able to access the fields in struct
> seccomp_data, the return values of BPF_CALL helpers explicitly
> authorized by seccomp, and the values in maps authorized by seccomp
> (if any!), and that's it.  They should not be able to learn the
> current time, any kernel pointers, user register state (except that
> which is contained in seccomp_data), etc.  I believe that this is
> already the case except insofar as core BPF_CALL helpers may violate
> this.  (I'm not sure exactly what the policy is on the use of BPF_CALL
> helpers.)
>
The only calls that were whitelisted was uid/gid, pid/tid, ktime, and
prandom. There was no printk, nor perf_events. It didn't give access
to maps, or anything else. I have to ask though, why isn't it okay for
eBPF filters to have access to time? My use case is that I launch a
job, and it has 5 minutes to do some privileged operations, and then I
want to deny any privileged operations. Maybe when writeable maps are
a thing, we can do something more advanced, but trying to narrow the
window of attack has a lot of benefit.

> 2. Filter evaluation should have no side effects except as explicitly
> authorized by seccomp or by systemwide tracing.  So perf observing
> that a seccomp filter ran is fine, but seccomp filters should not be
> able to write to the system log, to perf ring buffers, to maps, etc.
>
See above.

> 3. Stability.  If a filter passes verification on two different
> kernels, it should behave the same on both kernels, even if the filter
> is buggy in some theoretical sense.
In the sense that BPF is part of the uapi, and uapi is supposed to be
stable? I think that already makes sense, because for all the places
where eBPF is used for networking, it has to follow this property.

>
> And that's it.  Obviously the attack surface provided by the ability
> to load and run a filter should be minimized, but that's true for eBPF
> in general and has little to do with seccomp in particular.
>
> #1 and #2 are probably fairly straightforward using existing
> mechanisms, unless the BPF_CALL hooks or map authorization hooks for
> program types need to be extended a bit to get it right.  #3 is maybe
> more interesting, but I imagine that XDP and any upcoming bpf-based
> iptables replacement have the same requirement.  In contrast, bpf
> *tracing* doesn't really require #3 to the same extent -- it's really
> such an awful thing if a buggy or otherwise naughty bpf tracing
> program behaves differently after a kernel upgrade.
>
> I suppose that another way of saying this is that an eBPF seccomp
> program should behave like a pure function except to the extent that
> the seccomp core makes an explicit exception.
>
> I'm not terribly concerned about the additional attack surface exposed
> by eBPF itself.  Sure, it's more dangerous to allow a sandboxed
> program to load its own eBPF programs than to allow a sandboxed
> program to load its own cBPF programs, but such is the price of
> progress.  If I'm writing a restrictive sandbox a la chromium's, I'm
> not going to allow it to load eBPF programs, but I can still use eBPF
> to enforce the sandbox policy.
>
> --Andy