linux-kernel - Re: [PATCH v3 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAGXu5jJ7nq0VF9tLz=i98eo=DaMKXVNO7NqmtxJHNwVwuZHZGg@mail.gmail.com>
Date:   Fri, 11 Aug 2017 11:32:50 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Tyler Hicks <tyhicks@...onical.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Fabricio Voznika <fvoznika@...gle.com>,
        Andy Lutomirski <luto@...capital.net>,
        Will Drewry <wad@...omium.org>,
        Tycho Andersen <tycho@...ker.com>,
        Shuah Khan <shuah@...nel.org>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>,
        linux-security-module <linux-security-module@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>
Subject: Re: [PATCH v3 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

On Fri, Aug 11, 2017 at 9:58 AM, Tyler Hicks <tyhicks@...onical.com> wrote:
>> @@ -201,8 +203,25 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>>        */
>>       for (; f; f = f->prev) {
>>               u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
>> +             u32 action = cur_ret & SECCOMP_RET_ACTION;
>>
>> -             if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION)) {
>> +             /*
>> +              * In order to distinguish between SECCOMP_RET_KILL and
>> +              * "higher priority" synthetic SECCOMP_RET_KILL_PROCESS
>> +              * identified by the kill_process filter flag, treat any
>> +              * case as immediately stopping filter processing. No
>> +              * higher priority action can exist, and we can't stop
>> +              * on the first RET_KILL (which may not have set
>> +              * f->kill_process) when a RET_KILL further up the filter
>> +              * list may have f->kill_process set which would go
>> +              * unnoticed.
>> +              */
>> +             if (unlikely(action == SECCOMP_RET_KILL && f->kill_process)) {
>> +                     *match = f;
>> +                     return cur_ret;
>> +             }
>
> Why not let the application enforce this via the seccomp filter? In
> other words, the first filter loaded with
> SECCOMP_FILTER_FLAG_KILL_PROCESS set could have a rule in the filter
> that only allows seccomp(2) to be called in the future with the
> SECCOMP_FILTER_FLAG_KILL_PROCESS flag set.

I've been using the guide of "if SECCOMP_RET_KILL_PROCESS _did_ exist,
how would its semantics differ?"

In that magic world, it wouldn't be possible to create a seccomp
filter to screen out SECCOMP_RET_KILL_PROCESS. Also, being able to
distinguish between the two states (see below).

> I understand the reasoning for wanting to enforce this automatically at
> the kernel level but I think mixing return action priorities with filter
> flags could be confusing and inflexible in the long run since filters
> are inherited and your parent's desire to kill the entire thread group
> may not mix with your desire to only kill a single thread.

Blocking the use of SECCOMP_FILTER_FLAG_KILL_PROCESS just means a
child can never perform a KILL_PROCESS, which doesn't really make much
sense, IMO.

The trouble may be that KILL_PROCESS would be used sparingly by either
parent or child, in the sense that maybe "unknown syscall gets
KILL_PROCESS, but 'connect' should just do KILL_THREAD". Or the
reverse. There isn't a way to mix combinations of return values across
filter chains without treating it exactly like a "real"
SECCOMP_RET_KILL_PROCESS would have worked. That means I have to treat
it as "higher priority" in the seccomp_run_filters() loop (which is
luckily very very cheap, as the "unlikely(register == zero)" test is
correct branch-predicted for the non-zero case, and the test is cheap
(we've already done the assignment which we need for the "<" test
below it, so it's a single pipelined instruction for the zero flag).

I don't expect to adjust KILL_THREAD vs KILL_PROCESS ever again, so
I'm not too worried about inflexibility.

What I don't get in this version is a _single_ filter being able to
distinguish between KILL_THREAD and KILL_PROCESS. Userspace is forced
to split up a rule if it wants to have different results. Also, parent
_can_ stop a child from escalating their KILL_THREADs to KILL_PROCESS
via the filter you mentioned, which is weird.

I spent some time trying to use the high bit in the return, to make
this signed, and in the end it was much much more ugly, and I didn't
want to deal with the fallout to userspace which may suddenly have to
deal with unexpected bits in the BPF return:

basically s/u32/s32/ in __seccomp_filter() and seccomp_run_filters().
add #define SECCOMP_RET_ACTION_FULL 0xffff0000
add #define SECCOMP_RET_KILL_PROC      0x80000000

Then use SECCOMP_RET_ACTION_FULL to mask everything (after forcing a u32 cast).

But the more I stare at this, the more I just want a value that that
works correctly without totally crazy flags and things.

> Another way that this doesn't mix perfectly with the existing design is
> when the action is unknown. In that situation, we treat it as RET_KILL.
> However, this patch hard-codes the comparison with RET_KILL so we get
> into this situation where an unknown action is treated as RET_KILL
> except when the filter has the FILTER_FLAG_KILL_PROCESS flag set and
> then this short-circuit doesn't kick in. It is a corner case, for sure,
> but worth mentioning.

Hm, yeah, good point. This leaves unknown returns as KILL_THREAD, not
KILL_PROCESS.

Let me spent some more time looking at the high bit version of this...

-Kees

-- 
Kees Cook
Pixel Security