[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABqD9hYpiDtCM5gPgjjHOJ6DcFs-HHUORAGi+AMFs1+9EAk1UA@mail.gmail.com>
Date: Tue, 31 Jan 2012 05:04:33 -0600
From: Will Drewry <wad@...omium.org>
To: Indan Zupancic <indan@....nu>
Cc: linux-kernel@...r.kernel.org, keescook@...omium.org,
john.johansen@...onical.com, serge.hallyn@...onical.com,
coreyb@...ux.vnet.ibm.com, pmoore@...hat.com, eparis@...hat.com,
djm@...drot.org, torvalds@...ux-foundation.org,
segoon@...nwall.com, rostedt@...dmis.org, jmorris@...ei.org,
scarybeasts@...il.com, avi@...hat.com, penberg@...helsinki.fi,
viro@...iv.linux.org.uk, luto@....edu, mingo@...e.hu,
akpm@...ux-foundation.org, khilman@...com, borislav.petkov@....com,
amwang@...hat.com, oleg@...hat.com, ak@...ux.intel.com,
eric.dumazet@...il.com, gregkh@...e.de, dhowells@...hat.com,
daniel.lezcano@...e.fr, linux-fsdevel@...r.kernel.org,
linux-security-module@...r.kernel.org, olofj@...omium.org,
mhalcrow@...gle.com, dlaor@...hat.com, corbet@....net,
alan@...rguk.ukuu.org.uk, mcgrathr@...omium.org
Subject: Re: [PATCH v5 2/3] seccomp_filters: system call filtering using BPF
On Mon, Jan 30, 2012 at 7:42 PM, Indan Zupancic <indan@....nu> wrote:
> On Mon, January 30, 2012 23:26, Will Drewry wrote:
>> Do you think something along the lines of 2 kB is sane for a config-less change?
>
> Yes, especially if there is some way to get rid of it anyway,
> like disabling SECCOMP or some option under CONFIG_EMBEDDED.
> But it seems you need at least a hidden config option which
> depends on the stuff you need.
Disabling SECCOMP would definitely do it.
>>
>> Doing exactly that. I've been tinkering with the best way to minimize
>> the impact to the existing BPF evaluator. Right now, I'm adding a
>> very small number of new instructions to the sk_buff specific code
>> path, but I haven't yet benchmarked - just disasssembled.
>
> I would do all the checking in sk_chk_filter(), so you know that when
> you do run the filter, you can't hit the sk_buff paths. This doesn't
> cause any slow down for the networking path.
Ah sorry - I was referring to the intrusion of a load_pointer function
pointer. I want to leave the current networking path as untouched as
possible. For checking, I agree -- a quick change to sk_chk_filter or
even just a small helper function that scans for any codes in the
ancillary range will do the trick.
>> I agree. I will post the next series with a proposed integration. If
>> there is a lot of resistance, then the difference will be going from a
>> 2kB changes to a 3kB change.
>
> Let's see how it goes.
>
>>> I think you should go on a quest to make sure (almost) all archs have that file,
>>> before this patch can be merged. At least the archs that have ptrace support.
>>
>> I'm an idiot. CONFIG_HAVE_ARCH_TRACEHOOK covers asm/syscall.h
>>
>> So I have two choices:
>> 1. allow seccomp with filtering on these platforms by fail if an
>> argument is accessed
>> 2. return ENOSYS when a filter is attempted to be installed on
>> platforms with no tracehook support.
>
> I vote for:
>
> 3. Add tracehook support to all archs.
I don't see these #3 as mutually exclusive :) tracehook requires:
- task_pt_regs() in asm/processor.h or asm/ptrace.h
- arch_has_single_step() if there is hardware single-step support
- arch_has_block_step() if there is hardware block-step support
- asm/syscall.h supplying asm-generic/syscall.h interface
- linux/regset.h user_regset interfaces
- CORE_DUMP_USE_REGSET #define'd in linux/elf.h
-TIF_SYSCALL_TRACE calls tracehook_report_syscall_{entry,exit}
- TIF_NOTIFY_RESUME calls tracehook_notify_resume()
- signal delivery calls tracehook_signal_handler()
> Maybe not all archs, but at least some more. That way, every time someone
> adds something tracehook specific, more archs support it.
Well the other arch I want this on specifically for my purposes is
arm, but someone recently posted a partial asm/syscall.h for arm, but
I didn't see that one go anywhere just yet. (I know syscall_get_nr
can be tricky on arm because it doesn't (didn't) have a way of
tracking in-syscall state.)
ref: https://lkml.org/lkml/2011/12/1/131
> syscall.h has no TRACEHOOK defines or anything though.
Nope - it is just part of what is expected.
> Only syscall_rollback() looks tricky. I have no clue what the difference
> between syscall_get_error() and syscall_get_return_value() is. But you
> only need to add syscall_get_nr() and syscall_[gs]et_arguments(), which
> should be possible for all archs.
It seems even syscall_get_nr can have some wrinkles :)
ref: http://lkml.indiana.edu/hypermail/linux/kernel/0906.3/00096.html
> How many archs don't support tracehook?
14 out of 26. However, 5 of those still have asm/syscall.h
>> I think #2 is the nicest user contract, but #1 allows working filters
>> even on less hospitable arches even if they can't user arguments
>> (yet). I'm coding it up as #2, but it is easy to switch to #1.
>
> If you don't support the arch, don't compile any code at all, and
> let prctl(2) return EINVAL. You don't want to return ENOSYS.
I was thinking of the inline prctl handler, but EINVAL makes sense.
>>> Yeah, I figured that out later on. It's quite nifty, but I find the recursion
>>> within kref_put() slightly worrisome. Perhaps the code would be cleaner if this
>>> was avoided and done differently, but I can't think of a good alternative. I'll
>>> wait for the new version to see if I can find a way.
>>
>> Thanks - sure. Since kref_put is just an atomic_dec_and_test followed
>> by a call to the function pointer, I wasn't too concerned. Without
>> changing how the relationships are handled, I'm not sure how to
>> approach it differently (and still avoid races). IIRC, this isn't much
>> different to how namespaces work, they just use their own atomic
>> counters.
>
> Well, the thing is, this recursion is controlled by user space depending
> on how many filters they have installed. What is preventing them to force
> you out of stack?
Hrm true. The easiest option is to just convert it to iterative by
not using kref_t, but I'll look more closely.
> So perhaps add at least some arbitrary filter limit to avoid this?
Definitely possible -- probably as a sysctl. I'm not quite sure what
number makes sense yet, but I'll look at breaking the recursion first.
Thanks!
>> I'll clarify a bit. My original ptrace integration worked such that a
>> tracer may only intercept syscall failures if it attached prior to the
>> failing filter being installed. I did it this way to avoid using
>> ptrace to escape system call filtering. However, since I don't have
>> that as part of the patch series, it doesn't make sense to keep it. (I
>> tracked a tracer pid_struct in the filters.) If it needs to come back
>> with later patchsets, then it can be tackled then!
>
> The problem of that is that filters can be shared between processes with
> different ptracers. And you have all the hassle of keeping it up to date.
>
> I think seccomp should always come first and just trust ptrace. This
> because it can deny all ptrace() calls for filtered tasks, so the only
> untrusted tasks doing ptrace() are outside of seccomp's filtering control.
> And those could do the same system calls themselves.
>
> The case where there is one task being filtered and allowed to do ptrace,
> but not some other syscall, ptracing another filtered task which isn't
> allowed to do ptrace, but allowed to do that other syscall, is quite far
> fetched I think. If you really want to handle this, then you could run
> the ptracer's filters against the tracee's post-ptrace syscall state.
> This is best done in the ptracer's context, just before continuing the
> system call. (You really want Oleg's SIKILL immediate patch then.)
>
> What about:
>
> 1) Seccomp filters can deny a syscall by killing the task.
>
> 2) Seccomp can deny a syscall and let it return an error value.
>
> I know you're not fond of this one. It's just a performance
> optimisation as sometimes a lot of denied but harmless syscalls
> are called by glibc all the time, like getxattr(). Hardcoding
> the kill policy seems wrong when it can be avoided. If this is
> too hard then skip it, but if it's easy to add then please do.
> I'll take a look at this with your next patch version.
It's easy on x86 harder on other arches. I would suggest saving
changing the __secure_computing signature until after the core
functionality lands, but that's just me.
> 3) Seccomp can allow a syscall to proceed normally.
>
> 4) Seccomp can set a hint to skip ptrace syscall events for this syscall.
> A filter sets this by returning a specific value.
>
> 5) Ptrace always gets a syscall event when it asked for it.
>
> 6) Ptrace can set an option to honour seccomp's hint and to not get all
> syscall events.
>
> This way all seccomp needs to do is to set some flags which ptrace can check.
I like the use of flags/options to trigger ptrace handling. If I were
to stack rank these for pursuit after the core functionality lands,
it'd be to add #6 (and its deps) then #2. With #6, #2 can be
simulated (by having a supervisor that changes the syscall number to
-1), but that is much less ideal than just returning SECCOMP_ERROR
instead of SECCOMP_ALLOW/DENY and letting an error code get bubbled
up.
thanks!
will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists