Message-ID: <CABqD9hYza5BpOk-+n0svHVGuWem39M=asGTMPy0z1ke0rCv8hA@mail.gmail.com>
Date: Thu, 12 Jan 2012 11:35:55 -0600
From: Will Drewry <wad@...omium.org>
To: Jamie Lokier <jamie@...reable.org>
Cc: Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
keescook@...omium.org, john.johansen@...onical.com,
serge.hallyn@...onical.com, coreyb@...ux.vnet.ibm.com,
pmoore@...hat.com, eparis@...hat.com, djm@...drot.org,
torvalds@...ux-foundation.org, segoon@...nwall.com,
jmorris@...ei.org, scarybeasts@...il.com, avi@...hat.com,
penberg@...helsinki.fi, viro@...iv.linux.org.uk, luto@....edu,
mingo@...e.hu, akpm@...ux-foundation.org, khilman@...com,
borislav.petkov@....com, amwang@...hat.com, oleg@...hat.com,
ak@...ux.intel.com, eric.dumazet@...il.com, gregkh@...e.de,
dhowells@...hat.com, daniel.lezcano@...e.fr,
linux-fsdevel@...r.kernel.org,
linux-security-module@...r.kernel.org, olofj@...omium.org,
mhalcrow@...gle.com, dlaor@...hat.com
Subject: Re: [RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF
On Thu, Jan 12, 2012 at 11:22 AM, Jamie Lokier <jamie@...reable.org> wrote:
> Will Drewry wrote:
>> On Thu, Jan 12, 2012 at 9:43 AM, Steven Rostedt <rostedt@...dmis.org> wrote:
>> > On Wed, 2012-01-11 at 11:25 -0600, Will Drewry wrote:
>> >
>> >> Filter programs may _only_ cross the execve(2) barrier if the last
>> >> filter program was attached by a task with CAP_SYS_ADMIN capabilities
>> >> in its user namespace. Once a task-local filter program is attached
>> >> from a process without privileges, execve will fail. This ensures that
>> >> only a privileged parent task can affect its privileged children
>> >> (e.g., a setuid binary).
>> >
>> > This means that a non privileged user can not run another program with
>> > limited features? How would a process exec another program and filter
>> > it? I would assume that the filter would need to be attached first and
>> > then the execv() would be performed. But after the filter is attached,
>> > the execv is prevented?
>>
>> Yeah - it means tasks can filter themselves, but not each other.
>> However, you can inject a filter for any dynamically linked executable
>> using LD_PRELOAD.
>>
>> > Maybe I don't understand this correctly.
>>
>> You're right on. This was to ensure that one process didn't cause
>> crazy behavior in another. I think Alan has a better proposal than
>> mine below. (Goes back to catching up.)
>
> You can already use ptrace() to cause crazy behaviour in another
> process, including modifying registers arbitrarily at syscall entry
> and exit, aborting and emulating syscalls.
>
> ptrace() is quite slow and it would be really nice to speed it up,
> especially for trapping a small subset of syscalls, or limiting some
> kinds of access to some file descriptors, while everything else runs
> at normal speed.
>
> Speeding up ptrace() with BPF filters would be really nice. Not
> that I like ptrace(), but sometimes it's the only thing you can rely on.
>
> LD_PRELOAD and code running in the target process address space can't
> always be trusted in some contexts (e.g. the target process may modify
> the tracing code or its data); whereas ptrace() is pretty complete and
> reliable, if ugly.
>
> There's already a security model around who can use ptrace(); speeding
> it up needn't break that.
>
> If we'd had BPF ptrace in the first place, SECCOMP wouldn't have been
> needed as userspace could have done it, with exactly the restrictions
> it wants. Google's NaCl comes to mind as a potential user.

That's not entirely true. ptrace supervisors are subject to races and
always fail open, which makes them effective but less robust than a
seccomp-based solution.

seccomp, by contrast, fails closed. What I think would make sense is
to add a user-controllable failure mode to seccomp BPF that calls
tracehook_ptrace_syscall_entry(regs). I've prototyped this and it
works quite well, but I didn't want to conflate the two discussions.

Using ptrace() would also mean that every consumer of this interface
needs a supervisor. With seccomp, the filters are installed once and
no supervisor has to stick around to handle failures.

Does that make sense?
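
To make the fail-closed, no-supervisor point concrete, here is a
minimal sketch of an allowlist filter that a task installs on itself.
Several things in it are assumptions rather than part of this RFC: it
uses the prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) interface and
the SECCOMP_RET_* return values as they later landed in mainline
Linux, not the interface proposed in this patch; the allowlist contents
and the names allowlist and install_allowlist are made up for
illustration; and it skips the architecture check that a real filter
should perform first.

```c
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/*
 * Hypothetical allowlist: read, write and exit_group are permitted,
 * everything else falls through to the default action, which kills
 * the task. The default being "deny" is what makes this fail closed.
 * NOTE: a production filter should first check seccomp_data.arch;
 * that check is omitted here to keep the sketch short.
 */
static struct sock_filter allowlist[] = {
    /* [0] Load the syscall number into the accumulator. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [1]-[3] On a match, jump forward to the ALLOW return at [5]. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read, 3, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 2, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 1, 0),
    /* [4] Default action: no match, so kill. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* [5] Allow the syscall. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/*
 * Install the filter on the calling task. Returns 0 on success and
 * -1 on error (errno is set by prctl). No tracer or other supervisor
 * process is involved; the kernel enforces the filter from here on.
 */
int install_allowlist(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(allowlist) / sizeof(allowlist[0])),
        .filter = allowlist,
    };

    /* Required for unprivileged callers in the mainline interface. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;

    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A task would call install_allowlist() once, before it starts handling
untrusted input. From then on any syscall outside the list hits the
default kill action, whether or not anything else is still watching
the task.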
thanks!
will