Message-ID: <20110204170448.GA1808@nowhere>
Date: Fri, 4 Feb 2011 18:04:51 +0100
From: Frederic Weisbecker <fweisbec@...il.com>
To: Eric Paris <eparis@...hat.com>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Stefan Fritsch <sf@...itsch.de>, Ingo Molnar <mingo@...e.hu>,
Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
linux-kernel@...r.kernel.org, agl@...gle.com, tzanussi@...il.com,
Jason Baron <jbaron@...hat.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
2nddept-manager@....hitachi.co.jp,
Steven Rostedt <rostedt@...dmis.org>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
James Morris <jmorris@...ei.org>
Subject: Re: Using ftrace/perf as a basis for generic seccomp

On Fri, Feb 04, 2011 at 11:29:19AM -0500, Eric Paris wrote:
> On Fri, 2011-02-04 at 15:31 +0100, Peter Zijlstra wrote:
> > On Thu, 2011-02-03 at 20:50 -0500, Eric Paris wrote:
> > > I'm going to try to work on it over
> > > the next week or two.
> >
> > What is your use-case? Going by: http://lwn.net/Articles/332990/ syscall
> > based stuff (seccomp) is broken by design.
>
> My personal goal is very different from an LSM. My goal is to reduce
> attack surface. I'm not trying to implement an LSM. LSM hooks are
> (intentionally) placed in the kernel after object resolution is
> complete. In an LSM we don't check an 'open'-type operation until after
> the pathname has been converted to an inode. We don't check some
> 'sendto' operations until after the data has been placed into an skb and
> is about to be queued to a socket. There is a LOT of code between
> syscall_entry() and any given LSM hook.
>
> An obvious example, which I'm sure everyone involved here knows, is
> the original perf syscall bounds-checking vulnerability. If I'm dealing
> with an application that I know will never use perf, I'd like a way to
> completely disable the perf syscall and greatly reduce the kernel attack
> surface. It would be almost impossible for an LSM to hook between
> syscall_enter() and the location of that vulnerability in the perf
> syscall. In my particular case I'm thinking about qemu, which never
> needs to call perf. I want a way to disable all of the code after
> syscall_enter() for huge swaths of the kernel.
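A toy Python model of the argument above, not kernel code: `perf_open_model`, `syscall_entry`, `Killed` and the 64-byte limit are hypothetical names and values chosen for illustration. It shows that a check made at syscall entry stops the handler before any of its parsing runs, while an LSM-style hook placed after argument handling can only deny the call once the buggy path has already executed.

```python
import errno


class Killed(Exception):
    """Stands in for the kernel killing the task at syscall entry."""


def perf_open_model(attr_size, trace):
    # Hypothetical stand-in for a syscall handler. Everything here runs
    # before the LSM-style hook at the bottom gets a chance to say no.
    trace.append("copy attr from user")
    if attr_size > 64:
        # Pretend a missing bounds check makes this the exploitable path.
        trace.append("out-of-bounds copy")
    trace.append("lsm hook")
    return -errno.EACCES  # the hook denies, but only after the bug ran


HANDLERS = {"perf_event_open": perf_open_model}


def syscall_entry(name, arg, blocked, trace):
    """Dispatch a syscall, consulting a per-syscall cutoff first."""
    if name in blocked:
        raise Killed(name)  # none of the handler's code executes
    return HANDLERS[name](arg, trace)
```

With an empty `blocked` set, `syscall_entry("perf_event_open", 128, set(), trace)` returns -EACCES yet leaves "out-of-bounds copy" in `trace`; with "perf_event_open" in `blocked`, it raises `Killed` and `trace` stays empty.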
>
> What we have today, called "seccomp", is a one-way toggle,
> prctl(PR_SET_SECCOMP, 1), which reduces the available syscalls to
> read, write, exit, and sigreturn. Any other syscall results in the
> process being immediately killed. It's a great way to reduce the attack
> surface of the kernel, but it is too inflexible to be useful. I wonder
> if anyone is using it.
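To make the strict-mode semantics concrete, here is a minimal Python model, not the kernel implementation; the `Process` class and its methods are hypothetical. It uses syscall names rather than numbers; on x86-64 the real permitted entries are read, write, exit (not exit_group) and rt_sigreturn.

```python
# Toy model of strict seccomp (mode 1); names, not syscall numbers.
STRICT_ALLOWED = frozenset({"read", "write", "exit", "sigreturn"})


class Process:
    """Hypothetical process state tracking the seccomp strict flag."""

    def __init__(self):
        self.strict = False
        self.alive = True

    def set_seccomp_strict(self):
        # Models prctl(PR_SET_SECCOMP, 1): one-way, nothing clears it.
        self.strict = True

    def syscall(self, name):
        """Return True if the syscall may proceed, False if it kills us."""
        if not self.alive:
            raise RuntimeError("process was already killed")
        if self.strict and name not in STRICT_ALLOWED:
            self.alive = False  # the kernel delivers SIGKILL
            return False
        return True
```

The all-or-nothing shape is visible here: once the flag is set, qemu's first futex call would end the process.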
>
> A couple of seconds of strace on my box showed qemu using futex,
> ioctl, read, rt_sigaction, select, timer_gettime, timer_settime, and
> write. I'm sure that other well-defined processes have similarly short
> lists of required syscalls. I think a more flexible seccomp, which
> lets one remove syscalls from the allowed set (but never add them back),
> can GREATLY reduce the kernel attack surface exposed to malicious
> processes.
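The "remove but never add back" property can be sketched as a set that only ever shrinks. `SyscallAllowlist` and its methods are hypothetical names, not a proposed kernel interface; the qemu list used in the checks is the one from the strace run above.

```python
class SyscallAllowlist:
    """Hypothetical allowlist that can shrink but never grow."""

    def __init__(self, allowed):
        self._allowed = frozenset(allowed)

    def drop(self, *names):
        # Set difference can only remove entries.
        self._allowed = self._allowed - frozenset(names)

    def restrict_to(self, names):
        # Intersection keeps only syscalls that were already allowed,
        # so naming a previously dropped syscall does not re-enable it.
        self._allowed = self._allowed & frozenset(names)

    def allows(self, name):
        return name in self._allowed
```

Because both operations are a difference or an intersection with the current set, no sequence of calls can re-enable a syscall once it has been removed, which keeps the one-way guarantee that strict mode already provides.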
>
> This is not a sandbox. This is not an LSM replacement. This is a
> per-syscall cutoff. It can be used to help build a stronger sandbox. I'll
> likely see whether it can be used by the SELinux sandbox, which already
> uses the LSM hooks to control information flow and mediate access. But
> SELinux does not limit the sheer amount of kernel code that can be
> executed. I believe we can build a stronger sandbox using a flexible
> seccomp as one of the tools. We only have to find one vulnerability in
> the code between the syscall entry and an LSM hook that would deny the
> operation to see the value in a per-syscall control mechanism.
>
> Whether to do it in the seccomp code, where it's all of a syscall or
> none, or to make use of the filter infrastructure to allow even more
> fine-grained control over the syscall is an open question. I'm leaning
> towards just doing it in seccomp. We can't ever build a full and complete
> strong sandbox using the filter code. James' assertions about
> copy_from_user() are obviously correct. A private chat with PeterZ on IRC
> indicated that he also was not interested in seeing this creep into the
> tracing code.

Note it's not about tracing here. It's about abstracting some tracing
features to make them standalone and usable outside of tracing.
But yeah, now that I consider that checks on pointers are racy until
the objects are resolved (my first security lesson), such deep filtering,
down to dereferencing pointers, is pointless.
There is still a point for immediate values, though (filtering on the
fd, the open mode, etc.).
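A toy model of the distinction drawn here, assuming a user buffer can be rewritten by another thread between the filter's check and the syscall's own copy_from_user(). The `other_thread` callback is a deterministic stand-in for that thread, and `check_path_then_use` and `immediate_filter` are hypothetical names. Register-sized values such as an fd or open flags are passed by value, so a check on them cannot be invalidated afterwards.

```python
import os

# Access-mode mask, derived locally (equals 3 on Linux, matching O_ACCMODE).
O_ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def check_path_then_use(user_buf, other_thread):
    """Check pointed-to data at entry, then 'use' it later (racy)."""
    seen = bytes(user_buf)  # the filter's copy of the user buffer
    if seen.startswith(b"/etc/shadow"):
        return None  # the filter would deny this path
    other_thread(user_buf)  # another thread rewrites the buffer
    return bytes(user_buf)  # the syscall copies the new contents


def immediate_filter(name, args):
    """Hypothetical filter that only inspects register-sized values."""
    if name == "write":
        return args[0] in (1, 2)  # fd: stdout or stderr only
    if name == "open":
        return (args[1] & O_ACCMODE) == os.O_RDONLY  # read-only opens
    return False  # everything else is denied in this sketch
```

The path check passes on "/tmp/harmless" and the syscall then uses "/etc/shadow", while the fd and flags checks have nothing an attacker can change after the fact.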
> Do we have a user that can articulate a need for greater
> flexibility in their use of such a hardening tool?
So yeah, we probably need more use cases before considering it.