linux-kernel - Re: Using ftrace/perf as a basis for generic seccomp

Message-ID: <20110203191846.GD1769@nowhere>
Date:	Thu, 3 Feb 2011 20:18:47 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Eric Paris <eparis@...hat.com>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Eric Paris <eparis@...isplace.org>,
	linux-kernel@...r.kernel.org, agl@...gle.com, tzanussi@...il.com,
	Jason Baron <jbaron@...hat.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	2nddept-manager@....hitachi.co.jp,
	Steven Rostedt <rostedt@...dmis.org>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Using ftrace/perf as a basis for generic seccomp

On Thu, Feb 03, 2011 at 08:06:45PM +0100, Frederic Weisbecker wrote:
> On Wed, Feb 02, 2011 at 11:45:22AM -0500, Eric Paris wrote:
> > On Wed, 2011-02-02 at 13:26 +0100, Ingo Molnar wrote:
> > > * Masami Hiramatsu <masami.hiramatsu.pt@...achi.com> wrote:
> > > 
> > > > Hi Eric,
> > > > 
> > > > (2011/02/01 23:58), Eric Paris wrote:
> > > > > On Wed, Jan 12, 2011 at 4:28 PM, Eric Paris <eparis@...hat.com> wrote:
> > > > >> Some time ago Adam posted a patch to allow for a generic seccomp
> > > > >> implementation (unlike the current seccomp where your choice is all
> > > > >> syscalls or only read, write, sigreturn, and exit) which got little
> > > > >> traction and it was suggested he instead do the same thing somehow using
> > > > >> the tracing code:
> > > > >> http://thread.gmane.org/gmane.linux.kernel/833556
> > > > 
> > > > Hm, interesting idea :)
> > > > But why would you like to use tracing code? just for hooking?
> > > 
> > > What I suggested before was to reuse the scripting engine and the tracepoints.
> > > 
> > > I.e. the "seccomp restrictions" can be implemented via a filter expression - and the 
> > > scripting engine could be generalized so that such 'sandboxing' code can make use of 
> > > it.
> > > 
> > > For example, if you want to restrict a process to only allow open() syscalls to fd 4 
> > > (a very restrictive sandbox), it could be done via this filter expression:
> > > 
> > > 	'fd == 4'
> > > 
> > > etc. Note that obviously the scripting engine needs to be abstracted out somewhat - 
> > > but this is the basic idea, to reuse the callbacks and reuse the scripting engine 
> > > for runtime filtering of syscall parameters.
> > 
> > Any pointers on what is involved in this abstraction?  I can work out
> > the details, but I don't know the big picture well enough to even start
> > to move forwards.....
> 
> In the big picture, the filtering code is tightly coupled to the tracing
> code. Creation, initialization, and removal of filters are all done on
> top of the trace event structures (struct ftrace_event_call), because we
> apply and interpret filters on the fields of trace events, which are
> what we save in a trace.
> 
> Example:
> 
> If you look at the sched switch trace events, we have several fields
> like prev_comm and next_comm. These are defined in the TRACE_EVENT()
> macro calls. So when we apply a filter like "prev_comm == firefox-bin",
> we enter the filtering code with the trace_event structure for sched
> switch events and iterate through its fields to find one called
> prev_comm and then we work on top of that.
> I think you won't work with trace events, so you need to make the
> filtering code more tracing-agnostic.
> 
> But I think it's quite workable and shouldn't be too hard to split that
> into a filtering backend. Many parts are already pretty standalone.
> 
> Also I suspect the tracepoints are not what you need. Or maybe
> they are. But as Masami said, the syscall tracepoint is called late.
> It's workable though. The other problem is that preemption is disabled
> when tracepoints are called, which is probably not what you want.
> One day I think we'll need to unify the tracepoints and notifier
> code but until then, better keep tracepoints for tracing.
> 
> Now once you have the filtering code more generic, you still
> need an arch backend to map register contents and layout into syscall
> argument names and types, on top of which you can finally use the
> filtering code. For that you can use, again, some code we use for
> tracing: the syscalls metadata, information generated at build time
> that holds the syscall fields and types.
> That also needs to be split up, but it's more trivial
> than the filtering part.
> 
> Note that for now, filtering + syscalls metadata only works on top
> of raw argument values. Syscalls metadata doesn't know much
> about type semantics and won't help you dereference
> syscall argument pointers. Only raw syscall parameter values.
> Similarly, the filtering code can't evaluate pointer-dereferencing
> expressions, only comparisons against direct values.
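
To illustrate the idea quoted above (metadata mapping raw syscall
arguments to named fields, then a filter such as 'fd == 4' evaluated on
those raw values), here is a minimal userspace sketch in Python. It is
only a model: SyscallMeta, parse_filter() and filter_match() are
hypothetical names made up for this example, not kernel interfaces, and
the expression syntax is reduced to a single "field op value" triple.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class SyscallMeta:
    """Hypothetical stand-in for build-time syscall metadata."""
    name: str
    arg_names: tuple  # argument names, in register order


# Comparison operators this toy filter understands.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def parse_filter(expr):
    """Split 'field op value' into its parts; value is an integer literal."""
    field, op, value = expr.split()
    if op not in OPS:
        raise ValueError(f"unsupported operator {op!r}")
    return field, op, int(value, 0)  # base 0 accepts both 4 and 0x4


def filter_match(meta, raw_args, expr):
    """Evaluate expr against the raw argument values of one syscall.

    The field name is resolved through the metadata, much like the
    tracing filter looks a field up by name. Only raw values are
    compared; pointers are never dereferenced.
    """
    field, op, value = parse_filter(expr)
    if field not in meta.arg_names:
        raise KeyError(f"{meta.name} has no field {field!r}")
    index = meta.arg_names.index(field)
    return OPS[op](raw_args[index], value)


# Assumed metadata for write(fd, buf, count), written by hand here.
WRITE = SyscallMeta("write", ("fd", "buf", "count"))
```

With this model, filter_match(WRITE, (4, 0x7ffd1000, 16), "fd == 4")
matches, while the same call with fd 5 does not. The buf argument is
just a number to the filter, which is the limitation described above.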

Actually, the filtering code does support string comparison.
Still, we would need safe accessors (copy_from_user()) in the filtering
code to use that safely on syscall parameters.
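
As a rough userspace model of that point (in Python, not kernel code):
a string filter on a pointer argument first has to copy the string
through an accessor that can fail, and only then compare it. UserMemory,
Fault, copy_string() and string_arg_equals() are hypothetical names for
this sketch; the copy_from_user() method here only imitates the
"bounds-checked copy that may fault" behaviour and is not the kernel
function. Treating a fault as "no match" is an assumption of the sketch.

```python
class Fault(Exception):
    """Raised when an access falls outside the modeled user mapping."""


class UserMemory:
    """Toy model of one contiguous user-space mapping."""

    def __init__(self, base, data):
        self.base = base
        self.data = bytes(data)

    def copy_from_user(self, addr, length):
        """Return length bytes at addr, or raise Fault if unmapped."""
        start = addr - self.base
        if start < 0 or start + length > len(self.data):
            raise Fault(hex(addr))
        return self.data[start:start + length]


def copy_string(mem, addr, max_len):
    """Copy a NUL-terminated string of at most max_len bytes."""
    out = bytearray()
    for offset in range(max_len):
        byte = mem.copy_from_user(addr + offset, 1)
        if byte == b"\0":
            return bytes(out)
        out += byte
    raise Fault("string not terminated within max_len")


def string_arg_equals(mem, addr, expected, max_len=256):
    """Compare the string a raw pointer argument refers to.

    The pointer is never dereferenced directly; a failed copy is
    treated as a non-match (an assumption made for this sketch).
    """
    try:
        return copy_string(mem, addr, max_len) == expected
    except Fault:
        return False
```

For example, with mem = UserMemory(0x1000, b"/etc/passwd\0junk"), the
call string_arg_equals(mem, 0x1000, b"/etc/passwd") matches, while an
unmapped address such as 0x9000 is rejected instead of crashing the
filter.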