Date:	Wed, 2 Feb 2011 18:55:56 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Eric Paris <eparis@...hat.com>, Tom Zanussi <tzanussi@...il.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Li Zefan <lizf@...fujitsu.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>
Cc:	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Eric Paris <eparis@...isplace.org>,
	linux-kernel@...r.kernel.org, agl@...gle.com, fweisbec@...il.com,
	tzanussi@...il.com, Jason Baron <jbaron@...hat.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	2nddept-manager@....hitachi.co.jp
Subject: Re: Using ftrace/perf as a basis for generic seccomp


* Eric Paris <eparis@...hat.com> wrote:

> On Wed, 2011-02-02 at 13:26 +0100, Ingo Molnar wrote:
> > * Masami Hiramatsu <masami.hiramatsu.pt@...achi.com> wrote:
> > 
> > > Hi Eric,
> > > 
> > > (2011/02/01 23:58), Eric Paris wrote:
> > > > On Wed, Jan 12, 2011 at 4:28 PM, Eric Paris <eparis@...hat.com> wrote:
> > > >> Some time ago Adam posted a patch to allow for a generic seccomp
> > > >> implementation (unlike the current seccomp where your choice is all
> > > >> syscalls or only read, write, sigreturn, and exit) which got little
> > > >> traction and it was suggested he instead do the same thing somehow using
> > > >> the tracing code:
> > > >> http://thread.gmane.org/gmane.linux.kernel/833556
> > > 
> > > Hm, interesting idea :)
> > > But why would you like to use tracing code? just for hooking?
> > 
> > What I suggested before was to reuse the scripting engine and the tracepoints.
> > 
> > I.e. the "seccomp restrictions" can be implemented via a filter expression - and the 
> > scripting engine could be generalized so that such 'sandboxing' code can make use of 
> > it.
> > 
> > For example, if you want to restrict a process to only allow open() syscalls to fd 4 
> > (a very restrictive sandbox), it could be done via this filter expression:
> > 
> > 	'fd == 4'
> > 
> > etc. Note that obviously the scripting engine needs to be abstracted out somewhat - 
> > but this is the basic idea, to reuse the callbacks and reuse the scripting engine 
> > for runtime filtering of syscall parameters.
> 
> Any pointers on what is involved in this abstraction?  I can work out
> the details, but I don't know the big picture well enough to even start
> to move forwards.....

perf has support for these filters, so would it work for you if I gave you some 
example usage?

First you identify an interesting tracepoint - look at the list of:

   perf list | grep Tracepoint

Say we want to filter sys_close() events, so we pick:

  syscalls:sys_enter_close                     [Tracepoint event]

And record all sys_close (enter) events in the system, for one second:

   perf record -e syscalls:sys_enter_close -a sleep 1

All the recorded data will be in perf.data in cwd.

'perf report' will show a profile, and 'perf script' will show the trace output:

            perf-30558 [002] 117691.065243: sys_enter_close: fd: 0x00000016
            perf-30558 [002] 117691.065406: sys_enter_close: fd: 0x00000016
            perf-30558 [002] 117691.065443: sys_enter_close: fd: 0x00000017
            perf-30558 [002] 117691.065444: sys_enter_close: fd: 0x00000016
            [...]

Now, to record a 'filtered' event, use the --filter parameter when recording. The 
available field names can be found in the tracepoint's 'format' file:

 cat /debug/tracing/events/syscalls/sys_enter_close/format 

 name: sys_enter_close
 ID: 402
 format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;
	field:int common_lock_depth;	offset:8;	size:4;	signed:1;

	field:int nr;	offset:12;	size:4;	signed:1;
	field:unsigned int fd;	offset:16;	size:8;	signed:0;

 print fmt: "fd: 0x%08lx", ((unsigned long)(REC->fd))
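
As an illustration only (this is not something perf ships), the 'field:' lines of 
such a format file could be parsed with a few lines of Python. The names 
parse_format and SAMPLE are made up for this sketch, and SAMPLE is just a shortened 
copy of the output above:

```python
import re

# Matches one "field:" line of a tracepoint 'format' file, e.g.
#   field:unsigned int fd;  offset:16;  size:8;  signed:0;
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);\s*"
    r"size:(?P<size>\d+);\s*signed:(?P<signed>[01]);"
)


def parse_format(text):
    """Map each field name to its C type, offset, size and signedness."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.search(line)
        if m is None:
            continue  # "name:", "ID:", "print fmt:" and blank lines
        # The declaration is "<C type> <name>"; split on the last space.
        ctype, name = m.group("decl").rsplit(None, 1)
        fields[name] = {
            "type": ctype,
            "offset": int(m.group("offset")),
            "size": int(m.group("size")),
            "signed": m.group("signed") == "1",
        }
    return fields


# Hypothetical input: a shortened copy of the format file shown above.
SAMPLE = """\
name: sys_enter_close
format:
    field:int common_pid;  offset:4;  size:4;  signed:1;
    field:int nr;  offset:12;  size:4;  signed:1;
    field:unsigned int fd;  offset:16;  size:8;  signed:0;
"""

if __name__ == "__main__":
    for name, info in parse_format(SAMPLE).items():
        print(name, info["offset"], info["size"])
```

The keys of the returned dictionary are the names that can be used symbolically in a 
filter expression. Array fields are not handled specially in this sketch.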

The interesting one is:

	field:unsigned int fd;	offset:16;	size:8;	signed:0;

This is the field that represents the fd of the close(fd) call. To filter on it, 
simply use it symbolically:

   perf record -e syscalls:sys_enter_close --filter 'fd==3' ./hackbench 5

As you can see in the 'perf script' output:

       hackbench-30576 [008] 117802.180002: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222056: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222064: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222065: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222067: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222069: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222070: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222071: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222073: sys_enter_close: fd: 0x00000003

Only fd==3 events were recorded.
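
If you want to double-check a trace like this mechanically, a rough Python sketch 
could parse the lines and look at the fd values. It assumes the exact line layout 
shown above (the 'perf script' output format can differ between versions), and 
parse_close_events and TRACE are names invented for the example:

```python
import re

# Layout assumed from the output above:
#   <comm>-<pid> [<cpu>] <timestamp>: <event>: fd: 0x<hex>
LINE_RE = re.compile(
    r"^\s*(?P<comm>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+fd:\s+(?P<fd>0x[0-9a-fA-F]+)\s*$"
)


def parse_close_events(text):
    """Return one dict per recognised sys_enter_close trace line."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip "[...]" markers and anything unrecognised
        events.append({
            "comm": m.group("comm"),
            "pid": int(m.group("pid")),
            "cpu": int(m.group("cpu")),
            "event": m.group("event"),
            "fd": int(m.group("fd"), 16),
        })
    return events


# Two lines copied from the filtered trace above.
TRACE = """\
       hackbench-30576 [008] 117802.180002: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222056: sys_enter_close: fd: 0x00000003
"""

if __name__ == "__main__":
    events = parse_close_events(TRACE)
    print(all(ev["fd"] == 3 for ev in events))
```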

The filter expression engine executes in the kernel, when the event happens. The 
user-space perf tool parses the --filter parameter and passes it to the kernel, in 
essence as a string. The kernel parses this into atomic predicates which are linked 
to the event structure. When the event happens the predicates are executed by the 
filter engine.
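
To make the two phases concrete, here is a toy user-space model in Python. It is 
only a sketch of the idea described above, not the kernel's data structures: the 
Event class, set_filter(), fire() and parse_predicate() are hypothetical names, and 
only a single numeric comparison is supported:

```python
import operator
import re

CMP = {
    "==": operator.eq, "!=": operator.ne,
    "<=": operator.le, ">=": operator.ge,
    "<": operator.lt, ">": operator.gt,
}

PRED_RE = re.compile(r"^\s*(\w+)\s*(==|!=|<=|>=|<|>)\s*(\w+)\s*$")


def parse_predicate(expr):
    """Turn a string such as 'fd==3' into a (field, compare_fn, value) triple."""
    m = PRED_RE.match(expr)
    if m is None:
        raise ValueError("unsupported filter: %r" % expr)
    field, op, raw = m.groups()
    return field, CMP[op], int(raw, 0)  # base 0 accepts both 3 and 0x3


class Event:
    """Hypothetical stand-in for a tracepoint with predicates attached."""

    def __init__(self, name):
        self.name = name
        self.preds = []
        self.recorded = []

    def set_filter(self, expr):
        # Parsing happens once, when the filter string is installed.
        self.preds = [parse_predicate(expr)]

    def fire(self, record):
        # Runs every time the event happens; no string parsing here.
        if all(cmp_fn(record[field], value)
               for field, cmp_fn, value in self.preds):
            self.recorded.append(record)
            return True
        return False


if __name__ == "__main__":
    ev = Event("sys_enter_close")
    ev.set_filter("fd==3")
    for fd in (0x16, 3, 0x17, 3):
        ev.fire({"fd": fd})
    print(len(ev.recorded))
```

The point of the split is that the string is parsed once at setup time, and only the 
cheap comparisons run on each event.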

The expressions are simple, but rather flexible, so you can do 'fd==0||fd==1' and 
more complex expressions, etc. The engine could also be extended.
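
For a feel of how compound expressions such as 'fd==0||fd==1' could be handled, here 
is a small recursive-descent sketch in Python. It assumes the usual precedence (&& 
binds tighter than ||), integer operands only, and optional parentheses; 
compile_filter and tokenize are names invented for the example and say nothing about 
how trace_events_filter.c is actually structured:

```python
import operator
import re

TOKEN_RE = re.compile(r"\s*(\|\||&&|==|!=|<=|>=|<|>|\(|\)|\w+)")
CMP = {
    "==": operator.eq, "!=": operator.ne,
    "<=": operator.le, ">=": operator.ge,
    "<": operator.lt, ">": operator.gt,
}


def tokenize(expr):
    """Split a filter string into operators, parentheses and words."""
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError("bad filter near %r" % expr[pos:])
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def compile_filter(expr):
    """Compile a filter string into a function taking an event dict."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of filter")
        pos += 1
        return tokens[pos - 1]

    def atom():
        # atom := '(' or_expr ')' | field op integer
        if peek() == "(":
            take()
            inner = or_expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        field, op, raw = take(), take(), take()
        if op not in CMP:
            raise ValueError("unknown operator %r" % op)
        cmp_fn, value = CMP[op], int(raw, 0)
        return lambda ev: cmp_fn(ev[field], value)

    def and_expr():
        left = atom()
        while peek() == "&&":
            take()
            left = (lambda l, r: lambda ev: l(ev) and r(ev))(left, atom())
        return left

    def or_expr():
        left = and_expr()
        while peek() == "||":
            take()
            left = (lambda l, r: lambda ev: l(ev) or r(ev))(left, and_expr())
        return left

    result = or_expr()
    if peek() is not None:
        raise ValueError("unexpected token %r" % peek())
    return result


if __name__ == "__main__":
    is_stdio = compile_filter("fd==0||fd==1")
    print([fd for fd in range(4) if is_stdio({"fd": fd})])
```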

The kernel code is mostly in kernel/trace/trace_events_filter.c.

I've Cc:-ed Tom, Frederic, Steve, Li Zefan and Arnaldo who have worked on the filter 
engine, in case something is broken with this functionality or if there are other 
questions :)

Thanks,

	Ingo
