linux-kernel - Re: [PATCH 5/7] seccomp_filter: Document what seccomp_filter is and how it works.

Message-ID: <1304002571.2101.38.camel@localhost.localdomain>
Date:	Thu, 28 Apr 2011 10:56:09 -0400
From:	Eric Paris <eparis@...hat.com>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Will Drewry <wad@...omium.org>, linux-kernel@...r.kernel.org,
	kees.cook@...onical.com, agl@...omium.org, jmorris@...ei.org,
	rostedt@...dmis.org, Randy Dunlap <rdunlap@...otime.net>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tom Zanussi <tzanussi@...il.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 5/7] seccomp_filter: Document what seccomp_filter is and
 how it works.

On Thu, 2011-04-28 at 09:06 +0200, Ingo Molnar wrote:
> * Will Drewry <wad@...omium.org> wrote:
> 
> > +A collection of filters may be supplied via prctl, and the current set of
> > +filters is exposed in /proc/<pid>/seccomp_filter.
> > +
> > +For instance,
> > +  const char filters[] =
> > +    "sys_read: (fd == 1) || (fd == 2)\n"
> > +    "sys_write: (fd == 0)\n"
> > +    "sys_exit: 1\n"
> > +    "sys_exit_group: 1\n"
> > +    "on_next_syscall: 1";
> > +  prctl(PR_SET_SECCOMP, 2, filters);
> > +
> > +This will set up system call filters for read, write, and exit where reading can
> > +be done only from fds 1 and 2 and writing to fd 0.  The "on_next_syscall" directive tells
> > +seccomp to not enforce the ruleset until after the next system call is run.  This allows
> > +for launchers to apply system call filters to a binary before executing it.
> > +
> > +Once enabled, the access may only be reduced.  For example, a set of filters may be:
> > +
> > +  sys_read: 1
> > +  sys_write: 1
> > +  sys_mmap: 1
> > +  sys_prctl: 1
> > +
> > +Then it may call the following to drop mmap access:
> > +  prctl(PR_SET_SECCOMP, 2, "sys_mmap: 0");
> 
> Ok, color me thoroughly impressed

Me too!
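The quoted filter string maps each system call name to a predicate over its arguments. Below is a hypothetical userspace model of those semantics, not the patch's kernel code and not a call to `prctl`. The names `struct sim_call`, `sim_filter_check`, and `example_policy` are invented for illustration. Only the first argument (`fd`) is modelled. A call with no matching rule is assumed to be denied. The `on_next_syscall` directive is modelled as a flag that lets exactly one call through unchecked, which is what allows a launcher to install filters and still exec its target. As quoted, the example permits reading from fds 1 and 2 and writing to fd 0, the reverse of the usual stdin/stdout convention. The model reproduces the rules exactly as written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one system call as the filter sees it. Only the
 * first argument (fd) is modelled; the proposed syntax allows more. */
struct sim_call {
    const char *name;   /* e.g. "sys_read" */
    long fd;
};

/* One "sys_xxx: expr" line, with the expression compiled to a function. */
typedef bool (*sim_pred)(long fd);

struct sim_rule {
    const char *name;
    sim_pred pred;
};

static bool read_pred(long fd)  { return fd == 1 || fd == 2; }
static bool write_pred(long fd) { return fd == 0; }
static bool always(long fd)     { (void)fd; return true; }

/* The quoted example filter string, expressed as a table. */
static const struct sim_rule example_policy[] = {
    { "sys_read",       read_pred  },
    { "sys_write",      write_pred },
    { "sys_exit",       always     },
    { "sys_exit_group", always     },
};

struct sim_filter {
    const struct sim_rule *rules;
    size_t nrules;
    bool defer_once;    /* models "on_next_syscall: 1" */
};

/* Returns true if the call may proceed. A call with no matching rule is
 * denied (an assumption about the patch's default). */
static bool sim_filter_check(struct sim_filter *f, const struct sim_call *c)
{
    if (f->defer_once) {
        /* Let exactly one call (the launcher's exec) run unchecked. */
        f->defer_once = false;
        return true;
    }
    for (size_t i = 0; i < f->nrules; i++) {
        if (strcmp(f->rules[i].name, c->name) == 0)
            return f->rules[i].pred(c->fd);
    }
    return false;
}
```

With `defer_once` set, the first call (say, `sys_execve`) passes. A second `sys_execve` is then denied because it has no rule. `sys_read` on fd 1 passes and `sys_read` on fd 0 does not.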

> I've Cc:-ed Linus and Andrew: are you guys opposed to such flexible, dynamic 
> filters conceptually? I think we should really think hard about the actual ABI 
> as this could easily spread to more applications than Chrome/Chromium.

I'll definitely port QEMU to use this new interface rather than my more
rigid flexible (haha "rigid flexible") seccomp.  I'll see if I run into
any issues with this ABI in that porting...
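The ABI under discussion is a newline-separated text format with one `name: expression` rule per line, where `on_next_syscall` appears as just another `name: value` line. The hypothetical sketch below only splits such a string into names and expression text. It does not evaluate expressions, and the patch's actual parser and grammar are not shown in this thread. `struct sim_rule_text`, `sim_split_rules`, and the fixed field sizes are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: one "name: expr" line from a filter string. */
struct sim_rule_text {
    char name[32];
    char expr[128];
};

/* Splits a newline-separated filter string into rules. Every line must
 * contain ':'. Returns the number of rules stored, or -1 on a malformed
 * line, an over-long field, or more than max rules. */
static int sim_split_rules(const char *s, struct sim_rule_text *out, int max)
{
    int n = 0;
    while (*s) {
        const char *eol = strchr(s, '\n');
        size_t len = eol ? (size_t)(eol - s) : strlen(s);
        const char *colon = memchr(s, ':', len);
        if (colon == NULL || n == max)
            return -1;

        size_t nlen = (size_t)(colon - s);
        const char *e = colon + 1;
        while (e < s + len && *e == ' ')    /* skip spaces after ':' */
            e++;
        size_t elen = (size_t)(s + len - e);
        if (nlen >= sizeof out[n].name || elen >= sizeof out[n].expr)
            return -1;

        memcpy(out[n].name, s, nlen);
        out[n].name[nlen] = '\0';
        memcpy(out[n].expr, e, elen);
        out[n].expr[elen] = '\0';
        n++;
        s = eol ? eol + 1 : s + len;
    }
    return n;
}
```

Fed the five-line `filters[]` string from the patch, this yields five rules, the last being `on_next_syscall` with expression `1`.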

> Btw., i also think that such an approach is actually the sane(r) design to 
> implement security modules: using such filters is far more flexible than the 
> typical LSM approach of privileged user-space uploading various nasty objects 
> into kernel space and implementing silly (and limited and intrusive) hooks 
> there, like SElinux and the other security modules do.

Then you are wrong.  There's no question that this interface can provide
great extensions to the current discretionary functionality provided by
legacy security controls, but if you actually want to mediate what tasks
can do to other tasks, or what they can do to arbitrary objects on the
system, this doesn't cut it.  Every system call that takes a structure
(or any other pointer) as an argument, or that uses copy_from_user (for
something other than just unparsed data), is uncontrollable: the filter
sees only the pointer, not the data behind it, and that data can change
between the check and the kernel's use of it.

This approach is great, and with careful coding of userspace apps it can
be made very useful in constraining those apps, but a replacement for
mandatory access control it is not.
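The objection turns on what a filter can see. A register-sized argument such as `fd` is a plain value: the value checked is the value used. A pathname argument, by contrast, is only an address into the task's memory. The hypothetical single-threaded simulation below shows why dereferencing that address inside a filter does not help. The filter reads the string, another thread sharing the address space rewrites it, and the kernel then uses the new contents. The names `sim_check_open` and `sim_race` and the `/etc/shadow` policy are invented for illustration. In a real attack the rewrite comes from a concurrent thread; here it is a fixed step so the outcome is deterministic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filter that (unwisely) dereferences a pointer argument:
 * deny open() of "/etc/shadow", allow anything else. */
static bool sim_check_open(const char *user_path)
{
    return strcmp(user_path, "/etc/shadow") != 0;
}

struct sim_race_result {
    bool allowed;       /* what the filter decided */
    char opened[64];    /* path the simulated kernel actually used */
};

/* Simulates check, concurrent rewrite, use, in that fixed order.
 * user_buf stands in for memory shared with another thread. */
static struct sim_race_result sim_race(char *user_buf, size_t buflen,
                                       const char *racing_value)
{
    struct sim_race_result r = { 0 };

    r.allowed = sim_check_open(user_buf);           /* time of check */

    /* Another thread in the same address space rewrites the buffer. */
    strncpy(user_buf, racing_value, buflen - 1);
    user_buf[buflen - 1] = '\0';

    if (r.allowed)                                  /* time of use */
        strncpy(r.opened, user_buf, sizeof r.opened - 1);
    return r;
}
```

Starting from `"/tmp/harmless"` and racing in `"/etc/shadow"`, the filter allows the call and the simulated kernel opens `/etc/shadow`. The seccomp filter interface that was eventually merged sidesteps this by handing BPF programs only a `struct seccomp_data`: the syscall number, architecture, instruction pointer, and raw 64-bit argument values. Pointers therefore cannot be dereferenced at all, which leaves exactly the gap described above.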

> This approach also has the ability to become recursive (gets inherited by child 
> tasks, which could add their own filters) and unprivileged - unlike LSMs.

LSMs have that ability.  Nothing prevents a module from providing a
service that lets unprivileged applications further constrain themselves.
It's just the difference between DAC and MAC.  You are clearly a DAC guy,
and there is no question this change is great in that mindset, but you
don't seem to understand either the flexibility of the LSM or the
purpose of some of the modules implemented on top of it.
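Two properties raised in this thread follow from one rule. The first is the patch's statement that access "may only be reduced" once filters are installed (its `sys_mmap: 0` example). The second is Ingo's point that a child inherits its parent's filters and may add its own. Both hold if a task's filters form a chain and a call is allowed only when every filter in the chain allows it. The sketch below is a hypothetical model of that rule, not the patch's data structures. `struct sim_chain`, `sim_chain_push`, `sim_chain_allows`, and the bitmask-of-syscalls representation are all assumptions. The seccomp filter mode that was eventually merged combines stacked filters in the same spirit: all of them run and the most restrictive result wins.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical syscall numbers, used as bit positions in the model. */
enum { SIM_READ, SIM_WRITE, SIM_MMAP, SIM_PRCTL };

#define SIM_BIT(n) (UINT32_C(1) << (n))

/* One installed filter: the set of calls it allows, plus a link to the
 * filter installed before it. Nodes are never modified after creation,
 * so a parent and its children can share them. */
struct sim_chain {
    uint32_t allowed;
    const struct sim_chain *prev;
};

/* Installs a new filter on top of an existing chain (NULL = unfiltered).
 * The old nodes stay in force, so this can only narrow what is allowed.
 * Nodes are never freed here; sharing them safely would need refcounts. */
static const struct sim_chain *sim_chain_push(const struct sim_chain *top,
                                              uint32_t allowed)
{
    struct sim_chain *n = malloc(sizeof *n);
    if (n == NULL)
        abort();    /* failing open would silently drop the filter */
    n->allowed = allowed;
    n->prev = top;
    return n;
}

/* A call is permitted only if every filter in the chain permits it. */
static bool sim_chain_allows(const struct sim_chain *top, int nr)
{
    for (const struct sim_chain *c = top; c != NULL; c = c->prev) {
        if (!(c->allowed & SIM_BIT(nr)))
            return false;
    }
    return true;
}
```

Pushing `~SIM_BIT(SIM_MMAP)` onto the read/write/mmap/prctl set drops mmap. Pushing a later filter that allows only mmap cannot bring it back, because the earlier node still denies it. A child that pushes `SIM_BIT(SIM_READ)` onto its parent's chain loses write access, while the parent keeps it.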

> I like this *a lot* more than any security sandboxing approach i've seen 
> before.

I like this *a lot*.  It will be a HUGE addition to the security
sandboxing approaches I've seen before.

-Eric