Message-ID: <BANLkTim16KCwq__L=3OEsDkvQe3yojtc5Q@mail.gmail.com>
Date: Fri, 6 May 2011 18:58:17 -0700
From: Will Drewry <wad@...omium.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Frederic Weisbecker <fweisbec@...il.com>,
Eric Paris <eparis@...hat.com>, Ingo Molnar <mingo@...e.hu>,
linux-kernel@...r.kernel.org, kees.cook@...onical.com,
agl@...omium.org, jmorris@...ei.org,
Randy Dunlap <rdunlap@...otime.net>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Tom Zanussi <tzanussi@...il.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 5/7] seccomp_filter: Document what seccomp_filter is and
how it works.
On Fri, May 6, 2011 at 4:53 AM, Steven Rostedt <rostedt@...dmis.org> wrote:
> On Thu, 2011-05-05 at 02:21 -0700, Will Drewry wrote:
>
>> In particular, if the userspace code wants to stage some filters and
>> apply them all at once, when ready, I'm not sure that it makes sense
>> to me to put that complexity in the kernel itself. For instance,
>> Eric's second sample showed a call that took an array of ints and
>> coalesced them into "fd == %d || ...". That simple example shows that
>> we could easily get by with a pretty minimal kernel-supported
>> interface as long as the richer behavior could live userspace side --
>> even if just in a simple helper library. It'd be pretty easy to
>> implement a userspace library that exposed add_filter(syscall_nr,
>> filter) and apply_filters() such that it could manage building the
>> final filter string for a given syscall and pushing it to prctl on
>> apply.
>
> I'm fine with a single kernel call and the "temporary filter" be done in
> userspace. Making the kernel code less complex is better :)
>
>>
>> I think that could also help simplify the primitives. For instance,
>> if any separate SET called on a system call resulted in an &&
>> operation, then the behavior could be consistent prior to enforcement
>> of the filtering and after. E.g.,
>> SET, __NR_read, "fd == 1"
>> SET, __NR_read, "len < 4097"
>> would result in an evaluated "fd == 1 && len < 4097". It would do so
>> after a single APPLY call too:
>> SET, __NR_read, "1"
>> APPLY
>> SET, __NR_read, "fd == 1"
>> SET, __NR_read, "len < 4097"
>> Results in: "1 && fd == 1 && len < 4097", and SET, nr, "0" would
>> nullify the syscall filter in total.
>
> Only that that was not applied? We can't let tasks nullify their
> restrictions once they have been applied. This keeps the kernel code
> simpler.
Ah - so I really need to be more explicit when discussing these
things! In the "simplification" effort, I was thinking any syscall
with no entry has a "0" rule. So if you nullify it, it becomes a
complete block, and if you can't OR, then you can't add permissions.
>> It seems like that would be
>> enough to build the SET-SET-...-APPLY, SET-SET-...-SET-APPLY logic
>> into a userspace library so that all temporary unapplied state doesn't
>> have to be explicitly managed by the kernel.
>
> Thus, the SETs are done in the userspace library that does not need to
> interact with the kernel (besides perhaps allocating memory). Then the
> apply would send all the filters to the kernel which would restrict the
> task (or the task on exec) further.
Exactly. Smaller patch and less state per-filter entry (I hope!).
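To make the split concrete, here is a rough sketch of what the userspace
side could look like. It is only an illustration: add_filter() and
apply_filters() are the names floated earlier in the thread, while the
table sizes, the install callback, and print_install() are hypothetical
stand-ins, since the actual prctl interface is still being discussed.
Each staged term is wrapped in parentheses so that ANDing stays correct
even if a term contains "||".

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYSCALLS   512   /* hypothetical table size */
#define MAX_FILTER_LEN 256   /* hypothetical per-syscall string limit */

/* Staged (not yet applied) filter strings, indexed by syscall number. */
static char staged[MAX_SYSCALLS][MAX_FILTER_LEN];

/* Stage a filter. Repeated calls for one syscall are ANDed together. */
int add_filter(int syscall_nr, const char *filter)
{
    char *cur;
    size_t used;
    int n;

    if (syscall_nr < 0 || syscall_nr >= MAX_SYSCALLS || !filter || !*filter)
        return -1;

    cur = staged[syscall_nr];
    used = strlen(cur);
    n = snprintf(cur + used, MAX_FILTER_LEN - used, "%s(%s)",
                 used ? " && " : "", filter);
    if (n < 0 || (size_t)n >= MAX_FILTER_LEN - used) {
        cur[used] = '\0';    /* roll back a truncated append */
        return -1;
    }
    return 0;
}

/* Hypothetical hook: stands in for whatever call hands one filter
 * string to the kernel. Returns 0 on success. */
typedef int (*install_fn)(int syscall_nr, const char *filter);

/* Push every staged filter through the hook, then clear the stage.
 * Returns the number of filters installed, or -1 on the first failure. */
int apply_filters(install_fn install)
{
    int nr, count = 0;

    for (nr = 0; nr < MAX_SYSCALLS; nr++) {
        if (!staged[nr][0])
            continue;
        if (install(nr, staged[nr]) != 0)
            return -1;
        staged[nr][0] = '\0';
        count++;
    }
    return count;
}

/* Example hook that only prints what would be installed. */
int print_install(int syscall_nr, const char *filter)
{
    printf("install nr=%d filter=%s\n", syscall_nr, filter);
    return 0;
}
```

With this, staging "fd == 1" and then "len < 4097" for the same syscall
number leaves "(fd == 1) && (len < 4097)" waiting for apply_filters(),
and the kernel never sees any of the unapplied state.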
>>
>> While I completely agree with the comment around ease-of-use as being
>> key to security, I also find that the more the state diagram explodes,
>> the harder it is to feel confident that a solution is actually secure.
>> To try to achieve both objectives, I'd like to limit the kernel
>> interface to the bare minimum of primitives and build any API
>> fanciness into userspace.
>
> Fair enough.
>
>>
>> Does it seem that the tradeoff isn't worth it, or are there some
>> specific behaviors that aren't addressed using that model?
>>
>> While writing that, another option occurred to me that touches on the
>> other proposals but makes the behaviors much more explicit.
>> A prctl prototype could be provided:
>> prctl(<SET|GET>, <AND|OR>, <syscall_nr>, <filter string>)
>> e.g.,
>> prctl(PR_SET_SECCOMP_FILTER, PR_SECCOMP_FILTER_OR, __NR_read, "fd == 2");
>>
>> The explicit prctl argument list would allow the filter strings to be
>> self-referential and allow the userspace app to decide what behaviors
>> are allowed and when. If we followed that route, all implicit filters
>> would be "0" and the initial call to get things started might be:
>> #define SET 33
>> #define OR 0
>> #define AND 1
>> SET, OR, __NR_prctl, "option == 33 && (arg1 == 0 || arg1 == 1)"
>> prctl(PR_SET_SECCOMP, 2);
>>
>> So now the "locked down" binary can call prctl to set an OR or AND
>> filter for any syscall. A subsequent call could change that:
>> SET, OR, __NR_read, "fd == 2" /* => "0 || fd == 2" */
>> SET, AND, __NR_prctl, "(arg2 != 63 || arg1 != 0)" /* __NR_read == 63 */
>>
>> This would OR in a __NR_read filter, then disallow a future call to
>> prctl to OR in more NR_read filters, but for other syscalls ANDing and
>> ORing is still possible until you pass in something like:
>>
>> SET, AND, __NR_prctl, "arg1 == 1"
>>
>> which would lock down all future prctl calls to only ANDing filters
>> in. (The numbers in the examples could then be properly managed in a
>> userspace library to ensure platform correctness.)
>
> I don't know about this. It seems to be starting to get too complex, and
> thus error prone. Is there any reason we should allow an OR to the task?
> Why would we want to restrict a task where the task could easily
> unrestrict itself?
No idea! I can't think of any good examples where you'd want to do
it, just contrived ones. In general, I think the above approach would
rarely be used since I expect that something like 80% of the places
where this will be used will just be one-time, upfront filter installs
without any surface reduction after the fact.
That said, if there's no reason to support OR after the fact, then the
interface can just _only_ support &&s and leave the installation to
userspace. It might make the multiple-fd-ORing case less fun in
userspace, but it should work for most cases I think.
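For the multiple-fd case, the ORing would then have to happen in
userspace before the single filter string is installed. A minimal
sketch, assuming the "fd == %d || ..." form from Eric's sample;
build_fd_filter() is a hypothetical helper name, not an existing API:

```c
#include <stdio.h>

/* Coalesce an array of fds into "fd == A || fd == B || ...".
 * Returns 0 on success, -1 on bad arguments or if buf is too small. */
int build_fd_filter(char *buf, size_t len, const int *fds, size_t nfds)
{
    size_t used = 0, i;

    if (!buf || len == 0 || !fds || nfds == 0)
        return -1;

    buf[0] = '\0';
    for (i = 0; i < nfds; i++) {
        int n = snprintf(buf + used, len - used, "%sfd == %d",
                         i ? " || " : "", fds[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1;       /* output would have been truncated */
        used += (size_t)n;
    }
    return 0;
}
```

The resulting string is installed as one filter, so the kernel only ever
sees a complete expression and later calls can still only AND onto it.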
>>
>> While this would reduce the primitives a bit further, I'm not sure if
>> this would be the right approach either, but it would open the door to
>> pushing even more down to userspace very explicitly and further
>> removing magic policy logic from the kernel-side. Is this vaguely
>> interesting or just another layer of confusing-ness?
>
> I'm confused, thus I must have hit that layer ;)
Sounds like it. I'm always a sucker for self-referential mechanisms.
I've been travelling a bit recently so my code output has been a bit
low, but I'll pull together the most minimal approach that I think
we've been iterating toward and hopefully post something in the not
too distant future.
thanks!