Message-Id: <20120406140503.10b75c5b.akpm@linux-foundation.org>
Date: Fri, 6 Apr 2012 14:05:03 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Kees Cook <keescook@...omium.org>
Cc: Will Drewry <wad@...omium.org>, linux-kernel@...r.kernel.org,
linux-security-module@...r.kernel.org, linux-arch@...r.kernel.org,
linux-doc@...r.kernel.org, kernel-hardening@...ts.openwall.com,
netdev@...r.kernel.org, x86@...nel.org, arnd@...db.de,
davem@...emloft.net, hpa@...or.com, mingo@...hat.com,
oleg@...hat.com, peterz@...radead.org, rdunlap@...otime.net,
mcgrathr@...omium.org, tglx@...utronix.de, luto@....edu,
eparis@...hat.com, serge.hallyn@...onical.com, djm@...drot.org,
scarybeasts@...il.com, indan@....nu, pmoore@...hat.com,
corbet@....net, eric.dumazet@...il.com, markus@...omium.org,
coreyb@...ux.vnet.ibm.com, jmorris@...ei.org
Subject: Re: [PATCH v17 08/15] seccomp: add system call filtering using BPF

On Fri, 6 Apr 2012 13:44:43 -0700
Kees Cook <keescook@...omium.org> wrote:
> On Fri, Apr 6, 2012 at 1:23 PM, Andrew Morton <akpm@...ux-foundation.org> wrote:
> > On Thu, 29 Mar 2012 15:01:53 -0500
> > Will Drewry <wad@...omium.org> wrote:
> >
> >> [This patch depends on luto@....edu's no_new_privs patch:
> >> https://lkml.org/lkml/2012/1/30/264
> >> included in this series for ease of consumption.
> >> ]
> >>
> >> This patch adds support for seccomp mode 2. Mode 2 introduces the
> >> ability for unprivileged processes to install system call filtering
> >> policy expressed in terms of a Berkeley Packet Filter (BPF) program.
> >> This program will be evaluated in the kernel for each system call
> >> the task makes and computes a result based on data in the format
> >> of struct seccomp_data.
> >> ...
> >> +static void seccomp_filter_log_failure(int syscall)
> >> +{
> >> + int compat = 0;
> >> +#ifdef CONFIG_COMPAT
> >> + compat = is_compat_task();
> >> +#endif
> >
> > hm, I'm surprised that we don't have a zero-returning implementation of
> > is_compat_task() when CONFIG_COMPAT=n. Seems silly. Blames Arnd.
>
> There is

I can't find it. The definition in include/linux/compat.h is inside
#ifdef CONFIG_COMPAT.
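
For illustration only, the kind of zero-returning fallback being asked
about could look like the sketch below. It is a user-space model, not
what is in the tree: CONFIG_COMPAT is simply left undefined to stand in
for CONFIG_COMPAT=n, and log_failure_compat_flag() is a hypothetical
stand-in for the start of seccomp_filter_log_failure().

```c
#include <assert.h>

/*
 * User-space illustration only. CONFIG_COMPAT is normally set by the
 * kernel build system; leaving it undefined here models CONFIG_COMPAT=n.
 */
#ifdef CONFIG_COMPAT
int is_compat_task(void);	/* real definition would live elsewhere */
#else
/* Hypothetical fallback: without compat support no task is a compat task. */
static inline int is_compat_task(void)
{
	return 0;
}
#endif

/* Hypothetical stand-in for the top of seccomp_filter_log_failure(). */
static int log_failure_compat_flag(void)
{
	/* With a stub available, the call site needs no #ifdef of its own. */
	return is_compat_task();
}

/* Small self-check of the CONFIG_COMPAT=n behaviour modelled above. */
static void check_stub(void)
{
	assert(log_failure_compat_flag() == 0);
}
```

With a stub like that, the caller in the patch would not need its own
#ifdef CONFIG_COMPAT block.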
> >> +static long seccomp_attach_filter(struct sock_fprog *fprog)
> >> +{
> >> + struct seccomp_filter *filter;
> >> + unsigned long fp_size = fprog->len * sizeof(struct sock_filter);
> >> + unsigned long total_insns = fprog->len;
> >> + long ret;
> >> +
> >> + if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
> >> + return -EINVAL;
> >> +
> >> + for (filter = current->seccomp.filter; filter; filter = filter->prev)
> >> + total_insns += filter->len + 4; /* include a 4 instr penalty */
> >
> > So tasks don't share filters? We copy them by value at fork? Do we do
> > this at vfork() too?
>
> The filter chain is shared (and refcounted).

So what's the locking rule for accessing and modifying that
singly-linked list?
> ...
> >> +/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
> >> +void put_seccomp_filter(struct task_struct *tsk)
> >> +{
> >> + struct seccomp_filter *orig = tsk->seccomp.filter;
> >> + /* Clean up single-reference branches iteratively. */
> >> + while (orig && atomic_dec_and_test(&orig->usage)) {
> >> + struct seccomp_filter *freeme = orig;
> >> + orig = orig->prev;
> >> + kfree(freeme);
> >> + }
> >> +}
> >
> > So if one of the filters in the list has an elevated refcount, we bail
> > out on the remainder of the list. Seems odd.
>
> This is so that every filter in the list doesn't need to have its
> refcount raised. As long as the counting up matches the counting
> down, it's fine. This allows for process trees branching the filter
> list at different times still being safe. IIUC, this code was based on
> how namespace refcounting is handled. I spent some time proving to
> myself that it was correctly refcounted a while back. More eyes is
> better, of course. :)
Please ensure that future readers of this code have a description of
how it is supposed to work.
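
As a starting point for such a description, here is a user-space model
of the scheme as I read it from the quoted put_seccomp_filter(). It is
a sketch under assumptions, not the kernel code: plain ints replace
atomic_t, malloc/free replace the kernel allocator, and attach() and
get_filter() are hypothetical names. In particular it assumes that
attaching a new filter hands the task's reference on the old head over
to the new filter's ->prev link, which the quoted hunks do not show.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified model of a filter node: only the fields the scheme needs. */
struct filter {
	int usage;		/* tasks pointing here + newer filters whose ->prev is this */
	struct filter *prev;	/* older filter in the chain, or NULL */
};

static int live_filters;	/* number of nodes currently allocated */

/* Assumed attach step: the new head inherits the caller's ref on prev. */
static struct filter *attach(struct filter *prev)
{
	struct filter *f = malloc(sizeof(*f));

	assert(f);
	f->usage = 1;
	f->prev = prev;
	live_filters++;
	return f;
}

/* Assumed fork step: the child takes one reference, on the head only. */
static struct filter *get_filter(struct filter *head)
{
	if (head)
		head->usage++;
	return head;
}

/* Same loop shape as the quoted put_seccomp_filter(). */
static void put_filter(struct filter *orig)
{
	while (orig && --orig->usage == 0) {
		struct filter *freeme = orig;

		orig = orig->prev;
		free(freeme);
		live_filters--;
	}
}

/* Parent and child share filter A, then each adds a private filter. */
static int demo_branching(int *live_after_parent_exit)
{
	struct filter *parent = attach(NULL);		/* A */
	struct filter *child = get_filter(parent);	/* fork: A shared */

	parent = attach(parent);	/* B -> A */
	child = attach(child);		/* C -> A */

	put_filter(parent);	/* frees B, stops at A: C still points to it */
	*live_after_parent_exit = live_filters;

	put_filter(child);	/* frees C, then A */
	return live_filters;
}
```

The walk stops at the first filter that something else still
references, because everything older than that filter is kept alive by
that same reference.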