lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGXu5jLdw+nmjSSm9E=fanbe3aLjwgjw38WAWXzP6YwSc2D5+A@mail.gmail.com>
Date:	Wed, 9 Mar 2016 13:10:48 -0800
From:	Kees Cook <keescook@...omium.org>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Chris Metcalf <cmetcalf@...lanox.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Christoph Lameter <cl@...ux.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Ingo Molnar <mingo@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Tejun Heo <tj@...nel.org>,
	Gilad Ben Yossef <giladb@...hip.com>,
	Will Deacon <will.deacon@....com>,
	Rik van Riel <riel@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
	Catalin Marinas <catalin.marinas@....com>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v10 09/12] arch/x86: enable task isolation functionality

On Wed, Mar 9, 2016 at 12:58 PM, Andy Lutomirski <luto@...capital.net> wrote:
> On Tue, Mar 8, 2016 at 12:40 PM, Chris Metcalf <cmetcalf@...lanox.com> wrote:
>> On 03/07/2016 03:55 PM, Andy Lutomirski wrote:
>>>>>
>>>>> Let task isolation users who want to detect when they screw up and do
>>>>> >>a syscall do it with seccomp.
>>>>
>>>>
>>>> >Can you give me more details on what you're imagining here?  Remember
>>>> >that a key use case is that these applications can remove the syscall
>>>> >prohibition voluntarily; it's only there to prevent unintended uses
>>>> >(by third party libraries or just straight-up programming bugs).
>>>> >As far as I can tell, seccomp does not allow you to go from "less
>>>> >permissive" to "more permissive" settings at all, which means that as
>>>> >it exists, it's not a good solution for this use case.
>>>> >
>>>> >Or were you thinking about a new seccomp API that allows this?
>>>
>>> I was.  This is at least the second time I've wanted a way to ask
>>> seccomp to allow a layer to be removed.
>>
>>
>> Andy,
>>
>> Please take a look at this draft patch that intends to enable seccomp
>> as something that task isolation can use.
>
> Kees, this sounds like it may solve your self-instrumentation problem.
> Want to take a look?

Errrr... I'm pretty uncomfortable with this. I really would like to
keep the basic semantics of seccomp is simple as possible: filtering
only gets more restricted.

This doesn't really solve my self-instrumentation desires since I
still can't sanely deliver signals. I would need a lot more
convincing. :)

>
>>
>> The basic idea is to add a notion of "removable" seccomp filters.
>> You can tag a filter that way when installing it (using the new
>> SECCOMP_FILTER_FLAG_REMOVABLE flag bit for SECCOMP_SET_MODE_FILTER),
>> and if the most recently-added filter is marked as removable, you can
>> remove it with the new SECCOMP_POP_FILTER operation.  It is currently
>> implemented to be incompatible with SECCOMP_FILTER_FLAG_TSYNC, which
>> is plausible since the obvious use is for thread-local push and pop,
>> but the API allows for future implementation by including a flag word
>> with the pop_filter operation (now always zero).
>>
>> I did not make this supported via the prctl() since the "removable"
>> flag requires seccomp(), so making pop work with prctl() seemed silly.
>>
>> One interesting result of this is that now it is no longer true
>> that once current->seccomp.mode becomes non-zero, it may not be
>> changed, since it can now be changed back to DISABLED when you push a
>> removable filter and later pop it.
>>
>> My preference would be not to have to require all task-isolation users
>> to also figure out all the complexities of creating BPF programs, so
>> my intention is to have task isolation automatically generate a BPF
>> program (just allowing prctl/exit/exit_group and failing everything
>> else with SIGSYS).  To support having it work this way, I open up
>> the seccomp stuff a little so that kernel clients can effectively
>> push/pop a BPF program into seccomp:
>
> That sounds like a great use case for the new libtaskisolation that
> someone is surely writing :)
>
>>
>> long seccomp_push_filter(unsigned int flags, struct bpf_prog *fp)
>> long seccomp_pop_filter(unsigned int flags);
>>
>> We mark filters from this API with a new "extern_prog" boolean in the
>> seccomp_filter struct so the BPF program isn't freed when the
>> seccomp_filter itself is freed.  Note that doing it this way avoids
>> having to go through the substantial overhead of creating a brand-new
>> BPF filter every time we enter task isolation mode.
>>
>> Not shown here is the additional code needed in task isolation to
>> create a suitable BPF program and then push and pop it as we go in and
>> out of task isolation mode.
>>
>> For what it's worth, I'm a little dubious about the tradeoff of adding
>> a substantial chunk of code to seccomp to handle what the v10 task
>> isolation code did with a single extra TIF flag test and a dozen lines
>> of code that got called.  But given that you said there were other
>> potential users for the "filter pop" idea, it may indeed make sense.

I think the extra TIF flag makes more sense than overloading seccomp.

-Kees

>>
>> This is still all untested, but I wanted to get your sense of whether
>> this was even going in the right direction before spending more time
>> on it.
>>
>> Thanks!
>>
>> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> index 2296e6b2f690..feeba7a23d20 100644
>> --- a/include/linux/seccomp.h
>> +++ b/include/linux/seccomp.h
>> @@ -3,13 +3,15 @@
>>   #include <uapi/linux/seccomp.h>
>>  -#define SECCOMP_FILTER_FLAG_MASK      (SECCOMP_FILTER_FLAG_TSYNC)
>> +#define SECCOMP_FILTER_FLAG_MASK       \
>> +       (SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_REMOVABLE)
>>   #ifdef CONFIG_SECCOMP
>>   #include <linux/thread_info.h>
>>  #include <asm/seccomp.h>
>>  +struct bpf_prog;
>>  struct seccomp_filter;
>>  /**
>>   * struct seccomp - the state of a seccomp'ed process
>> @@ -41,6 +43,8 @@ static inline int secure_computing(void)
>>   extern u32 seccomp_phase1(struct seccomp_data *sd);
>>  int seccomp_phase2(u32 phase1_result);
>> +long seccomp_push_filter(unsigned int flags, struct bpf_prog *fp);
>> +long seccomp_pop_filter(unsigned int flags);
>>  #else
>>  extern void secure_computing_strict(int this_syscall);
>>  #endif
>> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
>> index 0f238a43ff1e..6e65ac2a7262 100644
>> --- a/include/uapi/linux/seccomp.h
>> +++ b/include/uapi/linux/seccomp.h
>> @@ -13,9 +13,11 @@
>>  /* Valid operations for seccomp syscall. */
>>  #define SECCOMP_SET_MODE_STRICT        0
>>  #define SECCOMP_SET_MODE_FILTER        1
>> +#define SECCOMP_POP_FILTER     2
>>   /* Valid flags for SECCOMP_SET_MODE_FILTER */
>>  #define SECCOMP_FILTER_FLAG_TSYNC      1
>> +#define SECCOMP_FILTER_FLAG_REMOVABLE  2
>>   /*
>>   * All BPF programs must return a 32-bit value.
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index 15a1795bbba1..c22eb3a56556 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -41,8 +41,9 @@
>>   *         outside of a lifetime-guarded section.  In general, this
>>   *         is only needed for handling filters shared across tasks.
>>   * @prev: points to a previously installed, or inherited, filter
>> - * @len: the number of instructions in the program
>> - * @insnsi: the BPF program instructions to evaluate
>> + * @prog: the BPF program to evaluate
>> + * @removable: if this filter is removable with seccomp_pop_filter()
>> + * @extern_prog: if @prog should not be freed in seccomp_free_filter()
>>   *
>>   * seccomp_filter objects are organized in a tree linked via the @prev
>>   * pointer.  For any task, it appears to be a singly-linked list starting
>> @@ -58,6 +59,8 @@ struct seccomp_filter {
>>         atomic_t usage;
>>         struct seccomp_filter *prev;
>>         struct bpf_prog *prog;
>> +       bool removable;
>> +       bool extern_prog;
>>  };
>>   /* Limit any path through the tree to 256KB worth of instructions. */
>> @@ -470,7 +473,8 @@ void get_seccomp_filter(struct task_struct *tsk)
>>  static inline void seccomp_filter_free(struct seccomp_filter *filter)
>>  {
>>         if (filter) {
>> -               bpf_prog_destroy(filter->prog);
>> +               if (!filter->extern_prog)
>> +                       bpf_prog_destroy(filter->prog);
>>                 kfree(filter);
>>         }
>>  }
>> @@ -722,6 +726,7 @@ long prctl_get_seccomp(void)
>>   * seccomp_set_mode_strict: internal function for setting strict seccomp
>>   *
>>   * Once current->seccomp.mode is non-zero, it may not be changed.
>> + * (other than to reset to DISABLED after removing the last removable
>> filter).
>>   *
>>   * Returns 0 on success or -EINVAL on failure.
>>   */
>> @@ -749,33 +754,34 @@ out:
>>   #ifdef CONFIG_SECCOMP_FILTER
>>  /**
>> - * seccomp_set_mode_filter: internal function for setting seccomp filter
>> + * do_push_filter: internal function for setting seccomp filter
>>   * @flags:  flags to change filter behavior
>> - * @filter: struct sock_fprog containing filter
>> + * @prepared: struct seccomp_filter to install
>>   *
>>   * This function may be called repeatedly to install additional filters.
>>   * Every filter successfully installed will be evaluated (in reverse order)
>>   * for each system call the task makes.
>>   *
>> - * Once current->seccomp.mode is non-zero, it may not be changed.
>> + * Once current->seccomp.mode is non-zero, it may not be changed
>> + * (other than to reset to DISABLED after removing the last removable
>> filter).
>>   *
>>   * Returns 0 on success or -EINVAL on failure.
>>   */
>> -static long seccomp_set_mode_filter(unsigned int flags,
>> -                                   const char __user *filter)
>> +long do_push_filter(unsigned int flags, struct seccomp_filter *prepared)
>>  {
>>         const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
>> -       struct seccomp_filter *prepared = NULL;
>>         long ret = -EINVAL;
>>         /* Validate flags. */
>>         if (flags & ~SECCOMP_FILTER_FLAG_MASK)
>>                 return -EINVAL;
>>  -      /* Prepare the new filter before holding any locks. */
>> -       prepared = seccomp_prepare_user_filter(filter);
>> -       if (IS_ERR(prepared))
>> -               return PTR_ERR(prepared);
>> +       if (flags & SECCOMP_FILTER_FLAG_REMOVABLE) {
>> +               /* The intended use case is for thread-local push/pop. */
>> +               if (flags & SECCOMP_FILTER_FLAG_TSYNC)
>> +                       goto out_free;
>> +               prepared->removable = true;
>> +       }
>>         /*
>>          * Make sure we cannot change seccomp or nnp state via TSYNC
>> @@ -805,12 +811,87 @@ out_free:
>>         seccomp_filter_free(prepared);
>>         return ret;
>>  }
>> +
>> +static long seccomp_set_mode_filter(unsigned int flags,
>> +                                   const char __user *filter)
>> +{
>> +       struct seccomp_filter *prepared;
>> +
>> +       /* Prepare the new filter before holding any locks. */
>> +       prepared = seccomp_prepare_user_filter(filter);
>> +       if (IS_ERR(prepared))
>> +               return PTR_ERR(prepared);
>> +       return seccomp_push_filter(flags, prepared);
>> +}
>> +
>> +long seccomp_push_filter(unsigned int flags, struct bpf_prog *fp)
>> +{
>> +       struct seccomp_filter *sfilter;
>> +
>> +       sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL);
>> +       if (!sfilter)
>> +               return ERR_PTR(-ENOMEM);
>> +
>> +       sfilter->prog = fp;
>> +       sfilter->extern_prog = true;
>> +       atomic_set(&sfilter->usage, 1);
>> +
>> +       return do_push_filter(flags, sfilter);
>> +}
>> +
>> +/**
>> + * seccomp_pop_filter: internal function for removing filter
>> + * @flags:  flags to change pop behavior
>> + *
>> + * This function removes the most recently installed filter, if it was
>> + * installed with the SECCOMP_FILTER_FLAG_REMOVABLE flag.  Any previously
>> + * installed filters are left intact.
>> + *
>> + * If the last filter is removed, the seccomp state reverts to DISABLED.
>> + *
>> + * Returns 0 on success or -EINVAL on failure.
>> + */
>> +long seccomp_pop_filter(unsigned int flags)
>> +{
>> +       struct seccomp_filter *filter;
>> +
>> +       /* The intended use case is for temporary thread-local push/pop. */
>> +       if (flags & SECCOMP_FILTER_FLAG_TSYNC)
>> +               return -EINVAL;
>> +
>> +       spin_lock_irq(&current->sighand->siglock);
>> +
>> +       if (current->seccomp.mode != SECCOMP_MODE_FILTER)
>> +               goto out;
>> +
>> +       filter = current->seccomp.filter;
>> +       if (unlikely(WARN_ON(filter == NULL)) || !filter->removable)
>> +               goto out;
>> +
>> +       if (filter->prev == NULL) {
>> +               clear_tsk_thread_flag(current, TIF_SECCOMP);
>> +               current->seccomp.mode = SECCOMP_MODE_DISABLED;
>> +       }
>> +
>> +       current->seccomp.filter = filter->prev;
>> +
>> +       spin_unlock_irq(&current->sighand->siglock);
>> +       seccomp_filter_free(filter);
>> +       return 0;
>> +out:
>> +       spin_unlock_irq(&current->sighand->siglock);
>> +       return -EINVAL;
>> +}
>>  #else
>>  static inline long seccomp_set_mode_filter(unsigned int flags,
>>                                            const char __user *filter)
>>  {
>>         return -EINVAL;
>>  }
>> +static inline long seccomp_pop_filter(unsigned int flags)
>> +{
>> +       return -EINVAL;
>> +}
>>  #endif
>>   /* Common entry point for both prctl and syscall. */
>> @@ -824,6 +905,8 @@ static long do_seccomp(unsigned int op, unsigned int
>> flags,
>>                 return seccomp_set_mode_strict();
>>         case SECCOMP_SET_MODE_FILTER:
>>                 return seccomp_set_mode_filter(flags, uargs);
>> +       case SECCOMP_POP_FILTER:
>> +               return seccomp_pop_filter(flags);
>>         default:
>>                 return -EINVAL;
>>
>>         }
>>
>> --
>> Chris Metcalf, Mellanox Technologies
>> http://www.mellanox.com
>>
>
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC



-- 
Kees Cook
Chrome OS & Brillo Security

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ