Message-ID: <20201001171206.jvkdx4htqux5agdv@gmail.com>
Date:   Thu, 1 Oct 2020 19:12:06 +0200
From:   Christian Brauner <christian.brauner@...onical.com>
To:     Tycho Andersen <tycho@...ho.pizza>
Cc:     Jann Horn <jannh@...gle.com>,
        linux-man <linux-man@...r.kernel.org>,
        Song Liu <songliubraving@...com>,
        Will Drewry <wad@...omium.org>,
        Kees Cook <keescook@...omium.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Giuseppe Scrivano <gscrivan@...hat.com>,
        Robert Sesek <rsesek@...gle.com>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>,
        bpf <bpf@...r.kernel.org>, Andy Lutomirski <luto@...capital.net>,
        Christian Brauner <christian@...uner.io>
Subject: Re: For review: seccomp_user_notif(2) manual page

On Thu, Oct 01, 2020 at 10:58:50AM -0600, Tycho Andersen wrote:
> On Thu, Oct 01, 2020 at 05:47:54PM +0200, Jann Horn via Containers wrote:
> > On Thu, Oct 1, 2020 at 2:54 PM Christian Brauner
> > <christian.brauner@...onical.com> wrote:
> > > On Wed, Sep 30, 2020 at 05:53:46PM +0200, Jann Horn via Containers wrote:
> > > > On Wed, Sep 30, 2020 at 1:07 PM Michael Kerrisk (man-pages)
> > > > <mtk.manpages@...il.com> wrote:
> > > > > NOTES
> > > > >        The file descriptor returned when seccomp(2) is employed with the
> > > > >        SECCOMP_FILTER_FLAG_NEW_LISTENER  flag  can  be  monitored  using
> > > > >        poll(2), epoll(7), and select(2).  When a notification  is  pend‐
> > > > >        ing,  these interfaces indicate that the file descriptor is read‐
> > > > >        able.
> > > >
> > > > We should probably also point out somewhere that, as
> > > > include/uapi/linux/seccomp.h says:
> > > >
> > > >  * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF
> > > >  * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the
> > > >  * same syscall, the most recently added filter takes precedence. This means
> > > >  * that the new SECCOMP_RET_USER_NOTIF filter can override any
> > > >  * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all
> > > >  * such filtered syscalls to be executed by sending the response
> > > >  * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally
> > > >  * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE.
> > > >
> > > > In other words, from a security perspective, you must assume that the
> > > > target process can bypass any SECCOMP_RET_USER_NOTIF (or
> > > > SECCOMP_RET_TRACE) filters unless it is completely prohibited from
> > > > calling seccomp(). This should also be noted over in the main
> > > > seccomp(2) manpage, especially the SECCOMP_RET_TRACE part.
> > >
> > > So I was actually wondering about this when I skimmed this and a while
> > > ago but forgot about this again... Afaict, you can only ever load a
> > > single filter with SECCOMP_FILTER_FLAG_NEW_LISTENER set. If there
> > > already is a filter with the SECCOMP_FILTER_FLAG_NEW_LISTENER property
> > > in the task's filter hierarchy then the kernel will refuse to load a new
> > > one?
> > >
> > > static struct file *init_listener(struct seccomp_filter *filter)
> > > {
> > >         struct file *ret = ERR_PTR(-EBUSY);
> > >         struct seccomp_filter *cur;
> > >
> > >         for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> > >                 if (cur->notif)
> > >                         goto out;
> > >         }
> > >
> > > shouldn't that be sufficient to guarantee that USER_NOTIF filters can't
> > > override each other for the same task simply because there can only ever
> > > be a single one?
> > 
> > Good point. Exceeeept that that check seems ineffective because this
> > happens before we take the locks that guard against TSYNC, and also
> > before we decide to which existing filter we want to chain the new
> > filter. So if two threads race with TSYNC, I think they'll be able to
> > chain two filters with listeners together.
> 
> Yep, seems the check needs to also be in seccomp_can_sync_threads() to
> be totally effective.
> 
> > I don't know whether we want to eternalize this "only one listener
> > across all the filters" restriction in the manpage though, or whether
> > the man page should just say that the kernel currently doesn't support
> > it but that security-wise you should assume that it might at some
> > point.
> 
> This requirement originally came from Andy, arguing that the semantics
> of this were/are confusing, which still makes sense to me. Perhaps we
> should do something like the below?

I think we should either keep this restriction and cement it in the
manpage, or add a flag to indicate that the notifier is
non-overridable.
I don't care much about the default, i.e. whether a notifier is
overridable by default and exclusive on opt-in, or the other way
around. But from a supervisor's perspective it'd be quite nice
to be able to be sure that a notifier can't be overridden by another
notifier.

I think having a flag would provide the greatest flexibility but I agree
that the semantics of multiple listeners are kinda odd.
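
To make the flag idea a bit more concrete, here is a minimal user-space
model, not kernel code. Everything in it is made up for illustration:
`struct model_filter`, `model_attach()` and both `MODEL_FLAG_*` values are
hypothetical, and no exclusive-listener flag exists in seccomp(2) today. It
assumes the opt-in variant, i.e. a listener is overridable unless it was
installed as exclusive.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags for this model only; not part of the seccomp UAPI. */
#define MODEL_FLAG_NEW_LISTENER	(1u << 0)
#define MODEL_FLAG_EXCLUSIVE	(1u << 1)

/* Illustrative stand-in for a filter in a task's filter chain. */
struct model_filter {
	struct model_filter *prev;	/* next-older filter, NULL at the root */
	bool has_listener;		/* stands in for filter->notif */
	bool exclusive;			/* listener must not be overridden */
};

/*
 * Stack 'new_filter' on top of 'head'. Returns 0 on success, or -EBUSY
 * if the new filter asks for a listener while an exclusive listener is
 * already present somewhere in the chain.
 */
static int model_attach(struct model_filter *head,
			struct model_filter *new_filter, unsigned int flags)
{
	const struct model_filter *cur;
	bool wants_listener = (flags & MODEL_FLAG_NEW_LISTENER) != 0;

	assert(new_filter != NULL);

	if (wants_listener) {
		for (cur = head; cur; cur = cur->prev) {
			if (cur->has_listener && cur->exclusive)
				return -EBUSY;
		}
	}

	new_filter->prev = head;
	new_filter->has_listener = wants_listener;
	new_filter->exclusive = wants_listener &&
				(flags & MODEL_FLAG_EXCLUSIVE) != 0;
	return 0;
}
```

In this model a supervisor that installs its filter with both flags set
knows that any later attempt to add another listener fails with -EBUSY,
while filters without a listener can still be stacked on top. Flipping the
default would only mean inverting the meaning of the second flag.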

Below looks sane to me, though again, I'm not sitting in front of the
source code.
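
To spell out what I'd expect such a helper to do: walk the chain starting
at the filter it is handed and report whether any filter on the way has a
listener. A user-space sketch, not kernel code; `struct chain_filter`,
`chain_push()` and `chain_has_listener()` are names made up here, and a
plain bool stands in for the `notif` pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for struct seccomp_filter; not the kernel type. */
struct chain_filter {
	struct chain_filter *prev;	/* next-older filter, NULL at the root */
	bool notif;			/* true if this filter has a listener */
};

/* Link 'f' on top of 'head' and record whether it carries a listener. */
static void chain_push(struct chain_filter *head, struct chain_filter *f,
		       bool notif)
{
	assert(f != NULL);
	f->prev = head;
	f->notif = notif;
}

/* Returns true if 'start' or any of its ancestors has a listener. */
static bool chain_has_listener(const struct chain_filter *start)
{
	const struct chain_filter *cur;

	for (cur = start; cur; cur = cur->prev) {
		if (cur->notif)
			return true;
	}
	return false;
}
```

With a chain root <- mid <- leaf where only mid has a listener, the check
is true for mid and leaf and false for root. A sibling filter hanging
directly off root is also false, which is why the starting point has to be
the filter of the thread being examined.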

Christian

> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 3ee59ce0a323..7b107207c2b0 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -376,6 +376,18 @@ static int is_ancestor(struct seccomp_filter *parent,
>  	return 0;
>  }
>  
> +static bool has_listener_parent(struct seccomp_filter *child)
> +{
> +	struct seccomp_filter *cur;
> +
> +	for (cur = child; cur; cur = cur->prev) {
> +		if (cur->notif)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  /**
>   * seccomp_can_sync_threads: checks if all threads can be synchronized
>   *
> @@ -385,7 +397,7 @@ static int is_ancestor(struct seccomp_filter *parent,
>   * either not in the correct seccomp mode or did not have an ancestral
>   * seccomp filter.
>   */
> -static inline pid_t seccomp_can_sync_threads(void)
> +static inline pid_t seccomp_can_sync_threads(unsigned int flags)
>  {
>  	struct task_struct *thread, *caller;
>  
> @@ -407,6 +419,11 @@ static inline pid_t seccomp_can_sync_threads(void)
>  				 caller->seccomp.filter)))
>  			continue;
>  
> +		/* don't allow TSYNC to install multiple listeners */
> +		if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER &&
> +		    !has_listener_parent(thread->seccomp.filter))
> +			continue;
> +
>  		/* Return the first thread that cannot be synchronized. */
>  		failed = task_pid_vnr(thread);
>  		/* If the pid cannot be resolved, then return -ESRCH */
> @@ -637,7 +654,7 @@ static long seccomp_attach_filter(unsigned int flags,
>  	if (flags & SECCOMP_FILTER_FLAG_TSYNC) {
>  		int ret;
>  
> -		ret = seccomp_can_sync_threads();
> +		ret = seccomp_can_sync_threads(flags);
>  		if (ret) {
>  			if (flags & SECCOMP_FILTER_FLAG_TSYNC_ESRCH)
>  				return -ESRCH;
> @@ -1462,12 +1479,9 @@ static const struct file_operations seccomp_notify_ops = {
>  static struct file *init_listener(struct seccomp_filter *filter)
>  {
>  	struct file *ret = ERR_PTR(-EBUSY);
> -	struct seccomp_filter *cur;
>  
> -	for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> -		if (cur->notif)
> -			goto out;
> -	}
> +	if (has_listener_parent(current->seccomp.filter))
> +		goto out;
>  
>  	ret = ERR_PTR(-ENOMEM);
>  	filter->notif = kzalloc(sizeof(*(filter->notif)), GFP_KERNEL);