linux-kernel - Re: [PATCH v3 4/4] seccomp: add support for passing fds via USER

Date:   Sun, 3 Jun 2018 18:14:59 -0600
From:   Tycho Andersen <tycho@...ho.ws>
To:     Alban Crequy <alban.crequy@...il.com>
Cc:     linux-kernel@...r.kernel.org,
        Linux Containers <containers@...ts.linux-foundation.org>,
        Kees Cook <keescook@...omium.org>,
        Andy Lutomirski <luto@...capital.net>,
        Oleg Nesterov <oleg@...hat.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        "Serge E. Hallyn" <serge@...lyn.com>, christian.brauner@...ntu.com,
        tyhicks@...onical.com, Akihiro Suda <suda.akihiro@....ntt.co.jp>,
        me@...in.cc
Subject: Re: [PATCH v3 4/4] seccomp: add support for passing fds via
 USER_NOTIF

Hi Alban,

On Sat, Jun 02, 2018 at 09:14:09PM +0200, Alban Crequy wrote:
> On Thu, 31 May 2018 at 16:52, Tycho Andersen <tycho@...ho.ws> wrote:
> >
> > The idea here is that the userspace handler should be able to pass an fd
> > back to the trapped task, for example so it can be returned from socket().
> >
> > I've proposed one API here, but I'm open to other options. In particular,
> > this only lets you return an fd from a syscall, which may not be enough in
> > all cases. For example, if an fd is written to an output parameter instead
> > of returned, the current API can't handle this. Another case is that
> > netlink takes as input fds sometimes (IFLA_NET_NS_FD, e.g.). If netlink
> > ever decides to install an fd and output it, we wouldn't be able to handle
> > this either.
> >
> > Still, the vast majority of interesting cases are covered by this API, so
> > perhaps it is Enough.
> >
> > I've left it as a separate commit for two reasons:
> >   * It illustrates the way in which we would grow struct seccomp_notif and
> >     struct seccomp_notif_resp without using netlink
> >   * It shows just how little code is needed to accomplish this :)
> >
> > v2: new in v2
> > v3: no changes
> >
> > Signed-off-by: Tycho Andersen <tycho@...ho.ws>
> > CC: Kees Cook <keescook@...omium.org>
> > CC: Andy Lutomirski <luto@...capital.net>
> > CC: Oleg Nesterov <oleg@...hat.com>
> > CC: Eric W. Biederman <ebiederm@...ssion.com>
> > CC: "Serge E. Hallyn" <serge@...lyn.com>
> > CC: Christian Brauner <christian.brauner@...ntu.com>
> > CC: Tyler Hicks <tyhicks@...onical.com>
> > CC: Akihiro Suda <suda.akihiro@....ntt.co.jp>
> > ---
> >  include/uapi/linux/seccomp.h                  |   2 +
> >  kernel/seccomp.c                              |  49 +++++++-
> >  tools/testing/selftests/seccomp/seccomp_bpf.c | 112 ++++++++++++++++++
> >  3 files changed, 161 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> > index 8160e6cad528..3124427219cb 100644
> > --- a/include/uapi/linux/seccomp.h
> > +++ b/include/uapi/linux/seccomp.h
> > @@ -71,6 +71,8 @@ struct seccomp_notif_resp {
> >         __u64 id;
> >         __s32 error;
> >         __s64 val;
> > +       __u8 return_fd;
> > +       __u32 fd;
> >  };
> >
> >  #endif /* _UAPI_LINUX_SECCOMP_H */
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index 6dc99c65c2f4..2ee958b3efde 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -77,6 +77,8 @@ struct seccomp_knotif {
> >         /* The return values, only valid when in SECCOMP_NOTIFY_REPLIED */
> >         int error;
> >         long val;
> > +       struct file *file;
> > +       unsigned int flags;
> >
> >         /* Signals when this has entered SECCOMP_NOTIFY_REPLIED */
> >         struct completion ready;
> > @@ -780,10 +782,32 @@ static void seccomp_do_user_notification(int this_syscall,
> >                         goto remove_list;
> >         }
> >
> > -       ret = n.val;
> > -       err = n.error;
> > +       if (n.file) {
> > +               int fd;
> > +
> > +               fd = get_unused_fd_flags(n.flags);
> > +               if (fd < 0) {
> > +                       err = fd;
> > +                       ret = -1;
> > +                       goto remove_list;
> > +               }
> > +
> > +               ret = fd;
> > +               err = 0;
> > +
> > +               fd_install(fd, n.file);
> > +               /* Don't fput, since fd has a reference now */
> > +               n.file = NULL;
> 
> Do we want the cgroup classid and netprio to be applied here, before
> the fd_install? I am looking at similar code in net/core/scm.c
> scm_detach_fds():
>                 sock = sock_from_file(fp[i], &err);
>                 if (sock) {
>                         sock_update_netprioidx(&sock->sk->sk_cgrp_data);
>                         sock_update_classid(&sock->sk->sk_cgrp_data);
>                 }
> 
> The listener process might live in a different cgroup with a different
> classid & netprio, so it might not be applied as the app might expect.

Thanks, I hadn't really thought about this. I think doing what
SCM_RIGHTS does makes sense -- the operation is essentially the same.
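For comparison, here is a minimal userspace sketch of the SCM_RIGHTS path that the reply above refers to. It passes one fd over an AF_UNIX socketpair. On the receiving side, the kernel allocates a fresh fd number and installs the sender's open file under it, much like the get_unused_fd_flags()/fd_install() pair in the patch. The helper names send_fd(), recv_fd() and pass_fd_roundtrip() are hypothetical and only for illustration. The demo passes a pipe rather than a socket, so the classid/netprio updates done in scm_detach_fds() for sockets are not exercised here.

```c
#define _GNU_SOURCE          /* for MSG_CMSG_CLOEXEC on glibc */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one SCM_RIGHTS fd. */
union fd_cmsg_buf {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one fd over a connected AF_UNIX socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 'x';  /* one byte of real data so the ancillary data is delivered */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg_buf u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; the kernel picks a new fd number in this process. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg_buf u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Demo: pass a pipe's write end to ourselves and write through the copy. */
static int pass_fd_roundtrip(void)
{
    int sv[2], p[2], received, ok = -1;
    char out = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pipe(p) < 0)
        goto close_sv;
    if (send_fd(sv[0], p[1]) < 0)
        goto close_pipe;
    received = recv_fd(sv[1]);
    if (received < 0)
        goto close_pipe;
    /* p[1] is still open, so the new number differs but names the same file. */
    if (received != p[1] && write(received, "k", 1) == 1 &&
        read(p[0], &out, 1) == 1 && out == 'k')
        ok = 0;
    close(received);
close_pipe:
    close(p[0]);
    close(p[1]);
close_sv:
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

pass_fd_roundtrip() returns 0 when a byte written through the received fd comes out of the original pipe. That confirms the receiver got a new descriptor for the same open file, which is the effect the patch aims for in the trapped task.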

> Also, should we update the struct sock_cgroup_data of the socket, in
> order to make the BPF helper function bpf_skb_under_cgroup() work wrt
> the cgroup of the target process instead of the listener process? I am
> looking at cgroup_sk_alloc(). I don't know what's the correct
> behaviour we want here.

SCM_RIGHTS seems to omit this (I assume you mean the val field of
struct sock_cgroup_data, which seems to be a pointer to a struct
cgroup). Any idea why?

> > +       } else {
> > +               ret = n.val;
> > +               err = n.error;
> > +       }
> > +
> >
> >  remove_list:
> > +       if (n.file)
> > +               fput(n.file);
> > +
> >         list_del(&n.list);
> >  out:
> >         mutex_unlock(&match->notify_lock);
> > @@ -1598,6 +1622,27 @@ static ssize_t seccomp_notify_write(struct file *file, const char __user *buf,
> >         knotif->state = SECCOMP_NOTIFY_REPLIED;
> >         knotif->error = resp.error;
> >         knotif->val = resp.val;
> > +
> > +       if (resp.return_fd) {
> > +               struct fd fd;
> > +
> > +               /*
> > +                * This is a little hokey: we need a real fget() (i.e. not
> > +                * __fget_light(), which is what fdget does), but we also need
> > +                * the flags from struct fd. So, we get it, put it, and get it
> > +                * again for real.
> > +                */
> > +               fd = fdget(resp.fd);
> > +               knotif->flags = fd.flags;
> > +               fdput(fd);
> > +
> > +               knotif->file = fget(resp.fd);
> > +               if (!knotif->file) {
> > +                       ret = -EBADF;
> > +                       goto out;
> 
> Should the "knotif->state = SECCOMP_NOTIFY_REPLIED" and other changes
> be done after the error case here? In case of bad fd, it seems to
> return -EBADF the first time and -EINVAL the next time because the
> state would have been changed to SECCOMP_NOTIFY_REPLIED already.

Yes, good catch, thanks!
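To illustrate the ordering fix discussed above, here is a simplified userspace model. It is not kernel code, and every name in it (model_state, model_resp, model_knotif, model_fget(), model_notify_write()) is hypothetical. The reply handler does the fallible fd lookup first and only changes the notification's state once nothing else can fail. With this ordering, a bad fd returns -EBADF on every attempt, and the notification can still be answered afterwards. An int stands in for struct file *, and the handler returns 0 on success for simplicity.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for SECCOMP_NOTIFY_{INIT,SENT,REPLIED}; not kernel code. */
enum model_state { MODEL_INIT, MODEL_SENT, MODEL_REPLIED };

/* Same fields the v3 patch gives struct seccomp_notif_resp. */
struct model_resp {
    uint64_t id;
    int32_t error;
    int64_t val;
    uint8_t return_fd;
    uint32_t fd;
};

/* Simplified struct seccomp_knotif: an int stands in for struct file *. */
struct model_knotif {
    enum model_state state;
    int error;
    long val;
    int file;          /* -1 means "no file to install" */
};

/* Fake fd table: fds 0, 1 and 3 are "open", 2 is not. */
static const int model_open_fds[] = { 1, 1, 0, 1 };

/* Stand-in for fget(): returns the "file", or -1 if the fd is not open. */
static int model_fget(uint32_t fd)
{
    size_t n = sizeof(model_open_fds) / sizeof(model_open_fds[0]);

    return (fd < n && model_open_fds[fd]) ? (int)fd : -1;
}

/* Reply handler: do everything that can fail first, commit state last. */
static int model_notify_write(struct model_knotif *k, const struct model_resp *resp)
{
    int file = -1;

    if (k->state != MODEL_SENT)
        return -EINVAL;          /* already answered, or never sent */

    if (resp->return_fd) {
        file = model_fget(resp->fd);
        if (file < 0)
            return -EBADF;       /* k untouched, so the reply can be retried */
    }

    /* Commit point: nothing after this can fail. */
    k->state = MODEL_REPLIED;
    k->error = resp->error;
    k->val = resp->val;
    k->file = file;
    return 0;
}
```

In this model, replying twice with fd 2 yields -EBADF both times and leaves the state at MODEL_SENT. A later reply with fd 3 succeeds, and any further reply gets -EINVAL.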

Tycho