linux-kernel - Re: [PATCH v6 4/5] seccomp: add support for passing fds via USER

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180920234240.GR4672@cisco>
Date:   Thu, 20 Sep 2018 17:42:40 -0600
From:   Tycho Andersen <tycho@...ho.ws>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     Kees Cook <keescook@...omium.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        Linux API <linux-api@...r.kernel.org>,
        Oleg Nesterov <oleg@...hat.com>,
        "Eric W . Biederman" <ebiederm@...ssion.com>,
        "Serge E . Hallyn" <serge@...lyn.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Tyler Hicks <tyhicks@...onical.com>,
        Akihiro Suda <suda.akihiro@....ntt.co.jp>,
        Jann Horn <jannh@...gle.com>
Subject: Re: [PATCH v6 4/5] seccomp: add support for passing fds via
 USER_NOTIF

On Wed, Sep 19, 2018 at 12:58:20PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 19, 2018 at 7:38 AM, Tycho Andersen <tycho@...ho.ws> wrote:
> > On Wed, Sep 19, 2018 at 07:19:56AM -0700, Andy Lutomirski wrote:
> >>
> >>
> >> > On Sep 19, 2018, at 2:55 AM, Tycho Andersen <tycho@...ho.ws> wrote:
> >> >
> >> >> On Wed, Sep 12, 2018 at 04:52:38PM -0700, Andy Lutomirski wrote:
> >> >>> On Thu, Sep 6, 2018 at 8:28 AM, Tycho Andersen <tycho@...ho.ws> wrote:
> >> >>> The idea here is that the userspace handler should be able to pass an fd
> >> >>> back to the trapped task, for example so it can be returned from socket().
> >> >>>
> >> >>> I've proposed one API here, but I'm open to other options. In particular,
> >> >>> this only lets you return an fd from a syscall, which may not be enough in
> >> >>> all cases. For example, if an fd is written to an output parameter instead
> >> >>> of returned, the current API can't handle this. Another case is that
> >> >>> netlink takes as input fds sometimes (IFLA_NET_NS_FD, e.g.). If netlink
> >> >>> ever decides to install an fd and output it, we wouldn't be able to handle
> >> >>> this either.
> >> >>
> >> >> An alternative could be to have an API (an ioctl on the listener,
> >> >> perhaps) that just copies an fd into the tracee.  There would be the
> >> >> obvious set of options: do we replace an existing fd or allocate a new
> >> >> one, and is it CLOEXEC.  Then the tracer could add an fd and then
> >> >> return it just like it's a regular number.
> >> >>
> >> >> I feel like this would be more flexible and conceptually simpler, but
> >> >> maybe a little slower for the common cases.  What do you think?
> >> >
> >> > I'm just implementing this now, and there's one question: when do we
> >> > actually do the fd install? Should we do it when the user calls
> >> > SECCOMP_NOTIF_PUT_FD, or when the actual response is sent? It feels
> >> > like we should do it when the response is sent, instead of doing it
> >> > right when SECCOMP_NOTIF_PUT_FD is called, since if there's a
> >> > subsequent signal and the tracer decides to discard the response,
> >> > we'll have to implement some delete mechanism to delete the fd, but it
> >> > would have already been visible to the process, etc. So I'll go
> >> > forward with this unless there are strong objections, but I thought
> >> > I'd point it out just to avoid another round trip.
> >> >
> >> >
> >>
> >> Can you do that non-racily?  That is, you need to commit to an fd *number* right away, but what if another thread uses the number before you actually install the fd?
> >
> > I was thinking we could just do an __alloc_fd() and then do the
> > fd_install() when the response is sent or clean up the case that the
> > listener or task dies. I haven't actually tried to run the code yet,
> > so it's possible the locking won't work :)
> 
> I would be very surprised if the locking works.  How can you run a
> thread in a process when another thread has allocated but not
> installed an fd and is blocked for an arbitrarily long time?

I think the trick is that there's no actual locking required (except
for a brief locking of task->files). I've run the patch below and it
seems to work. But perhaps that's abusing __alloc_fd a little too
hard, I don't really know.

> >
> >> Do we really allow non-“kill” signals to interrupt the whole process?  It might be the case that we don’t really need to clean up from signals if there’s a guarantee that the thread dies.
> >
> > Yes, we do, because of this: https://lkml.org/lkml/2018/3/15/1122
> >
> 
> I'm still not sure I see the problem.  Suppose I'm implementing a user
> notifier for a nasty syscall like recvmsg().  If I'm the tracer, by
> the time I decide to install an fd, I've committed to returning
> something other than -EINTR, even if a non-fatal signal is sent before
> I finish.  No rollback should be necessary.

I don't understand why this is true. Surely you could stop a handler
on receipt of a new signal, and have it do something else entirely?

> In the (unlikely?) event that some tracer needs to be able to rollback
> an fd installation to return -EINTR, a SECCOMP_NOTIF_CLOSE_FD
> operation should be good enough, I think.  Or maybe PUT_FD can put -1
> to delete an fd.

Yes, I think even with something like what I did below we'd need some
sort of REMOVE_FD option, because otherwise there's no way to change
your mind and send -EINTR without the fd you just PUT_FD'd.

Tycho


>From bfca7337cb53791aca74b595eb45e9afa3babac2 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho@...ho.ws>
Date: Thu, 20 Sep 2018 06:49:49 -0600
Subject: [PATCH] implement SECCOMP_NOTIF_PUT_FD ioctl

Signed-off-by: Tycho Andersen <tycho@...ho.ws>
---
 include/uapi/linux/seccomp.h                  |  8 ++
 kernel/seccomp.c                              | 74 ++++++++++++-------
 tools/testing/selftests/seccomp/seccomp_bpf.c | 24 +++++-
 3 files changed, 77 insertions(+), 29 deletions(-)

diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 8fb2c024c0a1..62e474c372d4 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -80,6 +80,12 @@ struct seccomp_notif_resp {
 	__u32 fd_flags;
 };
 
+struct seccomp_notif_put_fd {
+	__u64 id;
+	__s32 fd;
+	__u32 fd_flags;
+};
+
 #define SECCOMP_IOC_MAGIC		0xF7
 
 /* Flags for seccomp notification fd ioctl. */
@@ -89,5 +95,7 @@ struct seccomp_notif_resp {
 					struct seccomp_notif_resp)
 #define SECCOMP_NOTIF_ID_VALID	_IOR(SECCOMP_IOC_MAGIC, 2,	\
 					__u64)
+#define SECCOMP_NOTIF_PUT_FD	_IOR(SECCOMP_IOC_MAGIC, 3,	\
+					struct seccomp_notif_put_fd)
 
 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 21b24cc07237..6bdf413863ca 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -41,6 +41,7 @@
 #include <linux/tracehook.h>
 #include <linux/uaccess.h>
 #include <linux/anon_inodes.h>
+#include <linux/fdtable.h>
 #include <net/cls_cgroup.h>
 
 enum notify_state {
@@ -51,6 +52,7 @@ enum notify_state {
 
 struct seccomp_knotif {
 	/* The struct pid of the task whose filter triggered the notification */
+	struct task_struct *task;
 	struct pid *pid;
 
 	/* The "cookie" for this request; this is unique for this filter. */
@@ -80,7 +82,7 @@ struct seccomp_knotif {
 	int error;
 	long val;
 	struct file *file;
-	unsigned int flags;
+	int fd;
 
 	/* Signals when this has entered SECCOMP_NOTIFY_REPLIED */
 	struct completion ready;
@@ -748,9 +750,11 @@ static void seccomp_do_user_notification(int this_syscall,
 	if (!match->notif)
 		goto out;
 
+	n.task = current;
 	n.pid = task_pid(current);
 	n.state = SECCOMP_NOTIFY_INIT;
 	n.data = sd;
+	n.fd = -1;
 	n.id = seccomp_next_notify_id(match);
 	init_completion(&n.ready);
 
@@ -786,16 +790,8 @@ static void seccomp_do_user_notification(int this_syscall,
 	}
 
 	if (n.file) {
-		int fd;
 		struct socket *sock;
 
-		fd = get_unused_fd_flags(n.flags);
-		if (fd < 0) {
-			err = fd;
-			ret = -1;
-			goto remove_list;
-		}
-
 		/*
 		 * Similar to what SCM_RIGHTS does, let's re-set the cgroup
 		 * data to point ot the tracee's cgroups instead of the
@@ -807,21 +803,20 @@ static void seccomp_do_user_notification(int this_syscall,
 			sock_update_classid(&sock->sk->sk_cgrp_data);
 		}
 
-		ret = fd;
-		err = 0;
-
-		fd_install(fd, n.file);
+		fd_install(n.fd, n.file);
 		/* Don't fput, since fd has a reference now */
 		n.file = NULL;
-	} else {
-		ret = n.val;
-		err = n.error;
+		n.fd = -1;
 	}
 
+	ret = n.val;
+	err = n.error;
 
 remove_list:
 	if (n.file)
 		fput(n.file);
+	if (n.fd >= 0)
+		put_unused_fd(n.fd);
 
 	list_del(&n.list);
 out:
@@ -1683,15 +1678,6 @@ static long seccomp_notify_send(struct seccomp_filter *filter,
 		goto out;
 	}
 
-	if (resp.return_fd) {
-		knotif->flags = resp.fd_flags;
-		knotif->file = fget(resp.fd);
-		if (!knotif->file) {
-			ret = -EBADF;
-			goto out;
-		}
-	}
-
 	ret = size;
 	knotif->state = SECCOMP_NOTIFY_REPLIED;
 	knotif->error = resp.error;
@@ -1731,6 +1717,42 @@ static long seccomp_notify_id_valid(struct seccomp_filter *filter,
 	return ret;
 }
 
+static long seccomp_notify_put_fd(struct seccomp_filter *filter,
+				  unsigned long arg)
+{
+	struct seccomp_notif_put_fd req;
+	void __user *buf = (void __user *)arg;
+	struct seccomp_knotif *knotif = NULL;
+	long ret;
+
+	if (copy_from_user(&req, buf, sizeof(req)))
+		return -EFAULT;
+
+	ret = mutex_lock_interruptible(&filter->notify_lock);
+	if (ret < 0)
+		return ret;
+
+	ret = -ENOENT;
+	list_for_each_entry(knotif, &filter->notif->notifications, list) {
+		unsigned long max_files;
+
+		if (knotif->id != req.id)
+			continue;
+
+		max_files = task_rlimit(knotif->task, RLIMIT_NOFILE);
+
+		knotif->file = fget(req.fd);
+		knotif->fd = __alloc_fd(knotif->task->files, 0, max_files,
+					req.fd_flags);
+
+		ret = knotif->fd;
+		break;
+	}
+
+	mutex_unlock(&filter->notify_lock);
+	return ret;
+}
+
 static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
 				 unsigned long arg)
 {
@@ -1743,6 +1765,8 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
 		return seccomp_notify_send(filter, arg);
 	case SECCOMP_NOTIF_ID_VALID:
 		return seccomp_notify_id_valid(filter, arg);
+	case SECCOMP_NOTIF_PUT_FD:
+		return seccomp_notify_put_fd(filter, arg);
 	default:
 		return -EINVAL;
 	}
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 3dec856717a7..ae8daf992231 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -169,6 +169,9 @@ struct seccomp_metadata {
 					struct seccomp_notif_resp)
 #define SECCOMP_NOTIF_ID_VALID	_IOR(SECCOMP_IOC_MAGIC, 2,	\
 					__u64)
+#define SECCOMP_NOTIF_PUT_FD	_IOR(SECCOMP_IOC_MAGIC, 3,	\
+					struct seccomp_notif_put_fd)
+
 struct seccomp_notif {
 	__u16 len;
 	__u64 id;
@@ -186,6 +189,12 @@ struct seccomp_notif_resp {
 	__u32 fd;
 	__u32 fd_flags;
 };
+
+struct seccomp_notif_put_fd {
+	__u64 id;
+	__s32 fd;
+	__u32 fd_flags;
+};
 #endif
 
 #ifndef seccomp
@@ -3239,11 +3248,12 @@ TEST(get_user_notification_ptrace)
 TEST(user_notification_pass_fd)
 {
 	pid_t pid;
-	int status, listener;
+	int status, listener, fd;
 	int sk_pair[2];
 	char c;
 	struct seccomp_notif req = {};
 	struct seccomp_notif_resp resp = {};
+	struct seccomp_notif_put_fd putfd = {};
 	long ret;
 
 	ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
@@ -3295,9 +3305,15 @@ TEST(user_notification_pass_fd)
 
 	resp.len = sizeof(resp);
 	resp.id = req.id;
-	resp.return_fd = 1;
-	resp.fd = sk_pair[1];
-	resp.fd_flags = 0;
+
+	putfd.id = req.id;
+	putfd.fd = sk_pair[1];
+	putfd.fd_flags = 0;
+
+	fd = ioctl(listener, SECCOMP_NOTIF_PUT_FD, &putfd);
+	EXPECT_GE(fd, 0);
+	resp.val = fd;
+
 	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_SEND, &resp), sizeof(resp));
 	close(sk_pair[1]);
 
-- 
2.17.1