[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210301132156.in3z53t5xxy3ity5@wittgenstein>
Date: Mon, 1 Mar 2021 14:21:56 +0100
From: Christian Brauner <christian.brauner@...ntu.com>
To: Sargun Dhillon <sargun@...gun.me>
Cc: Kees Cook <keescook@...omium.org>,
LKML <linux-kernel@...r.kernel.org>,
Giuseppe Scrivano <gscrivan@...hat.com>,
Tycho Andersen <tycho@...ho.pizza>,
Hariharan Ananthakrishnan <hari@...flix.com>,
Keerti Lakshminarayan <keerti@...flix.com>,
Kyle Anderson <kylea@...flix.com>,
Linux Containers List <containers@...ts.linux-foundation.org>,
stgraber@...ntu.com, Andy Lutomirski <luto@...capital.net>
Subject: Re: seccomp: Delay filter activation
On Mon, Mar 01, 2021 at 12:09:09PM +0100, Christian Brauner wrote:
> On Sat, Feb 20, 2021 at 01:31:57AM -0800, Sargun Dhillon wrote:
> > We've run into a problem where attaching a filter can be quite messy
> > business because the filter itself intercepts sendmsg, and other
> > syscalls related to exfiltrating the listener FD. I believe that this
> > problem set has been brought up before, and although there are
> > "simpler" methods of exfiltrating the listener, like clone3 or
> > pidfd_getfd, but these are still less than ideal.
>
> (You really like sending patches and discussion points in the middle of
> the merge window. :D I think everyone's panicked about getting their PRs
> in shape so it's not unlikely that this sometimes gets lost on the list. :))
>
> It hasn't been a huge problem for us, especially since we added
> pidfd_getfd() this seemed like a straightforward problem to solve by
> selecting a fix fd number that is to be used for the listener. But I can
> see why it is annoying.
>
> >
> > One of the ideas that's been talked about (I want to say back at LSS
> > NA) is the idea of "delayed activation". I was thinking that it might
> > be nice to have a mechanism to do delayed attach, either activated on
> > execve / fork, or an ioctl on the listenerfd to activate the filter
> > and have a flag like SECCOMP_FILTER_FLAG_NEW_LISTENER_INACTIVE, which
> > indicates that the listener should be setup, but not enforcing, and
> > another ioctl to activate it.
> >
> > The later approach is preferred due to simplicity, but I can see a
> > situation where you could accidentally get into a state where the
> > filter is not being enforced. Additionally, this may have unforeseen
> > implications with CRIU.
>
> (If you were to expose an ioctl() that allows userspace to query the
> notifer state then CRIU shouldn't have a problem restoring the notifier
> in the correct state. Right now it doesn't do anyting fancy about the
> notifier, it just restores the task with the filter. It just has to
> learn about the new feature and that's fine imho.)
>
> >
> > I'm curious whether this is a problem others share, and whether any of
> > the aforementioned approaches seem reasonable.
>
> So when I originally suggested the delayed activation I I had another
> related idea that I think I might have mentioned too: if we're already
> considering delaying filter activation I like to discuss the possibility
> of attaching a seccomp filter to a task.
>
> Right now, if one task wants to attach to another task they need to
> recreate the whole seccomp filter and load it. That's not just pretty
> expensive but also only works if you have access to the rules that the
> filter was generated with. For container that's usually some sort of
> pseudo seccomp filter configuration language dumped into a config file
> from which it can be read.
>
> So right now the status quo is:
>
> struct sock_filter filter[] = {
> BPF_STMT(BPF_LD|BPF_W|BPF_ABS, offsetof(struct seccomp_data, nr)),
> BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, nr, 0, 1),
> BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF), /* Get me a listener fd */
> BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
> };
> struct sock_fprog prog = {
> .len = (unsigned short)ARRAY_SIZE(filter),
> .filter = filter,
> };
> int fd = seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog);
>
> and then the caller must send the fd to the manager or the manager uses
> pidfd_getfd().
>
> But, why not get a bit crazy^wcreative; especially since seccomp() is
> already a multiplexer. We introduce a new seccomp flag:
>
> #define SECCOMP_FILTER_DETACHED
>
> and a new seccomp command:
>
> #define SECCOMP_ATTACH_FILTER
>
> And now we could do something like:
>
> pid_t pid = fork();
> if (pid < 0)
> return;
>
> if (pid == 0) {
> // do stuff
> BARRIER_WAKE_SETUP_DONE;
>
> // do more unrelated stuff
>
> BARRIER_WAIT_SECCOMP_FILTER;
> execve(exec-something);
> } else {
>
> int fd_filter;
>
> struct sock_filter filter[] = {
> BPF_STMT(BPF_LD|BPF_W|BPF_ABS, offsetof(struct seccomp_data, nr)),
> BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, nr, 0, 1),
> BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
> };
>
> struct sock_fprog prog = {
> .len = (unsigned short)ARRAY_SIZE(filter),
> .filter = filter,
> };
>
> int fd_filter = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_DETACHED, &prog);
>
> BARRIER_WAIT_SETUP_DONE;
>
> int ret = seccomp(SECCOMP_ATTACH_FILTER, 0, INT_TO_PTR(fd_listener));
This obviously should've been sm like:
struct seccomp_filter_attach {
union {
__s32 pidfd;
__s32 pid;
};
__u32 fd_filter;
};
and then
int ret = seccomp(SECCOMP_ATTACH_FILTER, 0, seccomp_filter_attach);
>
> BARRIER_WAKE_SECCOMP_FILTER;
> }
>
> And now you have attached a filter to another task. This would be super
> elegant for a container manager. The container manager could also stash
> the filter fd and when attaching to a container the manager can send the
> attaching task the fd and the attaching task can do:
>
> int ret = seccomp(SECCOMP_ATTACH_FILTER, 0, INT_TO_PTR(fd_filter));
>
> too and would be attached to the same filter as the target task.
>
> And for the listener fd case a container manager could simply set
> SECCOMP_RET_USER_NOTIF as before
>
> struct sock_filter filter[] = {
> BPF_STMT(BPF_LD|BPF_W|BPF_ABS, offsetof(struct seccomp_data, nr)),
> BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, nr, 0, 1),
> BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF),
> BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
> };
>
> and now fd_filter simply functions as the notifier fd after
> seccomp(SECCOMP_ATTACH_FILTER) that's basically the fancy version of my
> delayed notifier activiation idea.
>
> I'm sure there's nastiness to figure out but I would love to see
> something like this.
>
> Christian
Powered by blists - more mailing lists