linux-kernel - Re: [PATCH v3 1/4] seccomp: add a return code to trap to userspace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180614210325.GA5673@cisco>
Date:   Thu, 14 Jun 2018 14:03:25 -0700
From:   Tycho Andersen <tycho@...ho.ws>
To:     Matthew Helsley <matt.helsley@...il.com>
Cc:     lkml <linux-kernel@...r.kernel.org>,
        containers@...ts.linux-foundation.org,
        "Tobin C . Harding" <me@...in.cc>,
        Kees Cook <keescook@...omium.org>,
        Akihiro Suda <suda.akihiro@....ntt.co.jp>,
        Oleg Nesterov <oleg@...hat.com>,
        Andy Lutomirski <luto@...capital.net>,
        "Eric W . Biederman" <ebiederm@...ssion.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Tyler Hicks <tyhicks@...onical.com>
Subject: Re: [PATCH v3 1/4] seccomp: add a return code to trap to userspace

On Thu, Jun 14, 2018 at 12:44:21PM -0700, Matthew Helsley wrote:
> On Tue, Jun 12, 2018 at 4:16 PM, Tycho Andersen <tycho@...ho.ws> wrote:
> 
> > Hi Matthew,
> >
> > On Tue, Jun 12, 2018 at 02:39:03PM -0700, Matthew Helsley wrote:
> > > On Thu, May 31, 2018 at 7:49 AM, Tycho Andersen <tycho@...ho.ws> wrote:
> > >
> > > <snip>
> > >
> > >
> > > > +struct seccomp_notif {
> > > > +       __u64 id;
> > > > +       pid_t pid;
> > > > +       struct seccomp_data data;
> > > > +};
> > > >
> > >
> > > Since it's part of the UAPI I think it would be good to add documentation
> > > to this patch series. Part of that documentation should talk about which
> > > pid namespaces this pid value is relevant in. This is especially
> > important
> > > since the feature is designed for use by things like container/sandbox
> > > managers.
> >
> > Yes, at least Documentation/userspace-api/seccomp_filter.txt should be
> > updated. I'll add that for the next series.
> >
> 
> Are there some relevant man pages too that should be updated too?

Yes, but since those are in a separate tree, I usually send the sets
separately.

> > > > +static void seccomp_do_user_notification(int this_syscall,
> > > > +                                        struct seccomp_filter *match,
> > > > +                                        const struct seccomp_data *sd)
> > > > +{
> > > > +       int err;
> > > > +       long ret = 0;
> > > > +       struct seccomp_knotif n = {};
> > > > +
> > > > +       mutex_lock(&match->notify_lock);
> > > > +       err = -ENOSYS;
> > > > +       if (!match->has_listener)
> > > > +               goto out;
> > > > +
> > > > +       n.pid = current->pid;
> > > >
> > >
> > > How have you tested this code for correctness? I don't see many
> > > combinations being tested below nor here:
> > > https://github.com/tych0/kernel-utils/blob/master/
> > seccomp/notify_stress.c
> > >
> > > What about processes in different pid namespaces? Make tests that vary
> > key
> > > parameters like these between the task generating the notifications and
> > the
> > > task interested in processing the notifications. Make tests that try to
> > > kill them at interesting times too. etc. Make tests that pass the
> > > notification fd around and see how they work (or not).
> > >
> > > I ask about testing because you're effectively returning a pid value to
> > > userspace here but not using the proper macros to access the task's
> > struct
> > > pid for that purpose. That will work so long as both processes are in the
> > > same pid namespace but breaks otherwise.
> > >
> > > Furthermore, a pid value is not the best solution for the queueing of
> > these
> > > notifications because a single pid value is not meaningful/correct
> > outside
> > > current's pid namespace and the seccomp notification file descriptor
> > could
> > > be passed on to processes in another pid namespaces; this pid value will
> > > almost always not be relevant or correct there hence taking a reference
> > to
> >
> > Well, it *has* to be relevant in some cases: if you want to access
> > some of the task's memory to e.g. read the args to the syscall, you
> > need to ptrace or map_files to access the memory. So we do need to
> > pass it, but,
> >
> > > the struct pid is useful. Then, during read(), the code would use the
> > > proper macro to turn the struct pid reference into a pid value relevant
> > in
> > > the *reader's* pid namespace *or* return something obviously bogus if the
> > > reader is in a pid namespace that can't see that pid. This could prevent
> > > management processes from being tricked into clobbering the wrong
> > process,
> > > feeding the wrong process sensitive information via syscall results, etc.
> >
> > Yes, this makes sense. Seems like we need to do some magic about
> > passing the tracee's task struct to the listener, so it can do
> > task_pid_vnr(). We could perhaps require the listener to be in the
> > init_pid_ns case, but I think things like socket() might still be
> > useful even if you can't read the tracee's memory.
> 
> 
> You could take a reference to the pid rather than the task
> then use pid_vnr(). In that case the changes should result in something
> like:
> 
> 
> struct seccomp_knotif {
>        /* The pid whose filter triggered the notification */
>        struct pid *pid;
> ...
> 
> static void seccomp_do_user_notification(...)
> {
>     ...
>     n.pid = get_task_pid(current, PIDTYPE_PID);
>     ...
> remove_list:
>     list_del(&n.list);
>     put_pid(n.pid);
>     ...
> }
> ...
> static ssize_t seccomp_notify_read(...)
> {
>     ...
>     unotif.pid = pid_vnr(knotif->pid);
>     ...
> }
> 
> I like holding the pid reference because it's what we do elsewhere when pid
> namespaces
> are a concern and it more precisely specifies what the knotif content needs
> to convey.
> Otherwise I don't think it makes a difference.

Great, thanks, I'll do this. I guess we need a put_pid() here too.

Cheers,

Tycho