linux-kernel - Re: [PATCH v4 1/4] seccomp: add a return code to trap to userspace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAG48ez2-vK57HfbRs94YLiz3SVJ7u4tTrxtAmbRepDqk6m-c_A@mail.gmail.com>
Date:   Fri, 22 Jun 2018 01:21:47 +0200
From:   Jann Horn <jannh@...gle.com>
To:     Tycho Andersen <tycho@...ho.ws>
Cc:     Kees Cook <keescook@...omium.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        containers@...ts.linux-foundation.org,
        Linux API <linux-api@...r.kernel.org>,
        Andy Lutomirski <luto@...capital.net>,
        Oleg Nesterov <oleg@...hat.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        "Serge E. Hallyn" <serge@...lyn.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Tyler Hicks <tyhicks@...onical.com>,
        suda.akihiro@....ntt.co.jp, "Tobin C. Harding" <me@...in.cc>
Subject: Re: [PATCH v4 1/4] seccomp: add a return code to trap to userspace

On Fri, Jun 22, 2018 at 12:05 AM Tycho Andersen <tycho@...ho.ws> wrote:
>
> This patch introduces a means for syscalls matched in seccomp to notify
> some other task that a particular filter has been triggered.
[...]
> +Userspace Notification
> +======================
> +
> +The ``SECCOMP_RET_USER_NOTIF`` return code lets seccomp filters pass a
> +particular syscall to userspace to be handled. This may be useful for
> +applications like container managers, which whish to intercept particular

typo: "wish"

[...]
> +passed around via ``SCM_RIGHTS`` or similar. Alternativley, a filter fd can be

typo: "Alternatively"

[...]
> +It is worth noting that ``struct seccomp_data`` contains the values of register
> +arguments to the syscall, but does not contain pointers to memory. The task's
> +memory is accessiable to suitably privileged traces via via ``ptrace()`` or

Typo: "accessible"

[...]
> +
> +static void seccomp_do_user_notification(int this_syscall,
> +                                        struct seccomp_filter *match,
> +                                        const struct seccomp_data *sd)
> +{
> +       int err;
> +       long ret = 0;
> +       struct seccomp_knotif n = {};
> +
> +       mutex_lock(&match->notify_lock);
> +       err = -ENOSYS;
> +       if (!match->has_listener)
> +               goto out;
> +
> +       n.pid = task_pid(current);
> +       n.state = SECCOMP_NOTIFY_INIT;
> +       n.data = sd;
> +       n.id = seccomp_next_notify_id(match);
> +       init_completion(&n.ready);
> +
> +       list_add(&n.list, &match->notifications);
> +       wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
> +
> +       mutex_unlock(&match->notify_lock);
> +       up(&match->request);
> +
> +       err = wait_for_completion_interruptible(&n.ready);
> +       mutex_lock(&match->notify_lock);
> +
> +       /*
> +        * Here it's possible we got a signal and then had to wait on the mutex
> +        * while the reply was sent, so let's be sure there wasn't a response
> +        * in the meantime.
> +        */
> +       if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) {
> +               /*
> +                * We got a signal. Let's tell userspace about it (potentially
> +                * again, if we had already notified them about the first one).
> +                */
> +               if (n.state == SECCOMP_NOTIFY_SENT) {
> +                       n.state = SECCOMP_NOTIFY_INIT;
> +                       up(&match->request);
> +               }
> +               mutex_unlock(&match->notify_lock);
> +               err = wait_for_completion_killable(&n.ready);

Does this mean that when you get a signal that isn't SIGKILL,
wait_for_completion_interruptible() will bail out with -ERESTARTSYS,
but then you hang on this wait_for_completion_killable()? I don't
understand what's going on here. What's the point of using
wait_for_completion_interruptible() when you'll just hang on another
wait on the same "struct completion"?

> +               mutex_lock(&match->notify_lock);
> +               if (err < 0)
> +                       goto remove_list;
> +       }
> +
> +       ret = n.val;
> +       err = n.error;
> +
> +remove_list:
> +       list_del(&n.list);
> +out:
> +       mutex_unlock(&match->notify_lock);
> +       syscall_set_return_value(current, task_pt_regs(current),
> +                                err, ret);
> +}