lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220901135802.oeefj2bmsy5gcsmy@wittgenstein>
Date:   Thu, 1 Sep 2022 15:58:02 +0200
From:   Christian Brauner <brauner@...nel.org>
To:     Andrei Vagin <avagin@...il.com>
Cc:     linux-kernel@...r.kernel.org,
        Andy Lutomirski <luto@...capital.net>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Kees Cook <keescook@...omium.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Peter Oskolkov <posk@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Tycho Andersen <tycho@...ho.pizza>,
        Will Drewry <wad@...omium.org>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH 4/4] seccomp: add the synchronous mode for seccomp_unotify

On Tue, Aug 30, 2022 at 02:23:24PM -0700, Andrei Vagin wrote:
> On Tue, Aug 30, 2022 at 3:43 AM Christian Brauner <brauner@...nel.org> wrote:
> >
> > On Mon, Aug 29, 2022 at 06:43:56PM -0700, Andrei Vagin wrote:
> > > seccomp_unotify allows more privileged processes does actions on behalf
> > > of less privileged processes.
> > >
> > > In many cases, the workflow is fully synchronous. It means a target
> > > process triggers a system call and passes controls to a supervisor
> > > process that handles the system call and returns controls to the target
> > > process. In this context, "synchronous" means that only one process is
> > > running and another one is waiting.
> > >
> > > There is the WF_CURRENT_CPU flag that is used to advise the scheduler to
> > > move the wakee to the current CPU. For such synchronous workflows, it
> > > makes context switches a few times faster.
> > >
> > > Right now, each interaction takes 12µs. With this patch, it takes about
> > > 3µs.
> >
> > Seems like a nice idea though I leave it to the sched people to judge
> > whether this is sane or not. So the supervisor which gets woken will be
> > moved to the current cpu in this synchronous scenario.
> >
> > I have no strong opinions on this patch. There are two things I wonder
> > about. First, how meaningful is that speed up given that the supervisor
> > will most often do a lot of heavy-handed things anyway.
> 
> I would not use the "most often" phrase in this case;). It is true for LXC-like
> use cases when we need to handle rare syscalls. In this case, the performance
> of this interface doesn't play a big role. But my use case is very different. I
> have a prototype of the gVisor platform, where seccomp is used to trap
> guest system calls. In this case, the difference between 12µs and 3µs is
> tremendous.

Oh yeah, makes sense. I don't know enough about gVisor but I know we can
trust your word! :)

> 
> The idea of WF_CURRENT_CPU is not mine. I spied it from the umcg series.
> I took the second patch from that series without any changes.
> 
> >
> > Second, this flag is a very specific thing and I wonder how much
> > userspace will really use this and what's more use this correctly.
> >
> > Just to note that LXD - one of the biggest user of this feature - isn't
> > synchronous iiuc for example. Each container gets a separate seccomp
> > supervisor thread (well, go routine but whatever) which exposes a socket
> > that the container manager connects to and sends the seccomp
> > notifications it received from its payload according to an api we
> > established. And each notification is handled in a separate thread
> > (again, go routine but whatever).
> 
> It could be synchronous if seccomp events had been handled in [lxc monitor]. But
> right now, [lxc monitor] is just a proxy. In this case, you are right, lxc will

Yep.

> not get any benefits by setting this flag. But we can look at this from another
> side. If we add these changes, we will have another big user of the interface. I
> think the number of gVisor containers that are started each day is comparable
> with the number of LXC/LXD containers.

Sure, if there's users that would benefit from this then no reason to
not consider it. It's just a lot of low-level knobs we give userspace
here but I guess for the notifier it makes sense.

> 
> >
> > >
> > > This change introduce the SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP flag that
> > > it used to enable the sync mode.
> > >
> > > Signed-off-by: Andrei Vagin <avagin@...il.com>
> > > ---
> > >  include/uapi/linux/seccomp.h |  4 ++++
> > >  kernel/seccomp.c             | 35 +++++++++++++++++++++++++++++++++--
> > >  2 files changed, 37 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> > > index 0fdc6ef02b94..dbfc9b37fcae 100644
> > > --- a/include/uapi/linux/seccomp.h
> > > +++ b/include/uapi/linux/seccomp.h
> > > @@ -115,6 +115,8 @@ struct seccomp_notif_resp {
> > >       __u32 flags;
> > >  };
> > >
> 
> <snip>
> 
> > >
> > >  #ifdef SECCOMP_ARCH_NATIVE
> > > @@ -1117,7 +1120,10 @@ static int seccomp_do_user_notification(int this_syscall,
> > >       INIT_LIST_HEAD(&n.addfd);
> > >
> > >       atomic_add(1, &match->notif->requests);
> > > -     wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
> > > +     if (match->notif->flags & SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP)
> > > +             wake_up_poll_on_current_cpu(&match->wqh, EPOLLIN | EPOLLRDNORM);
> > > +     else
> > > +             wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
> >
> > (We're accumulating a lot of conditional wake primitives in the notifier.)
> >
> 
> I am not sure that I understand what you mean here.

I just meant that we have 

if (wait_killable)
	err = wait_for_completion_killable(&n.ready);
else
	err = wait_for_completion_interruptible(&n.ready);

and now also

if (match->notif->flags & SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP)
        wake_up_poll_on_current_cpu(&match->wqh, EPOLLIN | EPOLLRDNORM);
else
        wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);

which is a bit unpleasant but nothing that would mean we can't do this.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ