Message-ID: <CAMp4zn_j+g8gAEPNRo0hD4ioc9a5hEPD=6gcbh8fXs2XxN95mQ@mail.gmail.com>
Date: Wed, 28 Apr 2021 10:13:06 -0700
From: Sargun Dhillon <sargun@...gun.me>
To: Tycho Andersen <tycho@...ho.pizza>
Cc: Rodrigo Campos <rodrigo@...volk.io>,
Andy Lutomirski <luto@...nel.org>,
Kees Cook <keescook@...omium.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux Containers <containers@...ts.linux-foundation.org>,
Christian Brauner <christian.brauner@...ntu.com>,
Mauricio Vásquez Bernal <mauricio@...volk.io>,
Giuseppe Scrivano <gscrivan@...hat.com>,
Will Drewry <wad@...omium.org>, Alban Crequy <alban@...volk.io>
Subject: Re: [PATCH RESEND 2/5] seccomp: Add wait_killable semantic to seccomp user notifier
On Wed, Apr 28, 2021 at 7:08 AM Tycho Andersen <tycho@...ho.pizza> wrote:
>
> On Wed, Apr 28, 2021 at 03:20:02PM +0200, Rodrigo Campos wrote:
> > On Wed, Apr 28, 2021 at 1:10 PM Rodrigo Campos <rodrigo@...volk.io> wrote:
> > >
> > > On Wed, Apr 28, 2021 at 2:22 AM Tycho Andersen <tycho@...ho.pizza> wrote:
> > > >
> > > > On Tue, Apr 27, 2021 at 04:19:54PM -0700, Andy Lutomirski wrote:
> > > > > > User notifiers should allow correct emulation. Right now, they don't,
> > > > > > but there is no reason they can't.
> > > >
> > > > Thanks for the explanation.
> > > >
> > > > Consider fsmount, which has a,
> > > >
> > > >     ret = mutex_lock_interruptible(&fc->uapi_mutex);
> > > >     if (ret < 0)
> > > >             goto err_fsfd;
> > > >
> > > > If a regular task is interrupted during that wait, it returns -EINTR
> > > > or whatever back to userspace.
> > > >
> > > > Suppose that we intercept fsmount. The supervisor decides the mount is
> > > > OK, does the fsmount, injects the mount fd into the container, and
> > > > then the tracee receives a signal. At this point, the mount fd is
> > > > visible inside the container. The supervisor gets a notification about
> > > > the signal and revokes the mount fd, but there was some time where it
> > > > was exposed in the container, whereas with the interrupt in the native
> > > > syscall there was never any exposure.
> > >
> > > IIUC, this is solved by my patch, patch 4 of the series. The
> > > supervisor should do the addfd with the flag added in that patch
> > > (SECCOMP_ADDFD_FLAG_SEND) for an atomic "addfd + send".
> >
> > Well, under Andy's proposal handling that is even simpler. If the
> > signal is delivered after we added the fd (note that the container
> > syscall does not return when the signal arrives, as it happens today,
> > it just signals the notifier and continues to wait), we can just
> > ignore the signal and return that (if that is the appropriate thing
> > for that syscall, but I guess after adding an fd there isn't any other
> > reasonable thing to do).
>
> Yes, agreed. After thinking about this more, my example is bogus: the
> kernel doesn't sleep after it installs the fd, so it would ignore any
> signals too.
>
> Even if the kernel *did* sleep after installing the fd, it would still
> be correct emulation to install it and then do whatever the kernel did
> during that sleep. So I withdraw my objection :)
>
> Thanks,
>
> Tycho
Great.
I'll respin the series and add a
SECCOMP_IOCTL_NOTIF_SET_WAIT_KILLABLE command.
We can do the other optimizations mentioned above when
specific use cases come up. I would like to work on preemption
notification after this lands, though.
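
For reference, the atomic "addfd + send" discussed earlier in the thread
might look roughly like this from the supervisor's side. This is only a
sketch under stated assumptions: addfd_and_send() is a hypothetical helper
name, the fallback define assumes SECCOMP_ADDFD_FLAG_SEND has the value
(1UL << 1), and the installed <linux/seccomp.h> is assumed to already
provide struct seccomp_notif_addfd and SECCOMP_IOCTL_NOTIF_ADDFD.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Assumption: older uapi headers may lack the flag from patch 4. */
#ifndef SECCOMP_ADDFD_FLAG_SEND
#define SECCOMP_ADDFD_FLAG_SEND (1UL << 1)
#endif

/*
 * Hypothetical helper: install src_fd into the task that triggered the
 * notification req_id and complete the notification in the same ioctl.
 * On success the ioctl returns the fd number allocated in the target,
 * which is also what the intercepted syscall returns to that task.
 * On failure it returns -1 with errno set.
 */
int addfd_and_send(int notify_fd, uint64_t req_id, int src_fd)
{
	struct seccomp_notif_addfd addfd;

	memset(&addfd, 0, sizeof(addfd));
	addfd.id = req_id;                     /* id from the received seccomp_notif */
	addfd.srcfd = (uint32_t)src_fd;        /* fd in the supervisor, e.g. the mount fd */
	addfd.flags = SECCOMP_ADDFD_FLAG_SEND; /* add the fd and respond atomically */
	addfd.newfd = 0;                       /* unused without SECCOMP_ADDFD_FLAG_SETFD */
	addfd.newfd_flags = 0;                 /* O_CLOEXEC could be passed here */

	return ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
}
```

With this shape there is no window in which the fd is visible in the
container while the notification is still pending, so no separate
SECCOMP_IOCTL_NOTIF_SEND (and no fd revocation on a signal) is needed.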