[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrX5UJ075oH6-Cm3Uxz_XZQffi_5vZOtc8Yq3ZRmKfcDhg@mail.gmail.com>
Date: Thu, 6 May 2021 16:59:05 -0700
From: Andy Lutomirski <luto@...nel.org>
To: Rodrigo Campos <rodrigo@...volk.io>
Cc: Sargun Dhillon <sargun@...gun.me>,
Andy Lutomirski <luto@...nel.org>,
Kees Cook <keescook@...omium.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux Containers <containers@...ts.linux-foundation.org>,
Mauricio Vásquez Bernal <mauricio@...volk.io>,
Tycho Andersen <tycho@...ho.pizza>,
Giuseppe Scrivano <gscrivan@...hat.com>,
Christian Brauner <christian.brauner@...ntu.com>
Subject: Re: [PATCH v2 2/5] seccomp: Add wait_killable semantic to seccomp
user notifier
On Thu, May 6, 2021 at 12:35 PM Rodrigo Campos <rodrigo@...volk.io> wrote:
> We want to be woken-up in seccomp_do_user_notification() so we know we
> received a signal and notify to the supervisor about it. IMHO, we want
> an additional "if" here[2], like (haven't checked the retun codes of
> functions) "if err != 0" and do a "wake_up_poll(&match->wqh, EPOLLIN
> | EPOLLRDNORM);" or POLLPRI or something, and then "goto wait". IOW,
> we only return when the supervisor tells us to, if a signal arrives we
> just let the supervisor know and continue waiting for a response. The
> proper response might be "I wrote partial data, return only 3 bytes",
> etc. This way we can properly emulate it.
>
> What is not clear to me, and I'd like others to comment is:
>
> 1. How should this new "the supervisor should handle signals" mode be enabled?
> a. If we use an extra ioctl, as Andy suggested, I think we have a
> race too: from the moment the seccomp syscall is issued until the new
> ioctl is called there is a race. If the container does a syscall in
> that window, it _can_ incorrectly return EINTR as it does now. This
> race can be very small and the ioctl can make _all the next syscalls_
> wait in this new mode, so maybe the race is so small that we don't
> care in practice.
If the ioctl is sticky, it can presumably avoid the race entirely by
having whoever calls seccomp() immediately do a dummy syscall or
otherwise wait for the notifier to confirm that it has done the ioctl.
Admittedly, this may be annoying.
> b. But if we want to really fully solve the issue, the other option
> that comes to mind is adding a new flag for the seccomp syscall, to
> signal that the supervisor will handle signals and should be
> forwarded. This way, if the container does a new syscall it will be
> put to wait in this new mode (we have the information that the new
> mode should be used already). Something like
> SECCOMP_FILTER_FLAG_NEW_LISTENER_SOMETHING (today we have
> SECCOMP_FILTER_FLAG_NEW_LISTENER). Ideally shorter :D
I like that better.
>
> 2. What should we notify to the supervisor? Only that a signal arrived
> or also which signal it was? If we do the EPOLLPRI thingy, as Andy
> mentioned in a previous thread, IIUC we will only notify that _a_
> signal arrived, but not _which_. To notify which signal arrived might
> be complex, though, (not sure, I haven't explored how that should be).
Are there any examples in the kernel of syscalls that care *which*
signal came in? As far as I know, there are really only three states
that matter: no signal is pending, a kill is pending, or a regular
signal is pending. (The latter means that there's a signal that is
unmasked, etc and will should therefore interrupt a syscall if the
syscall can be interrupted.
Powered by blists - more mailing lists