Message-ID: <202204291120.428EB85@keescook>
Date: Fri, 29 Apr 2022 11:20:46 -0700
From: Kees Cook <keescook@...omium.org>
To: Sargun Dhillon <sargun@...gun.me>
Cc: Rodrigo Campos <rodrigo@...volk.io>,
LKML <linux-kernel@...r.kernel.org>,
Linux Containers <containers@...ts.linux-foundation.org>,
Christian Brauner <christian.brauner@...ntu.com>,
Giuseppe Scrivano <gscrivan@...hat.com>,
Will Drewry <wad@...omium.org>,
Andy Lutomirski <luto@...capital.net>,
Alban Crequy <alban@...volk.io>
Subject: Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp
user notifier
On Fri, Apr 29, 2022 at 05:14:37PM +0000, Sargun Dhillon wrote:
> On Fri, Apr 29, 2022 at 11:42:15AM +0200, Rodrigo Campos wrote:
> > On Fri, Apr 29, 2022 at 4:32 AM Sargun Dhillon <sargun@...gun.me> wrote:
> > > the concept is searchable. If the notifying process is signaled prior
> > > to the notification being received by the userspace agent, it will
> > > be handled as normal.
> >
> > Why is that? Why not always handle it the same way (if wait killable
> > is set, wait like that)?
> >
>
> The goal is to avoid two things:
> 1. Unnecessary work - Oftentimes, we see workloads that implement techniques
> like hedging (also known as request racing [1]). In fact, RFC3484
> (destination address selection) is implemented such that the DNS library
> will connect to many backend addresses, and whichever one comes back first
> "wins".
> 2. Side effects - We don't want a situation where a syscall that is
> non-trivial to roll back (e.g. mount) is in progress, while from user
> space's perspective this syscall never completed.
>
> Blocking before the syscall even starts is excessive. When we looked at this,
> we found that runtimes like Golang can get into a bad situation
> if they have many (thousands of) threads that are in the middle of a syscall,
> because all of them need to elide prior to GC. In this case the runtime
> prioritizes the liveness of GC over the syscalls.
>
> That being said, there may be some syscalls in a filter that need the suggested
> behaviour. I can imagine introducing a new flag
> (say SECCOMP_FILTER_FLAG_WAIT_KILLABLE) that applies to all states.
> Alternatively, in one implementation, I put the behaviour in the data
> field of the BPF filter's return value.
I'd add something like the above to the commit log, just to have it
around.
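A minimal C model of the semantic described in the quoted thread may help when writing that commit log: once the userspace agent has received the notification, a wait-killable filter lets only fatal signals end the wait; before receipt, any signal is handled as normal. This is a sketch, not kernel code. The enum and function names are hypothetical, and the two states only loosely mirror the kernel's internal INIT and SENT notification states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code. The two states loosely mirror the
 * kernel's internal INIT and SENT states of a user notification. */
enum notif_state {
	NOTIF_INIT, /* queued, not yet received by the userspace agent */
	NOTIF_SENT, /* received by the agent, reply still pending */
};

enum sig_kind {
	SIGKIND_NONFATAL, /* e.g. a signal with a userspace handler */
	SIGKIND_FATAL,    /* e.g. SIGKILL */
};

enum wait_outcome {
	OUTCOME_INTERRUPTED,  /* wait aborted, signal handled as normal */
	OUTCOME_KEEP_WAITING, /* signal stays pending until the agent replies */
	OUTCOME_KILLED,       /* the notifying task dies */
};

/* What happens to a task blocked on a user notification when a signal
 * arrives, for a filter with or without the wait-killable semantic. */
static enum wait_outcome on_signal(bool wait_killable, enum notif_state st,
				   enum sig_kind sig)
{
	assert(st == NOTIF_INIT || st == NOTIF_SENT);

	/* Fatal signals always end the wait. */
	if (sig == SIGKIND_FATAL)
		return OUTCOME_KILLED;

	/* Only once the agent holds the notification does the wait become
	 * killable-only; before that, the agent has not started acting on
	 * the syscall, so abandoning it has no side effects. */
	if (wait_killable && st == NOTIF_SENT)
		return OUTCOME_KEEP_WAITING;

	return OUTCOME_INTERRUPTED;
}
```

With wait_killable set, a non-fatal signal that arrives while the notification is still queued interrupts the syscall (the cheap case that matters for hedging and GC liveness), while the same signal after receipt leaves the task waiting for the agent's reply, avoiding the half-completed side effects described above.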
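The alternative Sargun mentions, carrying the behaviour in the data field of the filter's return value, could look roughly like the sketch below. The SECCOMP_RET_* values are copied from the uapi header linux/seccomp.h (guarded so the sketch builds without kernel headers). HYPOTHETICAL_NOTIF_WAIT_KILLABLE and the helper functions are made up for illustration; nothing in the kernel ABI assigns meaning to this bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in include/uapi/linux/seccomp.h. */
#ifndef SECCOMP_RET_USER_NOTIF
#define SECCOMP_RET_USER_NOTIF 0x7fc00000U
#endif
#ifndef SECCOMP_RET_ACTION_FULL
#define SECCOMP_RET_ACTION_FULL 0xffff0000U
#endif
#ifndef SECCOMP_RET_DATA
#define SECCOMP_RET_DATA 0x0000ffffU
#endif

/* HYPOTHETICAL: a data-field bit meaning "wait killable once received".
 * This is not part of any kernel ABI; illustration only. */
#define HYPOTHETICAL_NOTIF_WAIT_KILLABLE 0x0001U

/* Compose a SECCOMP_RET_USER_NOTIF return value carrying extra data. */
static uint32_t user_notif_ret(uint32_t data)
{
	/* The data must fit in the low 16 bits of the return value. */
	assert((data & ~SECCOMP_RET_DATA) == 0);
	return SECCOMP_RET_USER_NOTIF | data;
}

/* Extract the action part of a filter's return value. */
static uint32_t ret_action(uint32_t ret)
{
	return ret & SECCOMP_RET_ACTION_FULL;
}

/* True if this return value asks for the (hypothetical) killable wait. */
static bool ret_wants_wait_killable(uint32_t ret)
{
	return ret_action(ret) == SECCOMP_RET_USER_NOTIF &&
	       (ret & SECCOMP_RET_DATA & HYPOTHETICAL_NOTIF_WAIT_KILLABLE) != 0;
}
```

In a classic BPF filter, such a value would be the constant of a BPF_RET | BPF_K statement. The trade-off against a new filter flag is granularity: the data-field approach lets the filter choose the behaviour per syscall, while a flag applies to every notification the filter generates.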
--
Kees Cook