Message-ID: <CACaBj2bfJTgC1AW7XmG76iXa2-=5A2phi5bWfDmvd0PNRpe1OQ@mail.gmail.com>
Date: Mon, 2 May 2022 14:48:35 +0200
From: Rodrigo Campos <rodrigo@...volk.io>
To: Kees Cook <keescook@...omium.org>
Cc: Sargun Dhillon <sargun@...gun.me>,
LKML <linux-kernel@...r.kernel.org>,
Linux Containers <containers@...ts.linux-foundation.org>,
Christian Brauner <christian.brauner@...ntu.com>,
Giuseppe Scrivano <gscrivan@...hat.com>,
Will Drewry <wad@...omium.org>,
Andy Lutomirski <luto@...capital.net>,
Alban Crequy <alban@...volk.io>
Subject: Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp user notifier

On Fri, Apr 29, 2022 at 8:20 PM Kees Cook <keescook@...omium.org> wrote:
> On Fri, Apr 29, 2022 at 05:14:37PM +0000, Sargun Dhillon wrote:
> > On Fri, Apr 29, 2022 at 11:42:15AM +0200, Rodrigo Campos wrote:
> > > On Fri, Apr 29, 2022 at 4:32 AM Sargun Dhillon <sargun@...gun.me> wrote:
> > > > the concept is searchable. If the notifying process is signaled prior
> > > > to the notification being received by the userspace agent, it will
> > > > be handled as normal.
> > >
> > > Why is that? Why not always handle in the same way (if wait killable
> > > is set, wait like that)
> > >
> >
> > The goal is to avoid two things:
> > 1. Unnecessary work - Oftentimes, we see workloads that implement techniques
> > like hedging (also known as request racing[1]). In fact, RFC3484
> > (destination address selection) gets implemented where the DNS library
> > will connect to many backend addresses and whichever one comes back first
> > "wins".
> > 2. Side effects - We don't want a situation where a syscall is in progress
> > that is non-trivial to rollback (mount), and from user space's perspective
> > this syscall never completed.
> >
> > Blocking before the syscall even starts is excessive. When we looked at this
> > we found that with runtimes like Golang, they can get into a bad situation
> > if they have many (1000s) of threads that are in the middle of a syscall
> > because all of them need to elide prior to GC. In this case the runtime
> > prioritizes the liveness of GC vs. the syscalls.
> >
> > That being said, there may be some syscalls in a filter that need the suggested
> > behaviour. I can imagine introducing a new flag
> > (say SECCOMP_FILTER_FLAG_WAIT_KILLABLE) that applies to all states.
> > Alternatively, in one implementation, I put the behaviour in the data
> > field of the return from the BPF filter.

Makes sense, if we need to, we can implement that in the future too.

> I'd add something like the above to the commit log, just to have it
> around.

Yes, please. It was not obvious to me.
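
For anyone reading this in the archive later, here is roughly how I
understand the semantic described above. This is only a userspace model,
not the patch: the names notif_state, wait_mode, choose_wait_mode() and
signal_interrupts_wait() are made up for illustration and do not exist in
the kernel. It assumes the wait only becomes killable once the agent has
received the notification, and that a killable wait is ended only by a
fatal signal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the discussed behaviour; not kernel code. */
enum notif_state {
    NOTIF_INIT, /* queued, not yet received by the userspace agent */
    NOTIF_SENT  /* the agent has received the notification */
};

enum wait_mode {
    WAIT_INTERRUPTIBLE, /* any signal ends the wait ("handled as normal") */
    WAIT_KILLABLE       /* only a fatal signal ends the wait */
};

/*
 * Pick how the notifying task should wait.
 * wait_killable_flag stands in for the filter opting in to the new semantic.
 */
static enum wait_mode choose_wait_mode(bool wait_killable_flag,
                                       enum notif_state state)
{
    assert(state == NOTIF_INIT || state == NOTIF_SENT);

    /* Before the agent has seen the notification, nothing has started,
     * so a signal can cancel it with no side effects to roll back. */
    if (wait_killable_flag && state == NOTIF_SENT)
        return WAIT_KILLABLE;

    return WAIT_INTERRUPTIBLE;
}

/* Does a signal end the wait? fatal is true for a fatal signal. */
static bool signal_interrupts_wait(enum wait_mode mode, bool fatal)
{
    return mode == WAIT_INTERRUPTIBLE || fatal;
}
```

With the flag set, a non-fatal signal (the Go runtime's preemption signal,
for example) still cancels a notification the agent has not picked up yet,
but no longer interrupts one the agent is already working on, such as a
mount.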