linux-kernel - Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp user notifier

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CACaBj2bfJTgC1AW7XmG76iXa2-=5A2phi5bWfDmvd0PNRpe1OQ@mail.gmail.com>
Date:   Mon, 2 May 2022 14:48:35 +0200
From:   Rodrigo Campos <rodrigo@...volk.io>
To:     Kees Cook <keescook@...omium.org>
Cc:     Sargun Dhillon <sargun@...gun.me>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Giuseppe Scrivano <gscrivan@...hat.com>,
        Will Drewry <wad@...omium.org>,
        Andy Lutomirski <luto@...capital.net>,
        Alban Crequy <alban@...volk.io>
Subject: Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp
 user notifier

On Fri, Apr 29, 2022 at 8:20 PM Kees Cook <keescook@...omium.org> wrote:
> On Fri, Apr 29, 2022 at 05:14:37PM +0000, Sargun Dhillon wrote:
> > On Fri, Apr 29, 2022 at 11:42:15AM +0200, Rodrigo Campos wrote:
> > > On Fri, Apr 29, 2022 at 4:32 AM Sargun Dhillon <sargun@...gun.me> wrote:
> > > > the concept is searchable. If the notifying process is signaled prior
> > > > to the notification being received by the userspace agent, it will
> > > > be handled as normal.
> > >
> > > Why is that? Why not always handle in the same way (if wait killable
> > > is set, wait like that)
> > >
> >
> > The goal is to avoid two things:
> > 1. Unncessary work - Often times, we see workloads that implement techniques
> >    like hedging (Also known as request racing[1]). In fact, RFC3484
> >    (destination address selection) gets implemented where the DNS library
> >    will connect to many backend addresses and whichever one comes back first
> >    "wins".
> > 2. Side effects - We don't want a situation where a syscall is in progress
> >    that is non-trivial to rollback (mount), and from user space's perspective
> >    this syscall never completed.
> >
> > Blocking before the syscall even starts is excessive. When we looked at this
> > we found that with runtimes like Golang, they can get into a bad situation
> > if they have many (1000s) of threads that are in the middle of a syscall
> > because all of them need to elide prior to GC. In this case the runtime
> > prioritizes the liveness of GC vs. the syscalls.
> >
> > That being said, there may be some syscalls in a filter that need the suggested
> > behaviour. I can imagine introducing a new flag
> > (say SECCOMP_FILTER_FLAG_WAIT_KILLABLE) that applies to all states.
> > Alternatively, in one implementation, I put the behaviour in the data
> > field of the return from the BPF filter.

Makes sense, if we need to, we can implement that in the future too.

> I'd add something like the above to the commit log, just to have it
> around.

Yes, please. It was not obvious to me.