linux-kernel - Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp user notifier

Message-ID: <202204291120.428EB85@keescook>
Date:   Fri, 29 Apr 2022 11:20:46 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Sargun Dhillon <sargun@...gun.me>
Cc:     Rodrigo Campos <rodrigo@...volk.io>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Giuseppe Scrivano <gscrivan@...hat.com>,
        Will Drewry <wad@...omium.org>,
        Andy Lutomirski <luto@...capital.net>,
        Alban Crequy <alban@...volk.io>
Subject: Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp
 user notifier

On Fri, Apr 29, 2022 at 05:14:37PM +0000, Sargun Dhillon wrote:
> On Fri, Apr 29, 2022 at 11:42:15AM +0200, Rodrigo Campos wrote:
> > On Fri, Apr 29, 2022 at 4:32 AM Sargun Dhillon <sargun@...gun.me> wrote:
> > > the concept is searchable. If the notifying process is signaled prior
> > > to the notification being received by the userspace agent, it will
> > > be handled as normal.
> > 
> > Why is that? Why not always handle in the same way (if wait killable
> > is set, wait like that)
> > 
> 
> The goal is to avoid two things:
> 1. Unnecessary work - Oftentimes, we see workloads that implement techniques
>    like hedging (Also known as request racing[1]). In fact, RFC3484
>    (destination address selection) gets implemented where the DNS library
>    will connect to many backend addresses and whichever one comes back first
>    "wins".
> 2. Side effects - We don't want a situation where a syscall that is
>    non-trivial to roll back (e.g. mount) is in progress, yet from user
>    space's perspective the syscall never completed.
> 
> Blocking before the syscall even starts is excessive. When we looked at this,
> we found that runtimes like Golang can get into a bad situation
> if they have many (1000s of) threads in the middle of a syscall,
> because all of them need to elide prior to GC. In this case the runtime
> prioritizes the liveness of GC vs. the syscalls.
> 
> That being said, there may be some syscalls in a filter that need the suggested 
> behaviour. I can imagine introducing a new flag
> (say SECCOMP_FILTER_FLAG_WAIT_KILLABLE) that applies to all states.
> Alternatively, in one implementation, I put the behaviour in the data
> field of the return from the BPF filter.

I'd add something like the above to the commit log, just to have it
around.
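
For the commit log, the state-dependent behaviour described in the quoted
text could be illustrated with something like the sketch below. This is a
userspace model only, not the patch itself: the enum names, the
pick_wait_mode() helper and the wait_killable parameter are hypothetical
and chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical lifecycle of one user notification (illustrative names). */
enum notify_state {
	NOTIFY_INIT,	/* queued, not yet received by the userspace agent */
	NOTIFY_SENT,	/* received by the agent, handling in progress */
	NOTIFY_REPLIED,	/* agent has replied, nothing left to wait for */
};

/* How the notifying task sleeps while waiting for the agent. */
enum wait_mode {
	WAIT_INTERRUPTIBLE,	/* any signal interrupts the wait */
	WAIT_KILLABLE,		/* only fatal signals interrupt the wait */
};

/*
 * Model of the semantic discussed above: a signal that arrives before the
 * agent has received the notification is handled as normal, so the wait
 * stays interruptible. Only once the agent has received the notification
 * (and may be in the middle of a hard-to-undo operation such as mount)
 * does the wait become killable, and only if the filter asked for it.
 */
static enum wait_mode pick_wait_mode(bool wait_killable,
				     enum notify_state state)
{
	if (wait_killable && state == NOTIFY_SENT)
		return WAIT_KILLABLE;
	return WAIT_INTERRUPTIBLE;
}
```

With this model, pick_wait_mode(true, NOTIFY_INIT) yields
WAIT_INTERRUPTIBLE, which avoids blocking before the syscall handling has
even started, while pick_wait_mode(true, NOTIFY_SENT) yields WAIT_KILLABLE.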

-- 
Kees Cook