linux-kernel - Re: [PATCH RFC] seccomp: Implement syscall isolation based on memory areas


Message-ID: <a14be8b0-a9a2-cf96-939e-cedf7e0e669a@gmail.com>
Date:   Sun, 31 May 2020 21:36:18 +0300
From:   Paul Gofman <gofmanp@...il.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Gabriel Krisman Bertazi <krisman@...labora.com>,
        Linux-MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>, kernel@...labora.com,
        Thomas Gleixner <tglx@...utronix.de>,
        Kees Cook <keescook@...omium.org>,
        Will Drewry <wad@...omium.org>,
        "H . Peter Anvin" <hpa@...or.com>,
        Zebediah Figura <zfigura@...eweavers.com>
Subject: Re: [PATCH RFC] seccomp: Implement syscall isolation based on memory
 areas

On 5/31/20 21:10, Andy Lutomirski wrote:
>
> That's not what I meant.  I meant that you would set the kernel up to
> redirect *all* syscalls from the thread with the sole exception of one
> syscall instruction in the thunk.  This would catch Windows syscalls
> and Linux syscalls.  The thunk would determine whether the original
> syscall was Linux or Windows and handle it accordingly.
>
> This may interact poorly with the DRM scheme.  The redzone might need
> to be respected, or stack switching might be needed.

Oh yeah, I see now, thanks. Sure, we could trap every syscall and have a
Seccomp-allowed trampoline for executing the native ones with the
existing Seccomp implementation. But this is going to have a prohibitive
performance impact. The specifics of our present use case are that the
vast majority of syscalls do not need to be emulated; they are native.
Only a few come from the Windows application, and those we need to trap
and route to our handler to let the program continue. We do not care too
much about the overhead for those few. So the hope was that the kernel
could handle that majority of native Linux syscalls internally with only
minor overhead.
I've read the suggestion to use SECCOMP_RET_USER_NOTIF instead of
SECCOMP_RET_TRAP. Is handling the trap this way supposed to be much
quicker than handling the SIGSYS from SECCOMP_RET_TRAP? More
specifically, wouldn't SECCOMP_RET_USER_NOTIF effectively serialize all
the syscalls waiting in a single queue for processing, while
SECCOMP_RET_TRAP can be processed without exclusive locking?
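
For reference, the "trap every syscall except the one in the trampoline"
setup with the existing Seccomp implementation could look roughly like
the classic-BPF filter below. This is only a sketch under stated
assumptions: build_thunk_filter() and allowed_ip are hypothetical names,
the host is assumed to be little-endian, the usual seccomp_data.arch
check is left out for brevity, and allowed_ip is assumed to be whatever
address the kernel reports in seccomp_data.instruction_pointer for the
trampoline's syscall instruction (on x86-64 that should be the address
right after the instruction, as far as I know).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Offsets of the two 32-bit halves of seccomp_data.instruction_pointer.
 * Assumption: little-endian host, so the low half comes first. */
#define IP_LO ((uint32_t)offsetof(struct seccomp_data, instruction_pointer))
#define IP_HI ((uint32_t)(IP_LO + sizeof(uint32_t)))

/* Hypothetical helper: writes a 6-instruction filter into prog and
 * returns the instruction count. The filter allows a syscall only when
 * the reported instruction pointer equals allowed_ip; anything else
 * gets SECCOMP_RET_TRAP, i.e. a SIGSYS delivered to the thread. */
size_t build_thunk_filter(struct sock_filter *prog, uint64_t allowed_ip)
{
    const struct sock_filter insns[] = {
        /* A = low 32 bits of the instruction pointer */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, IP_LO),
        /* low half differs: skip 3 instructions to the TRAP return */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (uint32_t)allowed_ip, 0, 3),
        /* A = high 32 bits of the instruction pointer */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, IP_HI),
        /* high half differs: skip 1 instruction to the TRAP return */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (uint32_t)(allowed_ip >> 32), 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
    };

    assert(prog != NULL);
    memcpy(prog, insns, sizeof(insns));
    return sizeof(insns) / sizeof(insns[0]);
}
```

The resulting array would go into a struct sock_fprog and be installed
with prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) followed by
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog). With such a filter
every native syscall made outside the trampoline takes the SIGSYS path
first, which is the overhead concern raised above.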