Message-Id: <53C0BD81-A942-4BB3-8538-D5107E84C5CD@amacapital.net>
Date:   Sun, 31 May 2020 18:51:40 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Brendan Shanks <bshanks@...eweavers.com>
Cc:     Andy Lutomirski <luto@...nel.org>, Paul Gofman <gofmanp@...il.com>,
        Gabriel Krisman Bertazi <krisman@...labora.com>,
        Linux-MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>, kernel@...labora.com,
        Thomas Gleixner <tglx@...utronix.de>,
        Kees Cook <keescook@...omium.org>,
        Will Drewry <wad@...omium.org>,
        "H . Peter Anvin" <hpa@...or.com>,
        Zebediah Figura <zfigura@...eweavers.com>
Subject: Re: [PATCH RFC] seccomp: Implement syscall isolation based on memory areas



> On May 31, 2020, at 4:50 PM, Brendan Shanks <bshanks@...eweavers.com> wrote:
> 
> 
>> On May 31, 2020, at 11:57 AM, Andy Lutomirski <luto@...nel.org> wrote:
>> 
>> Using SECCOMP_RET_USER_NOTIF is likely to be considerably more
>> expensive than my scheme.  On a non-PTI system, my approach will add a
>> few tens of ns to each syscall.  On a PTI system, it will be worse.
>> But using any kind of notifier for all syscalls will cause a context
>> switch to a different user program for each syscall, and that will be
>> much slower.
> 
> There’s also no way (at least to my understanding) to modify register state from SECCOMP_RET_USER_NOTIF, which is how the existing -staging SIGSYS handler works:
> 
> <https://github.com/wine-staging/wine-staging/blob/master/patches/ntdll-Syscall_Emulation/0001-ntdll-Support-x86_64-syscall-emulation.patch#L62>
> 
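(Aside for anyone following that link: a seccomp-generated SIGSYS means SECCOMP_RET_TRAP, and the point of that mechanism is that the handler can rewrite the saved register state before the thread resumes, which, as you say, USER_NOTIF can't do today. A minimal sketch of that general pattern, not the Wine code, assuming x86_64 + glibc and a made-up "getpid returns 12345" policy:

/* Sketch only, not the Wine code: a SIGSYS handler for SECCOMP_RET_TRAP that
 * "emulates" the trapped syscall by editing the saved registers.  x86_64 and
 * glibc assumed; the getpid/12345 policy is invented for illustration. */
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <ucontext.h>

static void sigsys_handler(int sig, siginfo_t *info, void *uctx)
{
        ucontext_t *ctx = uctx;

        /* info->si_syscall is the trapped syscall number.  The trapped
         * syscall was never executed; whatever we put into the saved RAX
         * becomes its return value once the handler returns and the thread
         * resumes after the syscall instruction. */
        if (info->si_syscall == SYS_getpid)
                ctx->uc_mcontext.gregs[REG_RAX] = 12345;  /* fake result */
        else
                ctx->uc_mcontext.gregs[REG_RAX] = -38;    /* -ENOSYS */
}

static void install_sigsys_handler(void)
{
        struct sigaction sa = {
                .sa_sigaction = sigsys_handler,
                .sa_flags     = SA_SIGINFO,
        };

        sigaction(SIGSYS, &sa, NULL);
        /* ...plus a seccomp filter that returns SECCOMP_RET_TRAP for the
         * syscalls (or callers) you want to intercept. */
}
)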
>> I think that the implementation may well want to live in seccomp, but
>> doing this as a seccomp filter isn't quite right.  It's not a security
>> thing -- it's an emulation thing.  Seccomp is all about making
>> inescapable sandboxes, but that's not what you're doing at all, and
>> the fact that seccomp filters are preserved across execve() sounds
>> like it'll be annoying for you.
> 
> Definitely. Regardless of what approach is taken, we don’t want it to persist across execve.
> 
>> What if there was a special filter type that ran a BPF program on each
>> syscall, and the program was allowed to access user memory to make its
>> decisions, e.g. to look at some list of memory addresses.  But this
>> would explicitly *not* be a security feature -- execve() would remove
>> the filter, and the filter's outcome would be one of redirecting
>> execution or allowing the syscall.  If the "allow" outcome occurs,
>> then regular seccomp filters run.  Obviously the exact semantics here
>> would need some care.
> 
> Although if that’s running a BPF filter on every syscall, wouldn’t it also incur the ~10% overhead that Paul and Gabriel have seen with existing seccomp?
> 
> 

Unlikely. Some benchmarking is needed, but the seccomp ptrace overhead is likely *huge* compared to the overhead of just a filter.
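
For a sense of scale, "just a filter" means a handful of classic BPF instructions. Here is a sketch of the shape (my invention, not the RFC patch and not Wine's actual filter): allow syscalls whose instruction pointer is at or above some limit, trap everything below it. Classic BPF only sees struct seccomp_data (nr, arch, instruction_pointer, args); it cannot chase pointers into user memory, which is why the filter type described above would be something new.

/* Sketch, not the RFC patch: allow syscalls made from addresses at or above
 * `limit` (say, where the native libraries live), deliver SIGSYS for
 * everything below it.  The two-halves comparison is only because classic
 * BPF works on 32-bit words; little-endian x86_64 layout assumed. */
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/prctl.h>

static int install_ip_filter(uint64_t limit)
{
        uint32_t limit_lo = limit & 0xffffffffu;
        uint32_t limit_hi = limit >> 32;

        struct sock_filter filter[] = {
                /* A = high half of instruction_pointer */
                BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                         offsetof(struct seccomp_data, instruction_pointer) + 4),
                /* ip_hi > limit_hi  -> allow */
                BPF_JUMP(BPF_JMP | BPF_JGT | BPF_K, limit_hi, 3, 0),
                /* ip_hi == limit_hi -> compare low half, else trap */
                BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, limit_hi, 0, 3),
                /* A = low half of instruction_pointer */
                BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                         offsetof(struct seccomp_data, instruction_pointer)),
                /* ip_lo >= limit_lo -> allow, else trap */
                BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, limit_lo, 0, 1),
                BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
                BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        };
        struct sock_fprog prog = {
                .len    = sizeof(filter) / sizeof(filter[0]),
                .filter = filter,
        };

        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
                return -1;
        return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}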

As wild-guess numbers on made-up modern hardware, cache hot:

Empty syscall: 50ns, or 300ns with PTI

Empty syscall accepted by a simple seccomp filter: 10ns more than an empty syscall without seccomp

Seccomp ptrace round trip: 6 us. Worse with PTI.

Seccomp user notif round trip: 4 us

Syscall hypothetically redirected back to the same process: about the same as an empty syscall accepted by a filter, plus however long it takes to run the handler. Add 900ns if using SIGSYS instead of plain redirection. Add an extra 500ns on current kernels because signal delivery sucks, but I can fix this.

Take these numbers with a huge grain of salt.  But the point is that the BPF part is the least of your worries.
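
If anyone wants real numbers instead of my guesses, the filter-only rows are easy to measure: time a cheap syscall in a loop, install an allow-everything filter, time it again. A rough harness (no CPU pinning, no warmup discipline, getpid chosen arbitrarily):

/* Rough harness for the "empty syscall" and "empty syscall + trivial filter"
 * rows above: time raw getpid() in a loop, install an allow-everything
 * filter, time it again.  Not a serious benchmark. */
#define _GNU_SOURCE
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static double ns_per_syscall(long iters)
{
        struct timespec a, b;

        clock_gettime(CLOCK_MONOTONIC, &a);
        for (long i = 0; i < iters; i++)
                syscall(SYS_getpid);
        clock_gettime(CLOCK_MONOTONIC, &b);

        return ((b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec)) / iters;
}

int main(void)
{
        long iters = 1000000;

        printf("no filter:   %.1f ns/syscall\n", ns_per_syscall(iters));

        struct sock_filter allow_all[] = {
                BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = { .len = 1, .filter = allow_all };

        prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);

        printf("with filter: %.1f ns/syscall\n", ns_per_syscall(iters));
        return 0;
}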
