Message-Id: <3691744C-F4BC-49C6-9450-52E31DD14A92@amacapital.net>
Date:   Mon, 1 Jun 2020 06:59:26 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Billy Laws <blaws05@...il.com>
Cc:     krisman@...labora.com, gofmanp@...il.com, hpa@...or.com,
        keescook@...omium.org, kernel@...labora.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        tglx@...utronix.de, wad@...omium.org
Subject: Re: [PATCH RFC] seccomp: Implement syscall isolation based on memory areas



> On Jun 1, 2020, at 2:23 AM, Billy Laws <blaws05@...il.com> wrote:
> 
> 
>> 
>> On May 30, 2020, at 5:26 PM, Gabriel Krisman Bertazi <krisman@...labora.com> wrote:
>> 
>> Andy Lutomirski <luto@...capital.net> writes:
>> 
>>>>>> On May 29, 2020, at 11:00 PM, Gabriel Krisman Bertazi <krisman@...labora.com> wrote:
>>>>> 
>>>>> Modern Windows applications are executing system call instructions
>>>>> directly from the application's code without going through the WinAPI.
>>>>> This breaks Wine emulation, because it doesn't have a chance to
>>>>> intercept and emulate these syscalls before they are submitted to Linux.
>>>>> 
>>>>> In addition, we cannot simply trap every system call of the application
>>>>> to userspace using PTRACE_SYSEMU, because performance would suffer,
>>>>> since our main use case is to run Windows games over Linux.  Therefore,
>>>>> we need some in-kernel filtering to decide whether the syscall was
>>>>> issued by the Wine code or by the Windows application.
>>> 
>>> Do you really need in-kernel filtering?  What if you could have
>>> efficient userspace filtering instead?  That is, set something up so
>>> that all syscalls, except those from a special address, are translated
>>> to CALL thunk where the thunk is configured per task.  Then the thunk
>>> can do whatever emulation is needed.
>> 
>> Hi,
>> 
>> I suggested something similar to my customer, by using
>> libsyscall-intercept.  The idea would be to overwrite the syscall
>> instruction with a call to the entry point.  I'm not a specialist on the
>> specifics of Windows games (cc'ed Paul Gofman, who can provide more
>> details on that side), but as far as I understand, the reason why that
>> is not feasible is that the anti-cheat protection in games will abort
>> execution if the binary region was modified either on-disk or in-memory.
>> 
>> Is there some mechanism to do that without modifying the application?
> 
> Hi,
> 
> I work on an emulator for the Nintendo Switch that uses a similar technique;
> in our testing it works very well and is much more performant than even
> PTRACE_SYSEMU.
> 
> To work around DRM reading the memory contents, I think mprotect could
> be used: after patching the syscall, a copy of the original code could be
> kept somewhere in memory and the patched region mapped --X.
> With this, any time the DRM attempts to read from the patched region to
> perform integrity checks, it will cause a segfault and a branch to the
> signal handler. This handler can then return the contents of the original,
> unpatched region to satisfy the checks.
> 
> Are memory contents checked by DRM solutions too often for this to be
> performant?

A bigger issue is that hardware support for --X is quite spotty. There is
no x86 CPU that can do it cleanly in a bare metal setup, and client CPUs
that can do it at all without hypervisor help may be nonexistent. I don’t
know if the ARM situation is much better.
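
For anyone who wants to see what a given machine actually does with an
execute-only mapping, here is a minimal C sketch. It is not part of the
patch or of anything proposed in this thread, and the helper name
exec_only_page_is_readable() is made up for illustration. It maps one
anonymous page, drops it to PROT_EXEC only with mprotect(), and then reads
it from a forked child so that a fault cannot take down the caller. My
understanding is that on x86 without protection keys the read succeeds,
because the page tables cannot express "executable but not readable"; treat
that as an assumption to verify rather than a guarantee.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (name invented for this sketch).
 * Returns 1 if a PROT_EXEC-only page can still be read,
 *         0 if the read faults with SIGSEGV,
 *        -1 if the probe itself could not be set up.
 */
int exec_only_page_is_readable(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	pid_t pid;
	int status;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	p[0] = 0xc3;	/* arbitrary byte so the page is populated */

	/* Ask for --X: execute permission only, no read, no write. */
	if (mprotect(p, (size_t)page, PROT_EXEC) != 0) {
		munmap(p, (size_t)page);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap(p, (size_t)page);
		return -1;
	}
	if (pid == 0) {
		/* Child: attempt the read; a fault kills only this process. */
		volatile unsigned char v = *(volatile unsigned char *)p;
		(void)v;
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0) {
		munmap(p, (size_t)page);
		return -1;
	}
	munmap(p, (size_t)page);

	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 1;	/* read went through: --X not enforced */
	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
		return 0;	/* read faulted: --X enforced */
	return -1;
}
```

A result of 1 means the signal-handler trick quoted above would never
trigger on that machine, since reads of the patched region would simply
return the patched bytes.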

> --
> Billy Laws
>> 
>>> Getting the details and especially the interaction with any seccomp
>>> filters that may be installed right could be tricky, but the performance
>>> should be decent, at least on non-PTI systems.
>>> 
>>> (If we go this route, I suspect that the correct interaction with
>>> seccomp is that this type of redirection takes precedence over seccomp
>>> and seccomp filters are not invoked for redirected syscalls. After all,
>>> a redirected syscall is, functionally, not a syscall at all.)
>>> 
>> 
>> 
>> --
>> Gabriel Krisman Bertazi