linux-kernel - Re: [RFC PATCH 2/7] x86/sci: add core implementation for system call isolation

Message-ID: <20190426083144.GA126896@gmail.com>
Date:   Fri, 26 Apr 2019 10:31:44 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Mike Rapoport <rppt@...ux.ibm.com>
Cc:     linux-kernel@...r.kernel.org,
        Alexandre Chartre <alexandre.chartre@...cle.com>,
        Andy Lutomirski <luto@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        James Bottomley <James.Bottomley@...senpartnership.com>,
        Jonathan Adams <jwadams@...gle.com>,
        Kees Cook <keescook@...omium.org>,
        Paul Turner <pjt@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>, linux-mm@...ck.org,
        linux-security-module@...r.kernel.org, x86@...nel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC PATCH 2/7] x86/sci: add core implementation for system call
 isolation


* Mike Rapoport <rppt@...ux.ibm.com> wrote:

> When enabled, the system call isolation (SCI) would allow execution of 
> the system calls with reduced page tables. These page tables are almost 
> identical to the user page tables in PTI. The only addition is the code 
> page containing the system call entry function that will continue 
> execution after the context switch.
> 
> Unlike PTI page tables, there is no sharing at higher levels and all 
> the hierarchy for SCI page tables is cloned.
> 
> The SCI page tables are created when a system call that requires 
> isolation is executed for the first time.
> 
> Whenever a system call should be executed in the isolated environment, 
> the context is switched to the SCI page tables. Any further access to 
> the kernel memory will generate a page fault. The page fault handler 
> can verify that the access is safe and grant it or kill the task 
> otherwise.
> 
> The initial SCI implementation allows access to any kernel data, but it
> limits access to the code in the following way:
> * calls and jumps to known code symbols without offset are allowed
> * calls and jumps into a known symbol with offset are allowed only if that
> symbol was already accessed and the offset is in the next page
> * all other code accesses are blocked
> 
> After the isolated system call finishes, the mappings created during its
> execution are cleared.
> 
> The entire SCI page table is lazily freed at task exit() time.

So this basically reuses the mechanism behind the horrendous PTI CR3 
switching overhead whenever a syscall seeks "protection", an overhead 
that is only somewhat mitigated by PCID.

This might work on PTI-encumbered CPUs.

AMD CPUs don't need PTI, nor do they have PCID.

So this feature hurts the CPU maker who didn't mess up, and it hurts 
future CPUs that don't need PTI.

I really don't like where this is going. In a couple of years I really 
want to be able to think of PTI as a bad dream that is, fortunately, 
mostly over.

I have the feeling that compiler level protection that avoids corrupting 
the stack in the first place is going to be lower overhead, and would 
work in a much broader range of environments. Do we have analysis of what 
the compiler would have to do to prevent most ROP attacks, and what the 
runtime cost of that is?

I mean, C# and Java programs aren't able to corrupt the stack as long as 
the language runtime is correct. It has to be possible, right?

Thanks,

	Ingo