Message-ID: <16cbd9ee-d8d9-7253-6d60-8ebf014aec06@oracle.com>
Date:   Wed, 18 Nov 2020 18:15:11 +0100
From:   Alexandre Chartre <alexandre.chartre@...cle.com>
To:     David Laight <David.Laight@...LAB.COM>,
        Borislav Petkov <bp@...en8.de>
Cc:     "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "luto@...nel.org" <luto@...nel.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "thomas.lendacky@....com" <thomas.lendacky@....com>,
        "jroedel@...e.de" <jroedel@...e.de>,
        "konrad.wilk@...cle.com" <konrad.wilk@...cle.com>,
        "jan.setjeeilers@...cle.com" <jan.setjeeilers@...cle.com>,
        "junaids@...gle.com" <junaids@...gle.com>,
        "oweisse@...gle.com" <oweisse@...gle.com>,
        "rppt@...ux.vnet.ibm.com" <rppt@...ux.vnet.ibm.com>,
        "graf@...zon.de" <graf@...zon.de>,
        "mgross@...ux.intel.com" <mgross@...ux.intel.com>,
        "kuzuno@...il.com" <kuzuno@...il.com>
Subject: Re: [RFC][PATCH v2 00/21] x86/pti: Defer CR3 switch to C code


On 11/18/20 2:22 PM, David Laight wrote:
> From: Alexandre Chartre
>> Sent: 18 November 2020 10:30
> ...
>> Correct, this RFC is not changing the overhead. However, it is a step forward
>> for being able to execute some selected syscalls or interrupt handlers without
>> switching to the kernel page-table. The next step would be to identify and add
>> the necessary mapping to the user page-table so that specified syscalls can be
>> executed without switching the page-table.
> 
> Remember that without PTI user space can read all kernel memory.
> (I'm not 100% sure you can force a cache-line read.)
> It isn't even that slow.
> (Even I can understand how it works.)
>
> So if you are worried about user space doing that you can't really
> run anything on the user page tables.

Yes, without PTI, userspace can read all kernel memory. But to run some
part of the kernel you don't need to have all kernel mappings. Also, a lot
of the kernel contains non-sensitive information which can safely be exposed
to userspace. So there's probably some room for running carefully selected
syscalls with the user page-table (and hopefully useful ones).
  

> System calls like getpid() are irrelevant - they aren't used (much).
> Even the time of day ones are implemented in the VDSO without a
> context switch.

getpid()/getppid() is interesting because it does almost no work, so its
cost shows the overhead that PTI adds to kernel entry and exit. But the
impact can be larger if some TLB flushing is also required (as you
mentioned below).


> So the overheads come from other system calls that 'do work'
> without actually sleeping.
> I'm guessing things like read, write, sendmsg, recvmsg.
> 
> The only interesting system call I can think of is futex.
> As well as all the calls that return immediately because the
> mutex has been released while entering the kernel, I suspect
> that being pre-empted by a different thread (of the same process)
> doesn't actually need CR3 reloading (without PTI).
> 
> I also suspect that it isn't just the CR3 reload that costs.
> There could (depending on the cpu) be associated TLB and/or cache
> invalidations that have a much larger effect on programs with
> large working sets than on simple benchmark programs.

Right. The TLB flush is mitigated with PCID, but it has much more
impact if there's no PCID.


> Now bits of data that you are 'more worried about' could be kept
> in physical memory that isn't normally mapped (or referenced by
> a TLB) and only mapped when needed.
> But that doesn't help the general case.
> 

Note that being able to execute syscalls without switching the
page-table is just one benefit you can get from this RFC. The main
benefit is for integrating Address Space Isolation (ASI), which would be
much more complex if ASI had to plug into the current assembly CR3 switch.

Thanks,

alex.
