Message-Id: <1BFC571D-6C85-409C-8FD3-1E34559A277D@oracle.com>
Date:   Tue, 14 May 2019 11:05:44 +0300
From:   Liran Alon <liran.alon@...cle.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Alexandre Chartre <alexandre.chartre@...cle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krcmar <rkrcmar@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        kvm list <kvm@...r.kernel.org>, X86 ML <x86@...nel.org>,
        Linux-MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        jan.setjeeilers@...cle.com, Jonathan Adams <jwadams@...gle.com>
Subject: Re: [RFC KVM 00/27] KVM Address Space Isolation



> On 14 May 2019, at 5:07, Andy Lutomirski <luto@...nel.org> wrote:
> 
> On Mon, May 13, 2019 at 2:09 PM Liran Alon <liran.alon@...cle.com> wrote:
>> 
>> 
>> 
>>> On 13 May 2019, at 21:17, Andy Lutomirski <luto@...nel.org> wrote:
>>> 
>>>> I expect that the KVM address space can eventually be expanded to include
>>>> the ioctl syscall entries. By doing so, and also adding the KVM page table
>>>> to the process userland page table (which should be safe to do because the
>>>> KVM address space doesn't have any secret), we could potentially handle the
>>>> KVM ioctl without having to switch to the kernel pagetable (thus effectively
>>>> eliminating KPTI for KVM). Then the only overhead would be if a VM-Exit has
>>>> to be handled using the full kernel address space.
>>>> 
>>> 
>>> In the hopefully common case where a VM exits and then gets re-entered
>>> without needing to load full page tables, what code actually runs?
>>> I'm trying to understand when the optimization of not switching is
>>> actually useful.
>>> 
>>> Allowing ioctl() without switching to kernel tables sounds...
>>> extremely complicated.  It also makes the dubious assumption that user
>>> memory contains no secrets.
>> 
>> Let me attempt to clarify what we were thinking when creating this patch series:
>> 
>> 1) It is never safe to execute one hyperthread inside guest while its sibling hyperthread runs in a virtual address space which contains secrets of host or other guests.
>> This is because we assume that using some speculative gadget (such as half-Spectrev2 gadget), it will be possible to populate *some* CPU core resource which could then be *somehow* leaked by the hyperthread running inside guest. In case of L1TF, this would be data populated to the L1D cache.
>> 
>> 2) Because of (1), every time a hyperthread runs inside host kernel, we must make sure its sibling is not running inside guest, i.e., we must kick the sibling hyperthread outside of guest using IPI.
>> 
>> 3) From (2), we should have theoretically deduced that for every #VMExit, there is a need to also kick the sibling hyperthread outside of guest until the #VMExit is completed. Such a patch series was implemented at some point but it had (obviously) a significant performance hit.
>> 
>> 
>> 4) The main goal of this patch series is to preserve (2), but to avoid the overhead specified in (3).
>> 
>> The way this patch series achieves (4) is by observing that during the run of a VM, most #VMExits can be handled rather quickly and locally inside KVM and do not need to reference any data that is not relevant to this VM or KVM code. Therefore, if we run these #VMExits in an isolated virtual address space (i.e. KVM isolated address space), there is no need to kick the sibling hyperthread from guest while these #VMExit handlers run.
> 
> Thanks!  This clarifies a lot of things.
> 
>> The hope is that the vast majority of #VMExit handlers will be able to run to completion without needing to switch to full address space. Therefore, avoiding the performance hit of (2).
>> However, for the very few #VMExits that do require running in full kernel address space, we must first kick the sibling hyperthread outside of guest and only then switch to full kernel address space, and only once all hyperthreads return to KVM address space, allow them to enter into guest.
> 
> What exactly does "kick" mean in this context?  It sounds like you're
> going to need to be able to kick sibling VMs from extremely atomic
> contexts like NMI and MCE.

Yes, that’s true.
“Kick” in this context will probably mean sending an IPI to all sibling hyperthreads.
This IPI will cause these sibling hyperthreads to exit from guest to host on EXTERNAL_INTERRUPT
and wait for a condition that again allows them to enter back into the guest.
This condition will be met once all hyperthreads of the CPU core are again running only within the KVM isolated address space of this VM.
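
To make the rule above concrete, here is a rough toy model in Python. It is only a sketch of the intended behaviour, not code from the patch series: the names Mode, Core, vmexit, switch_to_full_kernel and so on are all made up for illustration, the IPI is simulated by a counter, and "waiting" is modelled by enter_guest() simply returning False.

```python
from enum import Enum


class Mode(Enum):
    GUEST = "guest"                # executing guest code
    KVM_ISOLATED = "kvm-isolated"  # host, KVM isolated address space
    FULL_KERNEL = "full-kernel"    # host, full kernel address space


class Core:
    """Toy model of one CPU core and its sibling hyperthreads (hypothetical)."""

    def __init__(self, nthreads=2):
        self.mode = [Mode.KVM_ISOLATED] * nthreads
        self.kicks = 0  # number of simulated IPIs sent

    def can_enter_guest(self):
        # Guest entry is allowed only while no hyperthread of the core
        # runs with the full kernel address space.
        return all(m is not Mode.FULL_KERNEL for m in self.mode)

    def enter_guest(self, t):
        if not self.can_enter_guest():
            return False  # a real implementation would wait here
        self.mode[t] = Mode.GUEST
        return True

    def vmexit(self, t):
        # Fast path: the exit is handled inside the isolated address
        # space, so the siblings are left alone.
        self.mode[t] = Mode.KVM_ISOLATED

    def switch_to_full_kernel(self, t):
        # Slow path: kick every sibling out of the guest first.
        for s, m in enumerate(self.mode):
            if s != t and m is Mode.GUEST:
                self.kicks += 1                   # stands in for the IPI
                self.mode[s] = Mode.KVM_ISOLATED  # EXTERNAL_INTERRUPT exit
        self.mode[t] = Mode.FULL_KERNEL

    def return_to_isolated(self, t):
        self.mode[t] = Mode.KVM_ISOLATED
```

In this model, a vmexit(0) while hyperthread 1 is in the guest sends no kick. A following switch_to_full_kernel(0) sends one kick, and enter_guest(1) keeps failing until return_to_isolated(0) is called.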

-Liran


