Message-ID: <f0d218f1-076e-e8ce-ebf8-84712a126b32@oracle.com>
Date: Tue, 14 May 2019 14:32:28 -0700
From: Jan Setje-Eilers <jan.setjeeilers@...cle.com>
To: Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>
Cc: Liran Alon <liran.alon@...cle.com>,
Alexandre Chartre <alexandre.chartre@...cle.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krcmar <rkrcmar@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
kvm list <kvm@...r.kernel.org>, X86 ML <x86@...nel.org>,
Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Jonathan Adams <jwadams@...gle.com>
Subject: Re: [RFC KVM 00/27] KVM Address Space Isolation
On 5/14/19 12:37 AM, Peter Zijlstra wrote:
> On Mon, May 13, 2019 at 07:07:36PM -0700, Andy Lutomirski wrote:
>> On Mon, May 13, 2019 at 2:09 PM Liran Alon <liran.alon@...cle.com> wrote:
>>> The hope is that the vast majority of #VMExit handlers will be
>>> able to run completely without needing to switch to the full address
>>> space, therefore avoiding the performance hit of (2).
>>> However, for the very few #VMExits that do require the full
>>> kernel address space, we must first kick the sibling hyperthread
>>> out of the guest and only then switch to the full kernel address
>>> space, and only once all hyperthreads have returned to the KVM
>>> address space do we allow them to enter the guest again.
>> What exactly does "kick" mean in this context? It sounds like you're
>> going to need to be able to kick sibling VMs from extremely atomic
>> contexts like NMI and MCE.
> Yeah, doing the full synchronous thing from NMI/MCE context sounds
> exceedingly dodgy, however..
>
> Realistically they only need to send an IPI to the other sibling; they
> don't need to wait for the VMExit to complete or anything else.
>
> And that is something we can do from NMI context -- with a bit of care.
> See also arch_irq_work_raise(); specifically we need to ensure we leave
> the APIC in an idle state, such that if we interrupted an APIC sequence
> it will not suddenly fail/violate the APIC write/state etc.
>
I've been experimenting with IPI'ing siblings on vmexit, primarily
because we know we'll need it if ASI turns out to be viable, but also
because I wanted to understand why previous experiments resulted in such
poor performance.
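
Just to make the shape of it concrete, the fire-and-forget kick looks
roughly like the sketch below. This isn't literally what's in my tree;
the function name is made up and the reschedule IPI is only a stand-in,
since a real implementation would want a dedicated vector so the
sibling's handler knows why it was kicked.

#include <linux/smp.h>
#include <linux/topology.h>

/*
 * Ask the HT sibling(s) to leave guest mode.  We only send the IPI and
 * return; any external interrupt forces a #VMExit on a CPU that is in
 * the guest, so there is no need to wait here.
 */
static void kvm_isolation_kick_siblings(void)
{
	int cpu = smp_processor_id();
	int sibling;

	for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
		if (sibling == cpu)
			continue;
		/* Stand-in for a dedicated kick vector. */
		smp_send_reschedule(sibling);
	}
}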
You're correct that you don't need to wait for the sibling to come out
of the guest once you send the IPI. That hardware thread will not do
anything other than process the IPI once it's sent. There is still some
need for synchronization, at least for the every-vmexit case, since you
always want to make sure that one thread is actually doing work while
the other one is held. I have this working for some cases, but not
enough to call it a general solution. I'm not at all sure that the
every-vmexit case can be made to perform well in the general case. Even
the non-general case uses synchronization that I fear might be overly
complex.
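
To give an idea of what I mean by synchronization, the core of it is a
per-cpu hold count that the sibling checks before VM entry, roughly as
sketched below. Again the names are made up, and this glosses over the
corner cases (nesting, offline siblings, interrupts during the spin)
that make the real thing complex.

#include <linux/atomic.h>
#include <linux/percpu.h>
#include <linux/processor.h>
#include <linux/smp.h>
#include <linux/topology.h>

/* Non-zero while a sibling wants this CPU held out of the guest. */
static DEFINE_PER_CPU(atomic_t, sibling_hold);

/* Called before switching to the full kernel address space. */
static void hold_siblings(void)
{
	int cpu = smp_processor_id();
	int sibling;

	for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
		if (sibling == cpu)
			continue;
		atomic_inc(per_cpu_ptr(&sibling_hold, sibling));
		/* Kick it out of the guest; stand-in for a real vector. */
		smp_send_reschedule(sibling);
	}
}

/* Called once we are back in the restricted KVM address space. */
static void release_siblings(void)
{
	int cpu = smp_processor_id();
	int sibling;

	for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
		if (sibling == cpu)
			continue;
		atomic_dec(per_cpu_ptr(&sibling_hold, sibling));
	}
}

/* On the VM entry path, before actually entering the guest. */
static void wait_while_held(void)
{
	while (atomic_read(this_cpu_ptr(&sibling_hold)))
		cpu_relax();
}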
For the cases I do have working, simply not pinning the sibling when
we exit due to the guest idling is a big enough win to put performance
into a much more reasonable range.
Based on this, I believe that pinning a sibling HT in a subset of
cases, only when we interact with the full kernel address space, is
almost certainly reasonable.
-jan