Message-ID: <f0d218f1-076e-e8ce-ebf8-84712a126b32@oracle.com>
Date: Tue, 14 May 2019 14:32:28 -0700
From: Jan Setje-Eilers <jan.setjeeilers@...cle.com>
To: Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>
Cc: Liran Alon <liran.alon@...cle.com>,
Alexandre Chartre <alexandre.chartre@...cle.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krcmar <rkrcmar@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
kvm list <kvm@...r.kernel.org>, X86 ML <x86@...nel.org>,
Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Jonathan Adams <jwadams@...gle.com>
Subject: Re: [RFC KVM 00/27] KVM Address Space Isolation
On 5/14/19 12:37 AM, Peter Zijlstra wrote:
> On Mon, May 13, 2019 at 07:07:36PM -0700, Andy Lutomirski wrote:
>> On Mon, May 13, 2019 at 2:09 PM Liran Alon <liran.alon@...cle.com> wrote:
>>> The hope is that the vast majority of #VMExit handlers will be
>>> able to run completely without needing to switch to the full address
>>> space, therefore avoiding the performance hit of (2).
>>> However, for the very few #VMExits that do require the full
>>> kernel address space, we must first kick the sibling hyperthread
>>> out of the guest and only then switch to the full kernel address
>>> space, and only once all hyperthreads have returned to the KVM
>>> address space do we allow them to enter the guest again.
>> What exactly does "kick" mean in this context? It sounds like you're
>> going to need to be able to kick sibling VMs from extremely atomic
>> contexts like NMI and MCE.
> Yeah, doing the full synchronous thing from NMI/MCE context sounds
> exceedingly dodgy, however..
>
> Realistically they only need to send an IPI to the other sibling; they
> don't need to wait for the VMExit to complete or anything else.
>
> And that is something we can do from NMI context -- with a bit of care.
> See also arch_irq_work_raise(); specifically we need to ensure we leave
> the APIC in an idle state, such that if we interrupted an APIC sequence
> it will not suddenly fail/violate the APIC write/state etc.
>
I've been experimenting with IPI'ing siblings on vmexit, primarily
because we know we'll need it if ASI turns out to be viable, but also
because I wanted to understand why previous experiments resulted in such
poor performance.
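
Just to make the shape of it concrete, the fire-and-forget kick looks
roughly like the sketch below. This isn't literally what's in my tree;
the function name is made up and the reschedule IPI is only a stand-in,
since a real implementation would want a dedicated vector so the
sibling's handler knows why it was kicked.

#include <linux/smp.h>
#include <linux/topology.h>

/*
 * Ask the HT sibling(s) to leave guest mode.  We only send the IPI and
 * return; any external interrupt forces a #VMExit on a CPU that is in
 * the guest, so there is no need to wait here.
 */
static void kvm_isolation_kick_siblings(void)
{
	int cpu = smp_processor_id();
	int sibling;

	for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
		if (sibling == cpu)
			continue;
		/* Stand-in for a dedicated kick vector. */
		smp_send_reschedule(sibling);
	}
}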
You're correct that you don't need to wait for the sibling to come out
of the guest once you send the IPI. That hardware thread will not do
anything other than process the IPI once it's sent. There is still some
need for synchronization, at least for the every-vmexit case, since you
always want to make sure that one thread is actually doing work while
the other one is held. I have this working for some cases, but not
enough to call it a general solution. I'm not at all sure that the
every-vmexit case can be made to perform well in the general case. Even
the non-general case uses synchronization that I fear might be overly
complex.
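
To give an idea of what I mean by synchronization, the core of it is a
per-cpu hold count that the sibling checks before VM entry, roughly as
sketched below. Again the names are made up, and this glosses over the
corner cases (nesting, offline siblings, interrupts during the spin)
that make the real thing complex.

#include <linux/atomic.h>
#include <linux/percpu.h>
#include <linux/processor.h>
#include <linux/smp.h>
#include <linux/topology.h>

/* Non-zero while a sibling wants this CPU held out of the guest. */
static DEFINE_PER_CPU(atomic_t, sibling_hold);

/* Called before switching to the full kernel address space. */
static void hold_siblings(void)
{
	int cpu = smp_processor_id();
	int sibling;

	for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
		if (sibling == cpu)
			continue;
		atomic_inc(per_cpu_ptr(&sibling_hold, sibling));
		/* Kick it out of the guest; stand-in for a real vector. */
		smp_send_reschedule(sibling);
	}
}

/* Called once we are back in the restricted KVM address space. */
static void release_siblings(void)
{
	int cpu = smp_processor_id();
	int sibling;

	for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
		if (sibling == cpu)
			continue;
		atomic_dec(per_cpu_ptr(&sibling_hold, sibling));
	}
}

/* On the VM entry path, before actually entering the guest. */
static void wait_while_held(void)
{
	while (atomic_read(this_cpu_ptr(&sibling_hold)))
		cpu_relax();
}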
For the cases I do have working, simply not pinning the sibling when
we exit due to the guest idling is a big enough win to put performance
into a much more reasonable range.
Based on this, I believe that pinning a sibling HT in a subset of
cases, only when we interact with the full kernel address space, is
almost certainly reasonable.
-jan