Message-ID: <CXD3O3XBHKZO.22U5VF0HFBTC9@amazon.com>
Date: Fri, 1 Dec 2023 15:25:06 +0000
From: Nicolas Saenz Julienne <nsaenz@...zon.com>
To: Maxim Levitsky <mlevitsk@...hat.com>, <kvm@...r.kernel.org>
CC: <linux-kernel@...r.kernel.org>, <linux-hyperv@...r.kernel.org>,
<pbonzini@...hat.com>, <seanjc@...gle.com>, <vkuznets@...hat.com>,
<anelkz@...zon.com>, <graf@...zon.com>, <dwmw@...zon.co.uk>,
<jgowans@...zon.com>, <corbert@....net>, <kys@...rosoft.com>,
<haiyangz@...rosoft.com>, <decui@...rosoft.com>, <x86@...nel.org>,
<linux-doc@...r.kernel.org>, Anel Orazgaliyeva <anelkz@...zon.de>
Subject: Re: [RFC 02/33] KVM: x86: Introduce KVM_CAP_APIC_ID_GROUPS

Hi Maxim,
On Tue Nov 28, 2023 at 6:56 AM UTC, Maxim Levitsky wrote:
> On Wed, 2023-11-08 at 11:17 +0000, Nicolas Saenz Julienne wrote:
> > From: Anel Orazgaliyeva <anelkz@...zon.de>
> >
> > Introduce KVM_CAP_APIC_ID_GROUPS, a capability that segments the VM's
> > APIC ids into two parts. The lower bits, the physical APIC id,
> > represent the part that's exposed to the guest. The higher bits, which
> > are private to KVM, group APICs together. APICs in different groups
> > are isolated from each other, and IPIs can only be directed at APICs
> > that share the same group as their source. Furthermore, groups are
> > only relevant to IPIs; anything arriving from outside the local APIC
> > complex (the IOAPIC, MSIs, or PV-IPIs) is targeted at the default APIC
> > group, group 0.
> >
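A minimal sketch of the bit split described above, assuming userspace
reserves 'group_bits' high bits out of an 'apic_id_bits'-wide APIC id
when enabling the capability (the helper names are illustrative, not
the RFC's actual code):

    /* Illustrative only: split a full APIC id into the guest-visible
     * physical id (low bits) and the KVM-private group (high bits). */
    static inline u32 apic_id_phys(u32 apic_id, unsigned int group_bits,
                                   unsigned int apic_id_bits)
    {
            return apic_id & ((1U << (apic_id_bits - group_bits)) - 1);
    }

    static inline u32 apic_id_group(u32 apic_id, unsigned int group_bits,
                                    unsigned int apic_id_bits)
    {
            return apic_id >> (apic_id_bits - group_bits);
    }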
> > When routing IPIs with physical destinations, KVM will OR the source
> > vCPU's APIC group with the ICR's destination ID and use that to
> > resolve the target local APIC. The APIC physical map is also made
> > group-aware in order to speed up this process. For the sake of
> > simplicity, the logical map is not built while KVM_CAP_APIC_ID_GROUPS
> > is in use, and we defer IPI routing to the slower per-vCPU scan method.
> >
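As a rough illustration of that routing step (kvm_apic_group() is a
hypothetical accessor for the source vCPU's group bits; only the shape
of the lookup is meant to match the description above):

    /* Illustrative sketch: group-aware physical-destination resolution.
     * OR the sender's group into the ICR destination id, then index the
     * group-aware physical map. */
    static struct kvm_lapic *resolve_phys_dest(struct kvm_apic_map *map,
                                               struct kvm_vcpu *src,
                                               u32 icr_dest_id)
    {
            u32 dest = kvm_apic_group(src) | icr_dest_id;

            if (dest > map->max_apic_id)
                    return NULL;
            return map->phys_map[dest];
    }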
> > This capability serves as a building block to implement
> > virtualisation-based security features like Hyper-V's Virtual Secure
> > Mode (VSM). VSM introduces a para-virtualised switch that allows guest
> > CPUs to jump into a different execution context; the switch changes
> > the CPU state, local APIC state, and memory protections. We model this
> > in KVM by using distinct kvm_vcpus for each context. Moreover,
> > execution contexts are hierarchical, and their APICs are meant to
> > remain functional even when the context isn't 'scheduled in'. For
> > example, we have to keep track of timers' expirations, and interrupt
> > the execution of lower-priority contexts when relevant. Hence the need
> > to alias physical APIC ids while keeping the ability to target
> > specific execution contexts.
>
>
> A few general remarks on this patch (assuming that we don't go with
> the approach of a VM per VTL, in which case this patch is not needed)
>
> -> This feature has to be done in the kernel because vCPUs sharing the
> same VTL will have the same APIC ID.
> (In addition to that, APIC state is private to a VTL, so each VTL
> can even change its APIC ID.)
>
> Because of this, KVM has to have at least some awareness of it.
>
> -> APICv/AVIC should eventually be supported with VTLs:
> This is thankfully possible by having separate physid/pid tables per
> VTL; it will mostly just work but needs KVM awareness.
>
> -> I am somewhat against reserving bits in the APIC ID, because that will
> limit the number of APIC ID bits available to userspace. Currently this
> is not a problem, but it might become one in the future if for some
> reason userspace wants an APIC ID with the high bits set.
>
> But still, things change, and with this being part of KVM's ABI, it might backfire.
> A better idea IMHO is to have 'APIC namespaces', akin to, say, PID namespaces:
> each namespace is isolated IPI-wise, and each vCPU belongs to exactly
> one namespace.
>
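To make the namespace idea concrete, a minimal sketch (the apic_ns_id
field is entirely hypothetical):

    /* Hypothetical: each vCPU carries a namespace id, and IPI delivery
     * only considers destinations whose namespace matches the sender's. */
    static bool kvm_apic_ns_match(struct kvm_vcpu *src, struct kvm_vcpu *dst)
    {
            return src->arch.apic_ns_id == dst->arch.apic_ns_id;
    }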
> In fact, Intel's PRM briefly mentions a 'hierarchical cluster' mode which
> roughly describes this situation: there are multiple APIC buses that are not
> interconnected, and communication between them needs a 'cluster manager device'.
>
> However, I don't think that we need explicit pairs of vCPUs or VTL awareness
> in the kernel; all of this, I think, can be done in userspace.
>
> TL;DR: Let's have APIC namespaces. A vCPU can belong to a single namespace, and all vCPUs
> in a namespace send IPIs to each other and know nothing about vCPUs from other namespaces.
>
> A vCPU sending an IPI to a different VTL can thankfully only do so via a
> hypercall, and thus the operation can be handled in userspace.
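A rough sketch of what that userspace handling could look like,
assuming the VMM sees the hypercall as a KVM_EXIT_HYPERV exit;
decode_send_ipi(), vtl_vcpu() and inject_ipi() are made-up VMM
helpers, not a real API:

    /* Hypothetical VMM-side handling of a guest's cross-VTL IPI. */
    if (run->exit_reason == KVM_EXIT_HYPERV &&
        run->hyperv.type == KVM_EXIT_HYPERV_HCALL &&
        (run->hyperv.u.hcall.input & 0xffff) == HVCALL_SEND_IPI) {
            struct ipi_args a = decode_send_ipi(run); /* made-up decode */

            inject_ipi(vtl_vcpu(a.target_vtl, a.apic_id), a.vector);
    }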
>
>
> Overall though, IMHO the approach of a VM per VTL is better, unless some
> show-stoppers show up. If we go with a VM per VTL, we get APIC namespaces
> for free, together with AVIC support and such.

Thanks for the thorough review! I took note of all your design comments
(here and in subsequent patches).
I agree that the way to go is the VM per VTL approach. I'll prepare a
PoC as soon as I'm back from the holidays and share my results.
Nicolas