[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7timm7vdq4vjwn6jo5bwgtbn3f7pdtdch7l5dws76pjg7syqwb@al5mifdmboog>
Date: Wed, 18 Jun 2025 20:03:06 +0530
From: Naveen N Rao <naveen.rao@....com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Marc Zyngier <maz@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>,
Paolo Bonzini <pbonzini@...hat.com>, Joerg Roedel <joro@...tes.org>, "David
Woodhouse" <dwmw2@...radead.org>, Lu Baolu <baolu.lu@...ux.intel.com>,
<linux-arm-kernel@...ts.infradead.org>, <kvmarm@...ts.linux.dev>,
<kvm@...r.kernel.org>, <iommu@...ts.linux.dev>,
<linux-kernel@...r.kernel.org>, Sairaj Kodilkar <sarunkod@....com>, "Vasant
Hegde" <vasant.hegde@....com>, Maxim Levitsky <mlevitsk@...hat.com>, "Joao
Martins" <joao.m.martins@...cle.com>, Francesco Lavra
<francescolavra.fl@...il.com>, David Matlack <dmatlack@...gle.com>
Subject: Re: [PATCH v3 12/62] KVM: SVM: Inhibit AVIC if ID is too big instead
of rejecting vCPU creation
On Tue, Jun 17, 2025 at 09:10:10AM -0700, Sean Christopherson wrote:
> On Tue, Jun 17, 2025, Naveen N Rao wrote:
> > On Wed, Jun 11, 2025 at 03:45:15PM -0700, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > index ab228872a19b..f0a74b102c57 100644
> > > --- a/arch/x86/kvm/svm/avic.c
> > > +++ b/arch/x86/kvm/svm/avic.c
> > > @@ -277,9 +277,19 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
> > > int id = vcpu->vcpu_id;
> > > struct vcpu_svm *svm = to_svm(vcpu);
> > >
> > > + /*
> > > + * Inhibit AVIC if the vCPU ID is bigger than what is supported by AVIC
> > > + * hardware. Immediately clear apicv_active, i.e. don't wait until the
> > > + * KVM_REQ_APICV_UPDATE request is processed on the first KVM_RUN, as
> > > + * avic_vcpu_load() expects to be called if and only if the vCPU has
> > > + * fully initialized AVIC.
> > > + */
> > > if ((!x2avic_enabled && id > AVIC_MAX_PHYSICAL_ID) ||
> > > - (id > X2AVIC_MAX_PHYSICAL_ID))
> > > - return -EINVAL;
> > > + (id > X2AVIC_MAX_PHYSICAL_ID)) {
> > > + kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_TOO_BIG);
> > > + vcpu->arch.apic->apicv_active = false;
> >
> > This bothers me a bit. kvm_create_lapic() does this:
> > if (enable_apicv) {
> > apic->apicv_active = true;
> > kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
> > }
> >
> > But, setting apic->apicv_active to false here means KVM_REQ_APICV_UPDATE
> > is going to be a no-op.
>
> That's fine, KVM_REQ_APICV_UPDATE is a nop in other scenarios, too. I agree it's
> not ideal, but this is a rather extreme edge case and a one-time slow path, so I
> don't think it's worth doing anything special just to avoid KVM_REQ_APICV_UPDATE.
>
> > This does not look to be a big deal given that kvm_create_lapic()
> > is called just a bit before svm_vcpu_create() (which calls the above
> > function through avic_init_vcpu()) in kvm_arch_vcpu_create(), so there
> > isn't that much done before apicv_active is toggled.
> >
> > But, this made me wonder if introducing a kvm_x86_op to check and
> > enable/disable apic->apicv_active in kvm_create_lapic() might be cleaner
> > overall
>
> Hmm, yes and no. I completely agree that clearing apicv_active in avic.c is all
> kinds of gross, but clearing apic->apicv_active in lapic.c to handle this particular
> scenario is just as problematic, because then avic_init_backing_page() would need
> to check kvm_vcpu_apicv_active() to determine whether or not to allocate the backing
> page. In a way, that's even worse, because setting apic->apicv_active by default
> is purely an optimization, i.e. leaving it %false _should_ work as well, it would
> just be suboptimal. But if AVIC were to key off apic->apicv_active,
> that could
> lead to KVM incorrectly skipping allocation of the AVIC backing page.
I understand your concern about key'ing off apic->apicv_active - that
would definitely require a thorough audit and does add complexity to
this.
However, as far as I can see, after your current series, we no longer
maintain a pointer to the AVIC backing page, but instead rely on the
lapic-allocated page.
Were you referring to the APIC access page though? That is behind
kvm_apicv_activated() today, which looks to be problematic if there are
inhibits set during vcpu_create() and if those can be unset later?
Shouldn't we be allocating the apic access page unconditionally here?
>
> > Maybe even have it be the initialization point for APICv:
> > apicv_init(), so we can invoke avic_init_vcpu() right away?
>
> I mostly like this idea (actually, I take that back; see below), but VMX throws
> a big wrench in things. Unlike SVM, VMX doesn't have a singular "enable APICv"
> flag. Rather, KVM considers "APICv" to be the combination of APIC-register
> virtualization, virtual interrupt delivery, and posted interrupts.
>
> Which is totally fine. The big wrench is that the are other features that interact
> with "APICv" and require allocations. E.g. the APIC access page isn't actually
> tied to enable_apicv, it's tied to yet another VMX feature, "virtualize APIC
> accesses" (not to be confused with APIC-register virtualization; don't blame me,
> I didn't create the control knobs/names).
>
> As a result, KVM may need to allocate the APIC access page (not to be confused
> with the APIC *backing* page; again, don't blame me :-D) when enable_apicv=false,
> and even more confusingly, NOT allocate the APIC access page when enable_apicv=true.
>
> if (cpu_need_virtualize_apic_accesses(vcpu)) { <=== not an enable_apic check, *sigh*
> err = kvm_alloc_apic_access_page(vcpu->kvm);
> if (err)
> goto free_vmcs;
> }
Thanks for the background and the details there -- I am going to need
some time to go unpack all of that :)
>
> And for both SVM and VMX, IPI virtualization adds another wrinkle, in that the
> per-vCPU setup needs to fill an entry in a per-VM table. And filling that entry
> needs to happen at the very end of vCPU creation, e.g. so that the vCPU can't be
> targeted until KVM is ready to run the vCPU.
>
> Ouch. And I'm pretty sure there's a use-after-free in the AVIC code. If
> svm_vcpu_alloc_msrpm() fails, the avic_physical_id_table[] will still have a pointer
> to freed vAPIC page.
Oops.
>
> Thus, invoking avic_init_vcpu() "right away" is unfortunately flat out unsafe
> (took me a while to realize that).
>
> So while I 100% agree with your dislike of this patch, I think it's the least
> awful band-aid, at least in the short term.
>
> Longer term, I'd definitely be in favor of cleaning up the related flows, but I
> think that's best done on top of this series, because I suspect it'll be somewhat
> invasive.
Sure, that makes sense. For this patch:
Acked-by: Naveen N Rao (AMD) <naveen@...nel.org>
Thanks,
Naveen
Powered by blists - more mailing lists