Message-ID: <xojzhg3e6czqg6zqyt3wbnzfafwy7bd7fq43b3ttkhfcw3svot@rakzaalsslfz>
Date: Fri, 18 Jul 2025 16:04:35 +0530
From: Naveen N Rao <naveen@...nel.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Paolo Bonzini <pbonzini@...hat.com>, Suravee Suthikulpanit <suravee.suthikulpanit@....com>, 
	Vasant Hegde <vasant.hegde@....com>
Subject: Re: [PATCH v3 2/2] KVM: SVM: Limit AVIC physical max index based on
 configured max_vcpu_ids

On Mon, Jun 23, 2025 at 05:38:23PM -0700, Sean Christopherson wrote:
> On Thu, Feb 20, 2025, Naveen N Rao (AMD) wrote:
> > KVM allows VMMs to specify the maximum possible APIC ID for a virtual
> > machine through KVM_CAP_MAX_VCPU_ID capability so as to limit data
> > structures related to APIC/x2APIC. Utilize the same to set the AVIC
> > physical max index in the VMCB, similar to VMX. This helps hardware
> > limit the number of entries to be scanned in the physical APIC ID table
> > speeding up IPI broadcasts for virtual machines with smaller number of
> > vcpus.
> > 
> > The minimum allocation required for the Physical APIC ID table is one 4k
> > page supporting up to 512 entries. With AVIC support for 4096 vcpus
> > though, it is sufficient to only allocate memory to accommodate the
> > AVIC physical max index that will be programmed into the VMCB. Limit
> > memory allocated for the Physical APIC ID table accordingly.
> 
> Can you flip the order of the patches?  This seems like an easy "win" for
> performance, and so I can see people wanting to backport this to random kernels
> even if they don't care about running 4k vCPUs.
> 
> Speaking of which, is there a measurable performance win?

That was my first thought. But for VMs with up to 512 vCPUs, I didn't see 
much of a performance difference with broadcast IPIs at all. It shouldn't 
hurt though, so I will prep a smaller patch that can go in before the 4k 
vCPU support patch.

With 4k vCPU support enabled, yes, this makes a lot of difference. IIRC, 
ipi-bench for broadcast IPIs went from ~10-15 seconds down to 3 seconds.

> 
> > Signed-off-by: Naveen N Rao (AMD) <naveen@...nel.org>
> > ---
> >  arch/x86/kvm/svm/avic.c | 53 ++++++++++++++++++++++++++++++-----------
> >  arch/x86/kvm/svm/svm.c  |  6 +++++
> >  arch/x86/kvm/svm/svm.h  |  1 +
> >  3 files changed, 46 insertions(+), 14 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 1fb322d2ac18..dac4a6648919 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -85,6 +85,17 @@ struct amd_svm_iommu_ir {
> >  	void *data;		/* Storing pointer to struct amd_ir_data */
> >  };
> >  
> > +static inline u32 avic_get_max_physical_id(struct kvm *kvm, bool is_x2apic)
> 
> Formletter incoming...
> 
> Do not use "inline" for functions that are visible only to the local compilation
> unit.  "inline" is just a hint, and modern compilers are smart enough to inline
> functions when appropriate without a hint.
> 
> A longer explanation/rant here: https://lore.kernel.org/all/ZAdfX+S323JVWNZC@google.com

Ack.

> 
> > +{
> > +	u32 avic_max_physical_id = is_x2apic ? x2avic_max_physical_id : AVIC_MAX_PHYSICAL_ID;
> 
> Don't use a super long local variable.  For a helper like this, it's unnecessary,
> e.g. if the reader can't understand what arch_max or max_id is, then spelling it
> out entirely probably won't help them.
> 
> And practically, there's a danger to using long names like this: you're much more
> likely to unintentionally "shadow" a global variable.  Functionally, it won't be
> a problem, but it can create confusion.  E.g. if we ever added a global
> avic_max_physical_id, then this code would get rather confusing.

Sure, makes sense.

> 
> > +
> > +	/*
> > +	 * Assume vcpu_id is the same as APIC ID. Per KVM_CAP_MAX_VCPU_ID, max_vcpu_ids
> > +	 * represents the max APIC ID for this vm, rather than the max vcpus.
> > +	 */
> > +	return min(kvm->arch.max_vcpu_ids - 1, avic_max_physical_id);
> > +}
> > +
> >  static void avic_activate_vmcb(struct vcpu_svm *svm)
> >  {
> >  	struct vmcb *vmcb = svm->vmcb01.ptr;
> > @@ -103,7 +114,7 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
> >  	 */
> >  	if (x2avic_enabled && apic_x2apic_mode(svm->vcpu.arch.apic)) {
> >  		vmcb->control.int_ctl |= X2APIC_MODE_MASK;
> > -		vmcb->control.avic_physical_id |= x2avic_max_physical_id;
> > +		vmcb->control.avic_physical_id |= avic_get_max_physical_id(svm->vcpu.kvm, true);
> 
> Don't pass hardcoded booleans when it is at all possible to do something else.
> For this case, I would either do:
> 
>   static u32 avic_get_max_physical_id(struct kvm_vcpu *vcpu)
>   {
> 	u32 arch_max;
> 	
> 	if (x2avic_enabled && apic_x2apic_mode(vcpu->arch.apic))

This won't work since we want to use this during vCPU init, and at that 
point I don't think we can rely on the vCPU's x2APIC mode to decide the 
size of the AVIC physical ID table.
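
To illustrate, the sizing I have in mind would only look at module-level 
state plus max_vcpu_ids (an untested sketch -- avic_physical_id_table_size() 
is a made-up name, and the u64 entry size is just the existing physical 
APIC ID table entry size):

  static u32 avic_physical_id_table_size(struct kvm *kvm)
  {
	/*
	 * A vCPU's x2APIC mode isn't known when the table is allocated,
	 * so size for the worst case the module supports.
	 */
	u32 arch_max = x2avic_enabled ? x2avic_max_physical_id :
					AVIC_MAX_PHYSICAL_ID;

	return (min(kvm->arch.max_vcpu_ids - 1, arch_max) + 1) * sizeof(u64);
  }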

> 		arch_max = x2avic_max_physical_id;
> 	else
> 		arch_max = AVIC_MAX_PHYSICAL_ID;
> 
> 	return min(vcpu->kvm->arch.max_vcpu_ids - 1, arch_max);
>   }
>

<snip>

> 
>   static u32 avic_get_max_physical_id(struct kvm_vcpu *vcpu, u32 arch_max)
>   {
> 	return min(vcpu->kvm->arch.max_vcpu_ids - 1, arch_max);
>   }
> 
>   static void avic_activate_vmcb(struct vcpu_svm *svm)
>   {
> 	struct vmcb *vmcb = svm->vmcb01.ptr;
> 	struct kvm_vcpu *vcpu = &svm->vcpu;
> 	u32 max_id;
> 
> 	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
> 	vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
> 
> 	/*
> 	 * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
> 	 * accesses, while interrupt injection to a running vCPU can be
> 	 * achieved using AVIC doorbell.  KVM disables the APIC access page
> 	 * (deletes the memslot) if any vCPU has x2APIC enabled, thus enabling
> 	 * AVIC in hybrid mode activates only the doorbell mechanism.
> 	 */
> 	if (x2avic_enabled && apic_x2apic_mode(vcpu->arch.apic)) {
> 		vmcb->control.int_ctl |= X2APIC_MODE_MASK;
> 		max_id = avic_get_max_physical_id(vcpu, x2avic_max_physical_id);
> 
> 		/* Disabling MSR intercept for x2APIC registers */
> 		svm_set_x2apic_msr_interception(svm, false);
> 	} else {
> 		max_id = avic_get_max_physical_id(vcpu, AVIC_MAX_PHYSICAL_ID);
> 
> 		/*
> 		 * Flush the TLB, the guest may have inserted a non-APIC
> 		 * mapping into the TLB while AVIC was disabled.
> 		 */
> 		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> 
> 		/* Enabling MSR intercept for x2APIC registers */
> 		svm_set_x2apic_msr_interception(svm, true);
> 	}
> 
> 	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
> 	vmcb->control.avic_physical_id |= max_id;
>   }
> 
> 
> I don't think I have a preference between the two?

I'm thinking of limiting the helper to just x2AVIC mode, since the xAPIC 
(non-x2AVIC) case has only a single user. Let me see what I can come up 
with.
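
Roughly what I have in mind (an untested sketch; the helper name and the 
open-coded xAPIC path are illustrative, not final):

  static u32 avic_get_x2apic_max_physical_id(struct kvm *kvm)
  {
	return min(kvm->arch.max_vcpu_ids - 1, x2avic_max_physical_id);
  }

with the xAPIC branch of avic_activate_vmcb() open-coding 
min(kvm->arch.max_vcpu_ids - 1, AVIC_MAX_PHYSICAL_ID) at its single call 
site.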

> > +static int svm_vcpu_precreate(struct kvm *kvm)
> > +{
> > +	return avic_alloc_physical_id_table(kvm);
> 
> Why is allocation being moved to svm_vcpu_precreate()?

This is because we want KVM_CAP_MAX_VCPU_ID to have been set by the VMM, 
and that is guaranteed to have happened by the time the first vCPU is 
created. We restrict the AVIC physical ID table based on the maximum 
vCPU ID set by the VMM.

This mirrors how Intel VMX uses this capability. I will call this out 
explicitly in the commit log.
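
For reference, the ordering I'm assuming on the VMM side (an illustrative 
userspace fragment, not taken from any particular VMM; assumes 
<linux/kvm.h>, <sys/ioctl.h>, an existing vm_fd, and error handling 
omitted):

  struct kvm_enable_cap cap = {
	.cap = KVM_CAP_MAX_VCPU_ID,
	/* one more than the largest APIC ID the VM will use;
	 * this becomes kvm->arch.max_vcpu_ids */
	.args[0] = max_apic_id + 1,
  };

  /* Must be done before any vCPU is created... */
  ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

  /* ...so that svm_vcpu_precreate() sees the final max_vcpu_ids. */
  vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);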


Thanks,
Naveen

