Message-ID: <2eqjnjnszlmhlnvw6kcve4exjnpy7skguypwtmxutb2gecs3an@gcou53thsqww>
Date: Thu, 19 Jun 2025 17:01:30 +0530
From: Naveen N Rao <naveen@...nel.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Marc Zyngier <maz@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>, 
	Paolo Bonzini <pbonzini@...hat.com>, Joerg Roedel <joro@...tes.org>, 
	David Woodhouse <dwmw2@...radead.org>, Lu Baolu <baolu.lu@...ux.intel.com>, 
	linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev, kvm@...r.kernel.org, 
	iommu@...ts.linux.dev, linux-kernel@...r.kernel.org, Sairaj Kodilkar <sarunkod@....com>, 
	Vasant Hegde <vasant.hegde@....com>, Maxim Levitsky <mlevitsk@...hat.com>, 
	Joao Martins <joao.m.martins@...cle.com>, Francesco Lavra <francescolavra.fl@...il.com>, 
	David Matlack <dmatlack@...gle.com>
Subject: Re: [PATCH v3 17/62] KVM: SVM: Add enable_ipiv param, never set
 IsRunning if disabled

On Wed, Jun 11, 2025 at 03:45:20PM -0700, Sean Christopherson wrote:
> From: Maxim Levitsky <mlevitsk@...hat.com>
> 
> Let userspace "disable" IPI virtualization for AVIC via the enable_ipiv
> module param, by never setting IsRunning.  SVM doesn't provide a way to
> disable IPI virtualization in hardware, but by ensuring CPUs never see
> IsRunning=1, every IPI in the guest (except for self-IPIs) will generate a
> VM-Exit.

I think this is good to have regardless of the erratum. Not sure about VMX,
but does it make sense to intercept writes to the self-IPI MSR as well?

> 
> To avoid setting the real IsRunning bit, while still allowing KVM to use
> each vCPU's entry to update GA log entries, simply maintain a shadow of
> the entry, without propagating IsRunning updates to the real table when
> IPI virtualization is disabled.
> 
> Providing a way to effectively disable IPI virtualization will allow KVM
> to safely enable AVIC on hardware that is susceptible to erratum #1235,
> which causes hardware to sometimes fail to detect that the IsRunning bit
> has been cleared by software.
> 
> Note, the table _must_ be fully populated, as broadcast IPIs skip invalid
> entries, i.e. won't generate VM-Exit if every entry is invalid, and so
> simply pointing the VMCB at a common dummy table won't work.
> 
> Alternatively, KVM could allocate a shadow of the entire table, but that'd
> be a waste of 4KiB since the per-vCPU entry doesn't actually consume an
> additional 8 bytes of memory (vCPU structures are large enough that they
> are backed by order-N pages).
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>
> [sean: keep "entry" variables, reuse enable_ipiv, split from erratum]
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/kvm/svm/avic.c | 32 ++++++++++++++++++++++++++------
>  arch/x86/kvm/svm/svm.c  |  2 ++
>  arch/x86/kvm/svm/svm.h  |  8 ++++++++
>  3 files changed, 36 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 0c0be274d29e..48c737e1200a 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -292,6 +292,13 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
>  	/* Setting AVIC backing page address in the phy APIC ID table */
>  	new_entry = avic_get_backing_page_address(svm) |
>  		    AVIC_PHYSICAL_ID_ENTRY_VALID_MASK;
> +	svm->avic_physical_id_entry = new_entry;
> +
> +	/*
> +	 * Initialize the real table, as vCPUs must have a valid entry in order
> +	 * for broadcast IPIs to function correctly (broadcast IPIs ignore
> +	 * invalid entries, i.e. aren't guaranteed to generate a VM-Exit).
> +	 */
>  	WRITE_ONCE(kvm_svm->avic_physical_id_table[id], new_entry);
>  
>  	return 0;
> @@ -769,8 +776,6 @@ static int svm_ir_list_add(struct vcpu_svm *svm,
>  			   struct amd_iommu_pi_data *pi)
>  {
>  	struct kvm_vcpu *vcpu = &svm->vcpu;
> -	struct kvm *kvm = vcpu->kvm;
> -	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
>  	unsigned long flags;
>  	u64 entry;
>  
> @@ -788,7 +793,7 @@ static int svm_ir_list_add(struct vcpu_svm *svm,
>  	 * will update the pCPU info when the vCPU awkened and/or scheduled in.
>  	 * See also avic_vcpu_load().
>  	 */
> -	entry = READ_ONCE(kvm_svm->avic_physical_id_table[vcpu->vcpu_id]);
> +	entry = svm->avic_physical_id_entry;
>  	if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
>  		amd_iommu_update_ga(entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK,
>  				    true, pi->ir_data);
> @@ -998,14 +1003,26 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	 */
>  	spin_lock_irqsave(&svm->ir_list_lock, flags);
>  
> -	entry = READ_ONCE(kvm_svm->avic_physical_id_table[vcpu->vcpu_id]);
> +	entry = svm->avic_physical_id_entry;
>  	WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
>  
>  	entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
>  	entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
>  	entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
>  
> +	svm->avic_physical_id_entry = entry;
> +
> +	/*
> +	 * If IPI virtualization is disabled, clear IsRunning when updating the
> +	 * actual Physical ID table, so that the CPU never sees IsRunning=1.
> +	 * Keep the APIC ID up-to-date in the entry to minimize the chances of
> +	 * things going sideways if hardware peeks at the ID.
> +	 */
> +	if (!enable_ipiv)
> +		entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
> +
>  	WRITE_ONCE(kvm_svm->avic_physical_id_table[vcpu->vcpu_id], entry);
> +
>  	avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true);
>  
>  	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
> @@ -1030,7 +1047,7 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
>  	 * can't be scheduled out and thus avic_vcpu_{put,load}() can't run
>  	 * recursively.
>  	 */
> -	entry = READ_ONCE(kvm_svm->avic_physical_id_table[vcpu->vcpu_id]);
> +	entry = svm->avic_physical_id_entry;
>  
>  	/* Nothing to do if IsRunning == '0' due to vCPU blocking. */
>  	if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
> @@ -1049,7 +1066,10 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
>  	avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
>  
>  	entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
> -	WRITE_ONCE(kvm_svm->avic_physical_id_table[vcpu->vcpu_id], entry);
> +	svm->avic_physical_id_entry = entry;
> +
> +	if (enable_ipiv)
> +		WRITE_ONCE(kvm_svm->avic_physical_id_table[vcpu->vcpu_id], entry);

If enable_ipiv is false, the IsRunning bit will never be set and we
would have bailed out earlier, at the "Nothing to do if IsRunning == '0'"
check. So the check for enable_ipiv can be dropped here (or converted
into an assert).
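
To make the question concrete, here is a small userspace model of the
load/put paths as I read the quoted diff. It is only a sketch, not kernel
code: struct model_vcpu, model_init(), model_load() and model_put() are
made-up names, and the mask bit positions are assumptions for illustration
rather than values copied from svm.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, for illustration only (not copied from svm.h). */
#define ENTRY_HOST_ID_MASK	0xfffULL
#define ENTRY_IS_RUNNING	(1ULL << 62)
#define ENTRY_VALID		(1ULL << 63)

/* Hypothetical stand-in for the per-vCPU state touched by the patch. */
struct model_vcpu {
	uint64_t shadow_entry;	/* models svm->avic_physical_id_entry */
	uint64_t *table_slot;	/* models this vCPU's slot in the real table */
};

static bool enable_ipiv = true;

/* Models avic_init_backing_page(): shadow and real table both get a valid entry. */
static void model_init(struct model_vcpu *v, uint64_t *slot, uint64_t backing_page)
{
	v->table_slot = slot;
	v->shadow_entry = backing_page | ENTRY_VALID;
	*v->table_slot = v->shadow_entry;
}

/* Models avic_vcpu_load(): IsRunning is stripped only from the real table. */
static void model_load(struct model_vcpu *v, unsigned int host_cpu)
{
	uint64_t entry = v->shadow_entry;

	entry &= ~ENTRY_HOST_ID_MASK;
	entry |= host_cpu & ENTRY_HOST_ID_MASK;
	entry |= ENTRY_IS_RUNNING;
	v->shadow_entry = entry;

	if (!enable_ipiv)
		entry &= ~ENTRY_IS_RUNNING;
	*v->table_slot = entry;
}

/* Models avic_vcpu_put(); returns true if the early "nothing to do" return was taken. */
static bool model_put(struct model_vcpu *v)
{
	uint64_t entry = v->shadow_entry;

	if (!(entry & ENTRY_IS_RUNNING))
		return true;

	entry &= ~ENTRY_IS_RUNNING;
	v->shadow_entry = entry;

	if (enable_ipiv)
		*v->table_slot = entry;
	return false;
}
```

Calling model_load() and then model_put() with enable_ipiv = false, and
looking at the return value of model_put(), is a cheap way to check
whether the early return is actually taken in that configuration.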

- Naveen

