Message-ID: <86msl6xhu2.wl-maz@kernel.org>
Date: Wed, 21 Aug 2024 11:59:01 +0100
From: Marc Zyngier <maz@...nel.org>
To: Kunkun Jiang <jiangkunkun@...wei.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
	Oliver Upton <oliver.upton@...ux.dev>,
	James Morse <james.morse@....com>,
	Suzuki K Poulose <suzuki.poulose@....com>,
	Zenghui Yu <yuzenghui@...wei.com>,
	"open list:IRQ SUBSYSTEM" <linux-kernel@...r.kernel.org>,
	"moderated\
 list:ARM SMMU DRIVERS" <linux-arm-kernel@...ts.infradead.org>,
	<kvmarm@...ts.linux.dev>,
	"wanghaibin.wang@...wei.com"
	<wanghaibin.wang@...wei.com>,
	<nizhiqiang1@...wei.com>,
	"tangnianyao@...wei.com" <tangnianyao@...wei.com>,
	<wangzhou1@...ilicon.com>
Subject: Re: [bug report] GICv4.1: multiple vcpus executing vgic_v4_load at the same time greatly increases time consumption

On Wed, 21 Aug 2024 10:51:27 +0100,
Kunkun Jiang <jiangkunkun@...wei.com> wrote:
> 
> Hi all,
> 
> Recently I discovered a problem with GICv4.1; the scenario is as follows:
> 1. Enable GICv4.1
> 2. Create multiple VMs. For example, 50 VMs (4U8G)

I don't know what 4U8G means. On how many physical CPUs are you
running 50 VMs? Direct injection of interrupts and over-subscription
are fundamentally incompatible.

> 3. The workload running in the VMs performs frequent MMIO accesses that
>   need to exit to qemu for processing.
> 4. Alternatively, modify the kvm code so that wfi always traps to kvm.
> 5. Then the utilization of the pcpu where the vcpu is located reaches 100%,
>   with basically all of it in sys.

What did you expect? If you trap all the time, your performance will
suck.  Don't do that.

> 6. This problem does not exist in GICv3.

Because GICv3 doesn't have the same constraints.

> 
> According to our analysis, this problem is due to the execution of
> vgic_v4_load:
> vcpu_load or kvm_sched_in
>     kvm_arch_vcpu_load
>     ...
>         vgic_v4_load
>             irq_set_affinity
>             ...
>                 irq_do_set_affinity
>                     raw_spin_lock(&tmp_mask_lock)
>                     chip->irq_set_affinity
>                     ...
>                       its_vpe_set_affinity
> 
> The tmp_mask_lock is the key: it is a global lock. I don't quite
> understand why tmp_mask_lock is needed here. I think there are two
> possible solutions:
> 1. Remove this tmp_mask_lock

Maybe you could have a look at 33de0aa4bae98 (and 11ea68f553e24)? It
would allow you to understand the nature of the problem.
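
For reference, the relevant part of irq_do_set_affinity() in
kernel/irq/manage.c looks roughly like this (heavily simplified and
from memory, so check your tree; the managed-irq housekeeping and
force handling are elided):

	int irq_do_set_affinity(struct irq_data *data,
				const struct cpumask *mask, bool force)
	{
		struct irq_chip *chip = irq_data_get_irq_chip(data);
		int ret;

		/*
		 * A cpumask can be too large to live on the stack
		 * (CONFIG_CPUMASK_OFFSTACK), hence the static mask and
		 * the global lock serialising all affinity changes.
		 */
		static DEFINE_RAW_SPINLOCK(tmp_mask_lock);
		static struct cpumask tmp_mask;

		if (!chip || !chip->irq_set_affinity)
			return -EINVAL;

		raw_spin_lock(&tmp_mask_lock);
		cpumask_and(&tmp_mask, mask, cpu_online_mask);
		ret = chip->irq_set_affinity(data, &tmp_mask, force);
		raw_spin_unlock(&tmp_mask_lock);

		return ret;
	}

That lock is what all your vcpus are queueing on.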

This can probably be replaced with a per-CPU cpumask, which would
avoid the locking, but potentially result in larger memory usage.
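
A completely untested sketch of that idea (irq_tmp_mask is a made-up
name; it relies on irq_do_set_affinity() always being called with
interrupts disabled under the descriptor lock, so the per-CPU mask
cannot be clobbered by a concurrent user on the same CPU):

	static DEFINE_PER_CPU(struct cpumask, irq_tmp_mask);

	int irq_do_set_affinity(struct irq_data *data,
				const struct cpumask *mask, bool force)
	{
		struct irq_chip *chip = irq_data_get_irq_chip(data);
		/* Each CPU scribbles in its own mask: no global lock. */
		struct cpumask *tmp_mask = this_cpu_ptr(&irq_tmp_mask);
		int ret;

		if (!chip || !chip->irq_set_affinity)
			return -EINVAL;

		cpumask_and(tmp_mask, mask, cpu_online_mask);
		ret = chip->irq_set_affinity(data, tmp_mask, force);

		return ret;
	}

The cost is one struct cpumask per CPU instead of a single global one,
which is where the extra memory comes from on large NR_CPUS
configurations.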

> 2. Modify the GICv4 driver so that it does not perform a VMOVP via
> irq_set_affinity.

Sure. You could also not use KVM at all if you don't care about interrupts
being delivered to your VM. We do not send a VMOVP just for fun. We
send it because your vcpu has moved to a different CPU, and the ITS
needs to know about that.
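
That's what vgic_v4_load() does on each vcpu_load (see
arch/arm64/kvm/vgic/vgic-v4.c; heavily simplified here, with the error
handling and residency update dropped):

	int vgic_v4_load(struct kvm_vcpu *vcpu)
	{
		struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;

		if (!vgic_supports_direct_msis(vcpu->kvm) || vpe->resident)
			return 0;

		/*
		 * Point the VPE's doorbell at the CPU we are now
		 * running on; at the ITS level this becomes a VMOVP.
		 */
		return irq_set_affinity(vpe->irq,
					cpumask_of(smp_processor_id()));
	}

The VMOVP rate is therefore directly proportional to how often your
vcpus bounce between physical CPUs.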

You seem to be misunderstanding the use case for GICv4: a partitioned
system without any over-subscription and no vcpu migration between CPUs.
If that's not your setup, then GICv4 will always be a net loss
compared to SW injection with GICv3 (additional HW interaction,
doorbell interrupts).

	M.

-- 
Without deviation from the norm, progress is not possible.
