Message-ID: <86msl6xhu2.wl-maz@kernel.org>
Date: Wed, 21 Aug 2024 11:59:01 +0100
From: Marc Zyngier <maz@...nel.org>
To: Kunkun Jiang <jiangkunkun@...wei.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
	Oliver Upton <oliver.upton@...ux.dev>,
	James Morse <james.morse@....com>,
	Suzuki K Poulose <suzuki.poulose@....com>,
	Zenghui Yu <yuzenghui@...wei.com>,
	"open list:IRQ SUBSYSTEM" <linux-kernel@...r.kernel.org>,
	"moderated\
 list:ARM SMMU DRIVERS" <linux-arm-kernel@...ts.infradead.org>,
	<kvmarm@...ts.linux.dev>,
	"wanghaibin.wang@...wei.com"
	<wanghaibin.wang@...wei.com>,
	<nizhiqiang1@...wei.com>,
	"tangnianyao@...wei.com" <tangnianyao@...wei.com>,
	<wangzhou1@...ilicon.com>
Subject: Re: [bug report] GICv4.1: multiple vcpus executing vgic_v4_load at the same time greatly increases time consumption

On Wed, 21 Aug 2024 10:51:27 +0100,
Kunkun Jiang <jiangkunkun@...wei.com> wrote:
> 
> Hi all,
> 
> Recently I discovered a problem with GICv4.1; the scenario is as follows:
> 1. Enable GICv4.1
> 2. Create multiple VMs. For example, 50 VMs (4U8G)

I don't know what 4U8G means. On how many physical CPUs are you
running 50 VMs? Direct injection of interrupts and over-subscription
are fundamentally incompatible.

> 3. The workload running in the VMs performs frequent MMIO accesses that
>   need to exit to qemu for processing.
> 4. Alternatively, modify the kvm code so that wfi always traps to kvm.
> 5. Then the utilization of the pcpu where the vcpu is located reaches 100%,
>   with basically all of it in sys.

What did you expect? If you trap all the time, your performance will
suck.  Don't do that.

> 6. This problem does not exist in GICv3.

Because GICv3 doesn't have the same constraints.

> 
> According to our analysis, this problem is due to the execution of
> vgic_v4_load:
> vcpu_load or kvm_sched_in
>     kvm_arch_vcpu_load
>     ...
>         vgic_v4_load
>             irq_set_affinity
>             ...
>                 irq_do_set_affinity
>                     raw_spin_lock(&tmp_mask_lock)
>                     chip->irq_set_affinity
>                     ...
>                       its_vpe_set_affinity
> 
> The tmp_mask_lock is the key: it is a global lock. I don't quite
> understand why tmp_mask_lock is needed here. I think there are two
> possible solutions:
> 1. Remove this tmp_mask_lock

Maybe you could have a look at 33de0aa4bae98 (and 11ea68f553e24)? It
would allow you to understand the nature of the problem.
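
For reference, the relevant part of irq_do_set_affinity() in
kernel/irq/manage.c looks roughly like this (heavily simplified and
from memory, so check your tree; the managed-irq housekeeping and
force handling are elided):

	int irq_do_set_affinity(struct irq_data *data,
				const struct cpumask *mask, bool force)
	{
		struct irq_chip *chip = irq_data_get_irq_chip(data);
		int ret;

		/*
		 * A cpumask can be too large to live on the stack
		 * (CONFIG_CPUMASK_OFFSTACK), hence the static mask and
		 * the global lock serialising all affinity changes.
		 */
		static DEFINE_RAW_SPINLOCK(tmp_mask_lock);
		static struct cpumask tmp_mask;

		if (!chip || !chip->irq_set_affinity)
			return -EINVAL;

		raw_spin_lock(&tmp_mask_lock);
		cpumask_and(&tmp_mask, mask, cpu_online_mask);
		ret = chip->irq_set_affinity(data, &tmp_mask, force);
		raw_spin_unlock(&tmp_mask_lock);

		return ret;
	}

That lock is what all your vcpus are queueing on.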

This can probably be replaced with a per-CPU cpumask, which would
avoid the locking, but potentially result in larger memory usage.
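
A completely untested sketch of that idea (irq_tmp_mask is a made-up
name; it relies on irq_do_set_affinity() always being called with
interrupts disabled under the descriptor lock, so the per-CPU mask
cannot be clobbered by a concurrent user on the same CPU):

	static DEFINE_PER_CPU(struct cpumask, irq_tmp_mask);

	int irq_do_set_affinity(struct irq_data *data,
				const struct cpumask *mask, bool force)
	{
		struct irq_chip *chip = irq_data_get_irq_chip(data);
		/* Each CPU scribbles in its own mask: no global lock. */
		struct cpumask *tmp_mask = this_cpu_ptr(&irq_tmp_mask);
		int ret;

		if (!chip || !chip->irq_set_affinity)
			return -EINVAL;

		cpumask_and(tmp_mask, mask, cpu_online_mask);
		ret = chip->irq_set_affinity(data, tmp_mask, force);

		return ret;
	}

The cost is one struct cpumask per CPU instead of a single global one,
which is where the extra memory comes from on large NR_CPUS
configurations.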

> 2. Modify the GICv4 driver so that it does not perform a VMOVP via
> irq_set_affinity.

Sure. You could also not use KVM at all if you don't care about interrupts
being delivered to your VM. We do not send a VMOVP just for fun. We
send it because your vcpu has moved to a different CPU, and the ITS
needs to know about that.
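
That's what vgic_v4_load() does on each vcpu_load (see
arch/arm64/kvm/vgic/vgic-v4.c; heavily simplified here, with the error
handling and residency update dropped):

	int vgic_v4_load(struct kvm_vcpu *vcpu)
	{
		struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;

		if (!vgic_supports_direct_msis(vcpu->kvm) || vpe->resident)
			return 0;

		/*
		 * Point the VPE's doorbell at the CPU we are now
		 * running on; at the ITS level this becomes a VMOVP.
		 */
		return irq_set_affinity(vpe->irq,
					cpumask_of(smp_processor_id()));
	}

The VMOVP rate is therefore directly proportional to how often your
vcpus bounce between physical CPUs.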

You seem to be misunderstanding the use case for GICv4: a partitioned
system without any over-subscription and no vcpu migration between CPUs.
If that's not your setup, then GICv4 will always be a net loss
compared to SW injection with GICv3 (additional HW interaction,
doorbell interrupts).

	M.

-- 
Without deviation from the norm, progress is not possible.
