linux-kernel - Re: [bug report] GICv4.1: multiple vpus execute vgic_v4_load at the same time will greatly increase the time consumption

Message-ID: <f1574274-efd8-eb56-436b-5a1dd7620f2c@huawei.com>
Date: Thu, 22 Aug 2024 02:23:30 +0800
From: Kunkun Jiang <jiangkunkun@...wei.com>
To: Marc Zyngier <maz@...nel.org>
CC: Thomas Gleixner <tglx@...utronix.de>, Oliver Upton
	<oliver.upton@...ux.dev>, James Morse <james.morse@....com>, Suzuki K Poulose
	<suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>, "open list:IRQ
 SUBSYSTEM" <linux-kernel@...r.kernel.org>, "moderated list:ARM SMMU DRIVERS"
	<linux-arm-kernel@...ts.infradead.org>, <kvmarm@...ts.linux.dev>,
	"wanghaibin.wang@...wei.com" <wanghaibin.wang@...wei.com>,
	<nizhiqiang1@...wei.com>, "tangnianyao@...wei.com" <tangnianyao@...wei.com>,
	<wangzhou1@...ilicon.com>
Subject: Re: [bug report] GICv4.1: multiple vpus execute vgic_v4_load at the
 same time will greatly increase the time consumption

Hi Marc,

On 2024/8/21 18:59, Marc Zyngier wrote:
> On Wed, 21 Aug 2024 10:51:27 +0100,
> Kunkun Jiang <jiangkunkun@...wei.com> wrote:
>>
>> Hi all,
>>
>> Recently I discovered a problem about GICv4.1, the scenario is as follows:
>> 1. Enable GICv4.1
>> 2. Create multiple VMs. For example, 50 VMs (4U8G)

s/4U8G/8U16G/, sorry..

> I don't know what 4U8G means. On how many physical CPUs are you
> running 50 VMs? Direct injection of interrupts and over-subscription
> are fundamentally incompatible.

Each VM is configured with 8 vcpus and 16G memory. The number of
physical CPUs is 320.

> 
>> 3. The service running in the VMs accesses MMIO frequently and needs to exit
>>    to qemu for processing.
>> 4. Or modify the kvm code so that wfi must trap to kvm
>> 5. Then the utilization of the pcpu where the vcpu is located will be 100%, and
>>    basically all in sys.
> 
> What did you expect? If you trap all the time, your performance will
> suck.  Don't do that.
> 
>> 6. This problem does not exist in GICv3.
> 
> Because GICv3 doesn't have the same constraints.
> 
>>
>> According to analysis, this problem is due to the execution of vgic_v4_load.
>> vcpu_load or kvm_sched_in
>>      kvm_arch_vcpu_load
>>      ...
>>          vgic_v4_load
>>              irq_set_affinity
>>              ...
>>                  irq_do_set_affinity
>>                      raw_spin_lock(&tmp_mask_lock)
>>                      chip->irq_set_affinity
>>                      ...
>>                        its_vpe_set_affinity
>>
>> The tmp_mask_lock is the key. This is a global lock. I don't quite understand
>> why tmp_mask_lock is needed here. I think there are two possible solutions
>> here:
>> 1. Remove this tmp_mask_lock
> 
> Maybe you could have a look at 33de0aa4bae98 (and 11ea68f553e24)? It
> would allow you to understand the nature of the problem.
> 
> This can probably be replaced with a per-CPU cpumask, which would
> avoid the locking, but potentially result in a larger memory usage.

Thanks, I will try it.
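
To check that I understand the suggestion, here is a rough userspace
model of it. This is only a sketch and not kernel code: scratch_mask,
model_effective_mask() and MODEL_NR_CPUS are hypothetical names, masks
are plain 64-bit words instead of struct cpumask, and I am assuming the
scratch mask only holds the intersection of the requested mask with the
online CPUs. Each CPU writes to its own slot, so no global lock is
needed, at the cost of one mask per CPU.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4	/* hypothetical, kept small on purpose */

/* One scratch mask per CPU: more memory than one shared mask, but no lock. */
static uint64_t scratch_mask[MODEL_NR_CPUS];

/*
 * Model of the mask computation: intersect the requested mask with the
 * online mask in this CPU's private scratch slot and return the result.
 * A result of 0 means there is no usable target CPU.
 */
static uint64_t model_effective_mask(int this_cpu, uint64_t requested,
				     uint64_t online)
{
	uint64_t *tmp;

	assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);
	tmp = &scratch_mask[this_cpu];
	*tmp = requested & online;
	return *tmp;
}
```

The model relies on the caller not being moved to another CPU while it
uses its slot; I assume that holds in the real code path, but I have not
verified it.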

>> 2. Modify the gicv4 driver, do not perform VMOVP via irq_set_affinity.
> 
> Sure. You could also not use KVM at all if you don't care about interrupts
> being delivered to your VM. We do not send a VMOVP just for fun. We
> send it because your vcpu has moved to a different CPU, and the ITS
> needs to know about that.

When a vcpu is moved to a different CPU, of course VMOVP has to be sent.
What I mean is: would it be possible to call its_vpe_set_affinity() to
send the VMOVP by some other means than the irq_set_affinity() API, so
that we can bypass tmp_mask_lock?

> 
> You seem to be misunderstanding the use case for GICv4: a partitioned
> system, without any over-subscription, no vcpu migration between CPUs.
> If that's not your setup, then GICv4 will always be a net loss
> compared to SW injection with GICv3 (additional HW interaction,
> doorbell interrupts).

Thanks for the explanation. The key to the problem is not vcpu migration
between CPUs. The key point is that many vcpus execute vgic_v4_load() at
the same time. Even if a vcpu is not migrated to another CPU, there may
be a large number of vcpus executing vgic_v4_load() at the same time. For
example, the service running in the VMs performs a large number of MMIO
accesses and needs to return to userspace for emulation. Because of the
contention on tmp_mask_lock, performance deteriorates.

When the target CPU is the same CPU the vcpu last ran on, there seems to
be no need to call irq_set_affinity(). I tested skipping the call in that
case, and it did alleviate the problem described above.

I feel it might be better to remove tmp_mask_lock or to call
its_vpe_set_affinity() in another way, which is why I mentioned these two
ideas above.

Thanks,
Kunkun Jiang