Message-ID: <20250418102244.2182975-1-ljykernel@163.com>
Date: Fri, 18 Apr 2025 18:22:43 +0800
From: Jiayuan Liang <ljykernel@....com>
To: Marc Zyngier <maz@...nel.org>,
Oliver Upton <oliver.upton@...ux.dev>
Cc: Joey Gouly <joey.gouly@....com>,
Suzuki K Poulose <suzuki.poulose@....com>,
Zenghui Yu <yuzenghui@...wei.com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
linux-arm-kernel@...ts.infradead.org,
kvmarm@...ts.linux.dev,
linux-kernel@...r.kernel.org,
Jiayuan Liang <ljykernel@....com>
Subject: [RFC PATCH 0/1] KVM-arm: Optimize cache flush by only flushing on vcpu0

This is an RFC patch to optimize cache flushing behavior in KVM/arm64.

When a vCPU in a multi-vCPU guest toggles its cache state, KVM currently
flushes all memory mapped in the VM's stage-2 page tables, and it does so
for every vCPU that makes the transition. This leads to redundant cache
flushes during guest boot, as each vCPU performs the same flush operation.

In a typical guest boot sequence, vcpu0 is the first to enable caches, and
the other vCPUs follow afterward. By the time the secondary vCPUs enable
their caches, the flush performed by vcpu0 has already ensured cache
coherency for the entire VM.

I propose performing the stage2_flush_vm() operation only on vcpu0, which
should be sufficient to maintain cache coherency while eliminating the
redundant flushes on other vCPUs. This can improve performance during guest
boot in multi-vCPU configurations.

I'm submitting this as RFC because:

1. This is my first contribution to the KVM/arm64 subsystem
2. I want to confirm whether this approach is architecturally sound
3. I'd like feedback on potential corner cases I may have missed:
   - Could there be scenarios where secondary vCPUs need their own flushes?
   - Is the assumption that vcpu0 always enables caches first valid?

Implementation details:

- The patch identifies vcpu0 by checking whether vcpu->vcpu_id == 0
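
To make the effect of that check concrete, here is a minimal user-space
sketch. It is not the patch itself and does not use kernel code: the
model_* types and functions and count_boot_flushes() are hypothetical
stand-ins, and the "flush" only increments a counter. It assumes the vCPUs
enable their caches in id order, which is exactly the assumption questioned
above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; these are not kernel structures. */
struct model_vm {
	unsigned int flush_count;	/* number of full-VM flushes issued */
};

struct model_vcpu {
	struct model_vm *vm;
	int vcpu_id;
	bool cache_enabled;
};

/* Stand-in for stage2_flush_vm(): it only counts invocations. */
static void model_flush_vm(struct model_vm *vm)
{
	vm->flush_count++;
}

/*
 * Model of a cache-state toggle. With only_vcpu0 == false, every vCPU
 * whose cache state changes triggers a flush. With only_vcpu0 == true,
 * the flush is additionally gated on vcpu_id == 0.
 */
static void model_toggle_cache(struct model_vcpu *vcpu, bool now_enabled,
			       bool only_vcpu0)
{
	bool changed = (now_enabled != vcpu->cache_enabled);

	if (changed && (!only_vcpu0 || vcpu->vcpu_id == 0))
		model_flush_vm(vcpu->vm);

	vcpu->cache_enabled = now_enabled;
}

/*
 * Bring up nr_vcpus vCPUs in id order (an assumption of this model),
 * each enabling its cache once, and return how many flushes happened.
 */
unsigned int count_boot_flushes(int nr_vcpus, bool only_vcpu0)
{
	struct model_vm vm = { .flush_count = 0 };
	int id;

	assert(nr_vcpus > 0);

	for (id = 0; id < nr_vcpus; id++) {
		struct model_vcpu vcpu = {
			.vm = &vm,
			.vcpu_id = id,
			.cache_enabled = false,
		};

		model_toggle_cache(&vcpu, true, only_vcpu0);
	}

	return vm.flush_count;
}
```

In this model, count_boot_flushes(64, false) returns 64 and
count_boot_flushes(64, true) returns 1. The model only counts flushes; it
says nothing about whether skipping the flush on secondary vCPUs is safe.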

Testing with a 64-core VM with 128GB of memory using hugepages shows busybox
boot time dropping from 33s to 5s.

I'd appreciate any feedback on the correctness and approach of this
optimization.

Jiayuan Liang (1):
  KVM: arm: Optimize cache flush by only flushing on vcpu0

 arch/arm64/kvm/mmu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

base-commit: fc96b232f8e7c0a6c282f47726b2ff6a5fb341d2
--
2.43.0