Message-ID: <CAJHc60wBNTP9SSt_skEXXv9N+tF_1RoV6vcQQx4hWphJF6EmkQ@mail.gmail.com>
Date: Thu, 7 Aug 2025 11:58:01 -0700
From: Raghavendra Rao Ananta <rananta@...gle.com>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: Marc Zyngier <maz@...nel.org>, Mingwei Zhang <mizhang@...gle.com>,
linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH 2/2] KVM: arm64: Destroy the stage-2 page-table periodically
Hi Oliver,
>
> Protected mode is affected by the same problem, potentially even worse
> due to the overheads of calling into EL2. Both protected and
> non-protected flows should use stage2_destroy_range().
>
I experimented with this (see diff below), and it looks like it takes
significantly longer to finish the destruction even for a very small
VM. For instance, it takes ~140 seconds on an Ampere Altra machine.
This is probably because we call cond_resched() for every chunk while
sweeping the entire possible address range, 0 to ~(0ULL), even though
there are no actual mappings in most of it, so we context-switch out
far more often.
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
+static void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+{
+	u64 end = is_protected_kvm_enabled() ? ~(0ULL) : BIT(pgt->ia_bits);
+	u64 next, addr = 0;
+
+	do {
+		next = stage2_range_addr_end(addr, end);
+		KVM_PGT_FN(kvm_pgtable_stage2_destroy_range)(pgt, addr,
+							     next - addr);
+
+		if (next != end)
+			cond_resched();
+	} while (addr = next, addr != end);
+
+	KVM_PGT_FN(kvm_pgtable_stage2_destroy_pgd)(pgt);
+}
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -316,9 +316,13 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
 	return 0;
 }
-void pkvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
+				       u64 addr, u64 size)
+{
+	__pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+}
+
+void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
+{
+}
Without cond_resched() in place, it takes about half the time.
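Just to put a rough number on why the full sweep hurts: assuming
stage2_range_addr_end() advances by the stage-2 block granule (1 GiB
with 4K pages -- that granule is my assumption here, not something the
diff above pins down), sweeping 0 to ~(0ULL) means on the order of
2^34 iterations, nearly all of which cover no mapping at all. A quick
user-space illustration of that count:

/*
 * Illustration only (not kernel code): how many chunks a 0..~0ULL
 * sweep produces for an assumed 1 GiB chunk size. 2^64 / 2^30 = 2^34,
 * i.e. ~17 billion calls into the destroy path (and cond_resched()),
 * almost all of them over empty ranges.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t chunk = 1ULL << 30;		/* assumed sweep granule: 1 GiB */
	uint64_t end = ~(0ULL);
	uint64_t iters = end / chunk + 1;	/* == 1ULL << 34 */

	printf("%llu iterations\n", (unsigned long long)iters);
	return 0;
}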
I also tried moving cond_resched() to __pkvm_pgtable_stage2_unmap(),
as per the below diff, and calling pkvm_pgtable_stage2_destroy_range()
for the entire 0 to ~(0ULL) range (instead of breaking it up). Even
for a fully 4K mapped 128G VM, I see it taking ~65 seconds, which is
close to the baseline (no cond_resched()).
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -311,8 +311,11 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
 			return ret;
 		pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
 		kfree(mapping);
+		cond_resched();
 	}
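To make the second experiment concrete, the caller side I had in mind
looks roughly like the sketch below (illustrative only -- it reuses the
helpers from the diffs above, keeps the chunked sweep just for the
non-protected case, and isn't the exact diff I ran):

static void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
{
	if (is_protected_kvm_enabled()) {
		/*
		 * One call over the whole range; __pkvm_pgtable_stage2_unmap()
		 * now does cond_resched() after each mapping it actually frees.
		 */
		pkvm_pgtable_stage2_destroy_range(pgt, 0, ~(0ULL));
	} else {
		u64 end = BIT(pgt->ia_bits);
		u64 next, addr = 0;

		do {
			next = stage2_range_addr_end(addr, end);
			KVM_PGT_FN(kvm_pgtable_stage2_destroy_range)(pgt, addr,
								     next - addr);

			if (next != end)
				cond_resched();
		} while (addr = next, addr != end);
	}

	KVM_PGT_FN(kvm_pgtable_stage2_destroy_pgd)(pgt);
}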
Does it make sense to call cond_resched() only when we actually unmap pages?
Thank you.
Raghavendra