Message-ID: <20250318180303.283401-3-seanjc@google.com>
Date: Tue, 18 Mar 2025 11:02:57 -0700
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Sean Christopherson <seanjc@google.com>
Subject: [GIT PULL] KVM: x86: MMU changes for 6.15

Except for a minor cleanup, the MMU changes for 6.15 are all about adding
support for aging SPTEs without holding mmu_lock. Details in the tag.

The following changes since commit a64dcfb451e254085a7daee5fe51bf22959d52d3:

  Linux 6.14-rc2 (2025-02-09 12:45:03 -0800)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-mmu-6.15

for you to fetch changes up to 0dab791f05ce2c9f0215f50cb46ed0c3126fe211:

  KVM: x86/tdp_mmu: Remove tdp_mmu_for_each_pte() (2025-02-28 09:14:20 -0800)

----------------------------------------------------------------
KVM x86/mmu changes for 6.15

Add support for "fast" aging of SPTEs in both the TDP MMU and Shadow MMU, where
"fast" means "without holding mmu_lock". Not taking mmu_lock allows multiple
aging actions to run in parallel, and more importantly avoids stalling vCPUs,
e.g. due to holding mmu_lock for an extended duration while a vCPU is faulting
in memory.

For the TDP MMU, protect aging via RCU; the page tables are RCU-protected and
KVM doesn't need to access any metadata to age SPTEs.
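
E.g. a minimal sketch of the approach (illustrative only; the struct,
iterator, and mask names below are stand-ins, not the exact KVM
identifiers):

  /* Age SPTEs under rcu_read_lock() instead of mmu_lock (sketch). */
  static bool age_gfn_range_lockless(struct tdp_root *root, gfn_t start,
                                     gfn_t end)
  {
          struct tdp_iter iter;
          bool young = false;

          rcu_read_lock();
          for_each_tdp_pte(iter, root, start, end) {
                  u64 old = READ_ONCE(*iter.sptep);

                  if (!(old & shadow_accessed_mask))
                          continue;

                  young = true;
                  /*
                   * A non-atomic clear can lose a racing hardware
                   * update, but aging tolerates such imprecision (see
                   * the spte_needs_atomic_update() note below).
                   */
                  WRITE_ONCE(*iter.sptep, old & ~shadow_accessed_mask);
          }
          rcu_read_unlock();

          return young;
  }
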
For the Shadow MMU, use bit 1 of rmap pointers (bit 0 is used to terminate a
list of rmaps) to implement a per-rmap single-bit spinlock. When aging a gfn,
acquire the rmap's spinlock with read-only permissions, which allows hardening
and optimizing the locking and aging, e.g. locking an rmap for write requires
mmu_lock to also be held. The lock is NOT a true R/W spinlock, i.e. multiple
concurrent readers aren't supported.
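
Roughly, the lock looks like this (a sketch with hypothetical names; the
real helpers also distinguish read-only vs. write acquisition):

  #define RMAP_LOCKED     BIT(1)

  static unsigned long rmap_lock(unsigned long *rmap)
  {
          unsigned long old;

          do {
                  /* Assume unlocked, then try to set the lock bit. */
                  old = READ_ONCE(*rmap) & ~RMAP_LOCKED;
          } while (cmpxchg(rmap, old, old | RMAP_LOCKED) != old);

          return old;     /* the unlocked value, for walking the rmap */
  }

  static void rmap_unlock(unsigned long *rmap, unsigned long val)
  {
          /* Publish the (possibly updated) head, drop the lock bit. */
          WRITE_ONCE(*rmap, val & ~RMAP_LOCKED);
  }
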
To avoid forcing all SPTE updates to use atomic operations (clearing the
Accessed bit out of mmu_lock makes it inherently volatile), rework and rename
spte_has_volatile_bits() to spte_needs_atomic_update() and deliberately exclude
the Accessed bit. KVM (and mm/) already tolerates false positives/negatives
for Accessed information, and all testing has shown that reducing the latency
of aging is far more beneficial to overall system performance than providing
"perfect" young/old information.
----------------------------------------------------------------
James Houghton (6):
      KVM: Rename kvm_handle_hva_range()
      KVM: Allow lockless walk of SPTEs when handling aging mmu_notifier event
      KVM: x86/mmu: Factor out spte atomic bit clearing routine
      KVM: x86/mmu: Don't force atomic update if only the Accessed bit is volatile
      KVM: x86/mmu: Skip shadow MMU test_young if TDP MMU reports page as young
      KVM: x86/mmu: Only check gfn age in shadow MMU if indirect_shadow_pages > 0

Nikolay Borisov (1):
      KVM: x86/tdp_mmu: Remove tdp_mmu_for_each_pte()

Sean Christopherson (6):
      KVM: x86/mmu: Always update A/D-disabled SPTEs atomically
      KVM: x86/mmu: Age TDP MMU SPTEs without holding mmu_lock
      KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o mmu_lock
      KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock
      KVM: x86/mmu: Add support for lockless walks of rmap SPTEs
      KVM: x86/mmu: Walk rmaps (shadow MMU) without holding mmu_lock when aging gfns

 Documentation/virt/kvm/locking.rst |   4 +-
 arch/x86/include/asm/kvm_host.h    |   4 +-
 arch/x86/kvm/Kconfig               |   1 +
 arch/x86/kvm/mmu/mmu.c             | 363 +++++++++++++++++++++++++++----------
 arch/x86/kvm/mmu/spte.c            |  31 ++--
 arch/x86/kvm/mmu/spte.h            |   2 +-
 arch/x86/kvm/mmu/tdp_iter.h        |  34 ++--
 arch/x86/kvm/mmu/tdp_mmu.c         |  45 +++--
 include/linux/kvm_host.h           |   1 +
 virt/kvm/Kconfig                   |   4 +
 virt/kvm/kvm_main.c                |  53 +++---
 11 files changed, 373 insertions(+), 169 deletions(-)