lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250204004038.1680123-1-jthoughton@google.com>
Date: Tue,  4 Feb 2025 00:40:27 +0000
From: James Houghton <jthoughton@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>
Cc: David Matlack <dmatlack@...gle.com>, David Rientjes <rientjes@...gle.com>, 
	James Houghton <jthoughton@...gle.com>, Marc Zyngier <maz@...nel.org>, 
	Oliver Upton <oliver.upton@...ux.dev>, Wei Xu <weixugc@...gle.com>, Yu Zhao <yuzhao@...gle.com>, 
	Axel Rasmussen <axelrasmussen@...gle.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: [PATCH v9 00/11] KVM: x86/mmu: Age sptes locklessly

By aging sptes locklessly with the TDP MMU and the shadow MMU, neither
vCPUs nor reclaim (mmu_notifier_invalidate_range*) will get stuck
waiting for aging. This contention reduction improves guest performance
and saves a significant amount of Google Cloud's CPU usage, and it has
valuable improvements for ChromeOS, as Yu has mentioned previously[1].

Please see v8[8] for some performance results using
access_tracking_perf_test patched to use MGLRU.

Neither access_tracking_perf_test nor mmu_stress_test trigger any
splats (with CONFIG_LOCKDEP=y) with the TDP MMU and with the shadow MMU.

=== Previous Versions ===

Since v8[8]:
 - Re-added the kvm_handle_hva_range helpers and applied Sean's
   kvm_{handle -> age}_hva_range rename.
 - Renamed spte_has_volatile_bits() to spte_needs_atomic_write() and
   removed its Accessed bit check. Undid change to
   tdp_mmu_spte_need_atomic_write().
 - Renamed KVM_MMU_NOTIFIER_{YOUNG -> AGING}_LOCKLESS.
 - cpu_relax(), lockdep, preempt_disable(), and locking fixups for
   per-rmap lock (thanks Lai and Sean).
 - Renamed kvm_{has -> may_have}_shadow_mmu_sptes().
 - Rebased onto latest kvm/next, including changing
   for_each_tdp_mmu_root_rcu to use `types`.
 - Dropped MGLRU changes from access_tracking_perf_test.
 - Picked up Acked-bys from Yu. (thank you!)

Since v7[7]:
 - Dropped MGLRU changes.
 - Dropped DAMON cleanup.
 - Dropped MMU notifier changes completely.
 - Made shadow MMU aging *always* lockless, not just lockless when the
   now-removed "fast_only" clear notifier was used.
 - Given that the MGLRU changes no longer introduce a new MGLRU
   capability, drop the new capability check from the selftest.
 - Rebased on top of latest kvm-x86/next, including the x86 mmu changes
   for marking pages as dirty.

Since v6[6]:
 - Rebased on top of kvm-x86/next and Sean's lockless rmap walking
   changes.
 - Removed HAVE_KVM_MMU_NOTIFIER_YOUNG_FAST_ONLY (thanks DavidM).
 - Split up kvm_age_gfn() / kvm_test_age_gfn() optimizations (thanks
   DavidM and Sean).
 - Improved new MMU notifier documentation (thanks DavidH).
 - Dropped arm64 locking change.
 - No longer retry for CAS failure in TDP MMU non-A/D case (thanks
   Sean).
 - Added some R-bys and A-bys.

Since v5[5]:
 - Reworked test_clear_young_fast_only() into a new parameter for the
   existing notifiers (thanks Sean).
 - Added mmu_notifier.has_fast_aging to tell mm if calling fast-only
   notifiers should be done.
 - Added mm_has_fast_young_notifiers() to inform users if calling
   fast-only notifier helpers is worthwhile (for look-around to use).
 - Changed MGLRU to invoke a single notifier instead of two when
   aging and doing look-around (thanks Yu).
 - For KVM/x86, check indirect_shadow_pages > 0 instead of
   kvm_memslots_have_rmaps() when collecting age information
   (thanks Sean).
 - For KVM/arm, some fixes from Oliver.
 - Small fixes to access_tracking_perf_test.
 - Added missing !MMU_NOTIFIER version of mmu_notifier_clear_young().

Since v4[4]:
 - Removed Kconfig that controlled when aging was enabled. Aging will
   be done whenever the architecture supports it (thanks Yu).
 - Added a new MMU notifier, test_clear_young_fast_only(), specifically
   for MGLRU to use.
 - Add kvm_fast_{test_,}age_gfn, implemented by x86.
 - Fix locking for clear_flush_young().
 - Added KVM_MMU_NOTIFIER_YOUNG_LOCKLESS to clean up locking changes
   (thanks Sean).
 - Fix WARN_ON and other cleanup for the arm64 locking changes
   (thanks Oliver).

Since v3[3]:
 - Vastly simplified the series (thanks David). Removed mmu notifier
   batching logic entirely.
 - Cleaned up how locking is done for mmu_notifier_test/clear_young
   (thanks David).
 - Look-around is now only done when there are no secondary MMUs
   subscribed to MMU notifiers.
 - CONFIG_LRU_GEN_WALKS_SECONDARY_MMU has been added.
 - Fixed the lockless implementation of kvm_{test,}age_gfn for x86
   (thanks David).
 - Added MGLRU functional and performance tests to
   access_tracking_perf_test (thanks Axel).
 - In v3, an mm would be completely ignored (for aging) if there was a
   secondary MMU but support for secondary MMU walking was missing. Now,
   missing secondary MMU walking support simply skips the notifier
   calls (except for eviction).
 - Added a sanity check for that range->lockless and range->on_lock are
   never both provided for the memslot walk.

For the changes since v2[2], see v3.

Based on latest kvm/next.

[1]: https://lore.kernel.org/kvm/CAOUHufYS0XyLEf_V+q5SCW54Zy2aW5nL8CnSWreM8d1rX5NKYg@mail.gmail.com/
[2]: https://lore.kernel.org/kvmarm/20230526234435.662652-1-yuzhao@google.com/
[3]: https://lore.kernel.org/linux-mm/20240401232946.1837665-1-jthoughton@google.com/
[4]: https://lore.kernel.org/linux-mm/20240529180510.2295118-1-jthoughton@google.com/
[5]: https://lore.kernel.org/linux-mm/20240611002145.2078921-1-jthoughton@google.com/
[6]: https://lore.kernel.org/linux-mm/20240724011037.3671523-1-jthoughton@google.com/
[7]: https://lore.kernel.org/kvm/20240926013506.860253-1-jthoughton@google.com/
[8]: https://lore.kernel.org/kvm/20241105184333.2305744-1-jthoughton@google.com/

James Houghton (7):
  KVM: Rename kvm_handle_hva_range()
  KVM: Add lockless memslot walk to KVM
  KVM: x86/mmu: Factor out spte atomic bit clearing routine
  KVM: x86/mmu: Relax locking for kvm_test_age_gfn() and kvm_age_gfn()
  KVM: x86/mmu: Rename spte_has_volatile_bits() to
    spte_needs_atomic_write()
  KVM: x86/mmu: Skip shadow MMU test_young if TDP MMU reports page as
    young
  KVM: x86/mmu: Only check gfn age in shadow MMU if
    indirect_shadow_pages > 0

Sean Christopherson (4):
  KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o
    mmu_lock
  KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of
    mmu_lock
  KVM: x86/mmu: Add support for lockless walks of rmap SPTEs
  KVM: x86/mmu: Support rmap walks without holding mmu_lock when aging
    gfns

 Documentation/virt/kvm/locking.rst |   4 +-
 arch/x86/include/asm/kvm_host.h    |   4 +-
 arch/x86/kvm/Kconfig               |   1 +
 arch/x86/kvm/mmu/mmu.c             | 364 +++++++++++++++++++++--------
 arch/x86/kvm/mmu/spte.c            |  19 +-
 arch/x86/kvm/mmu/spte.h            |   2 +-
 arch/x86/kvm/mmu/tdp_iter.h        |  26 ++-
 arch/x86/kvm/mmu/tdp_mmu.c         |  36 ++-
 include/linux/kvm_host.h           |   1 +
 virt/kvm/Kconfig                   |   2 +
 virt/kvm/kvm_main.c                |  56 +++--
 11 files changed, 364 insertions(+), 151 deletions(-)


base-commit: f7bafceba76e9ab475b413578c1757ee18c3e44b
-- 
2.48.1.362.g079036d154-goog


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ