Message-ID: <20240809194335.1726916-1-seanjc@google.com>
Date: Fri, 9 Aug 2024 12:43:12 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Oliver Upton <oliver.upton@...ux.dev>, Marc Zyngier <maz@...nel.org>, Peter Xu <peterx@...hat.com>,
James Houghton <jthoughton@...gle.com>
Subject: [PATCH 00/22] KVM: x86/mmu: Allow yielding on mmu_notifier zap
The main intent of this series is to allow yielding, i.e. cond_resched(),
when unmapping memory in shadow MMUs in response to an mmu_notifier
invalidation. There is zero reason not to yield, and in fact I _thought_
KVM did yield, but because of how KVM grew over the years, the unmap path
got left behind.
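
For reference, the yielding pattern itself is nothing fancy; a rough
sketch (kvm_zap_one_rmap() is a hypothetical stand-in, the actual series
plumbs a @can_yield flag into __walk_slot_rmaps()):

  static bool kvm_zap_rmaps_yielding(struct kvm *kvm,
  				     struct kvm_memory_slot *slot,
  				     gfn_t start, gfn_t end, bool can_yield)
  {
  	bool flush = false;
  	gfn_t gfn;

  	lockdep_assert_held_write(&kvm->mmu_lock);

  	for (gfn = start; gfn < end; gfn++) {
  		/* Hypothetical helper, stands in for the real rmap zap. */
  		flush |= kvm_zap_one_rmap(kvm, slot, gfn);

  		/*
  		 * Drop mmu_lock on contention or a pending reschedule,
  		 * flushing TLBs first so that no vCPU can consume a
  		 * zapped SPTE while the lock is dropped.
  		 */
  		if (can_yield &&
  		    (need_resched() || rwlock_needbreak(&kvm->mmu_lock))) {
  			if (flush) {
  				kvm_flush_remote_tlbs(kvm);
  				flush = false;
  			}
  			cond_resched_rwlock_write(&kvm->mmu_lock);
  		}
  	}
  	return flush;
  }
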
The first half of the series reworks max_guest_memory_test into
mmu_stress_test, to give some confidence in the mmu_notifier-related
changes.
Oliver and Marc, there's one patch lurking in here to enable said test on
arm64. It's as well tested as I can make it (and that took much longer
than anticipated, because arm64 hit races in the test that x86 doesn't
hit, for whatever reason).
The middle of the series reworks x86's shadow MMU logic to use the
zap flow that can yield.
The last third or so is a wee bit adventurous, and is kind of an RFC, but
well tested. It's essentially prep/post work for James' MGLRU, and allows
aging SPTEs in x86's shadow MMU to run outside of mmu_lock, e.g. so that
nested TDP (stage-2) MMUs can participate in MGLRU.
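
Roughly speaking, lockless aging boils down to treating each SPTE as an
atomic; a simplified sketch (the actual patches add dedicated
infrastructure for the lockless rmap walk, this only shows the SPTE
update):

  static bool kvm_age_spte_lockless(u64 *sptep)
  {
  	u64 old_spte = READ_ONCE(*sptep);

  	if (!is_shadow_present_pte(old_spte) ||
  	    !(old_spte & shadow_accessed_mask))
  		return false;

  	/*
  	 * Best-effort clear of the Accessed bit; losing a race with a
  	 * concurrent update is fine, the page was young regardless.
  	 */
  	(void)try_cmpxchg64(sptep, &old_spte,
  			    old_spte & ~shadow_accessed_mask);
  	return true;
  }
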
If everything checks out, my goal is to land the selftests and yielding
changes in 6.12. The aging stuff is incomplete and meaningless without
James' MGLRU; I'm posting it here purely so that folks can see the end
state when the mmu_notifier invalidation paths also move to a different
API.
James, the aging stuff is quite well tested (see below). Can you try
working it into/on top of your MGLRU series? And if you're feeling very
kind, hammer it a bit more? :-) I haven't looked at the latest ideas
and/or discussion on the MGLRU series, but I'm hoping that being able to
support the shadow MMU (absent the stupid eptad=0 case) in MGLRU will
allow for fewer shenanigans, e.g. no need to toggle flags during runtime.
As for testing, I spun up a VM and ran a compilation loop and `stress` in
the VM, while simultaneously running a small userspace program to age the
VM's memory (also in an infinite loop), using the same basic methodology as
access_tracking_perf_test.c (I put almost all of guest memory into a
memfd and then aged only that range of memory).
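
The userspace side is the usual page_idle dance; a condensed sketch of
the approach (not the actual program, and the PFN lookup via
/proc/self/pagemap is omitted):

  #include <stdint.h>
  #include <unistd.h>

  /*
   * Mark a range of host PFNs idle via /sys/kernel/mm/page_idle/bitmap,
   * same basic methodology as access_tracking_perf_test.c.  Each bit in
   * the bitmap covers one PFN; a page that later reads back as non-idle
   * was accessed since it was marked idle.
   */
  static void mark_pfn_range_idle(int page_idle_fd, uint64_t start_pfn,
  				  uint64_t nr_pfns)
  {
  	uint64_t mask = ~0ull;
  	uint64_t pfn;

  	for (pfn = start_pfn; pfn < start_pfn + nr_pfns; pfn += 64)
  		pwrite(page_idle_fd, &mask, sizeof(mask), (pfn / 64) * 8);
  }
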
I confirmed that the locking does work, e.g. that there was (infrequent)
contention, and am fairly confident that the idea pans out. E.g. I hit
the BUG_ON(!is_shadow_present_pte()) using that setup, which is the only
reason those patches exist :-)
Sean Christopherson (22):
KVM: selftests: Check for a potential unhandled exception iff KVM_RUN
succeeded
KVM: selftests: Rename max_guest_memory_test to mmu_stress_test
KVM: selftests: Only muck with SREGS on x86 in mmu_stress_test
KVM: selftests: Compute number of extra pages needed in
mmu_stress_test
KVM: selftests: Enable mmu_stress_test on arm64
KVM: selftests: Use vcpu_arch_put_guest() in mmu_stress_test
KVM: selftests: Precisely limit the number of guest loops in
mmu_stress_test
KVM: selftests: Add a read-only mprotect() phase to mmu_stress_test
KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)
KVM: x86/mmu: Move walk_slot_rmaps() up near
for_each_slot_rmap_range()
KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is
allowed
KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific
helper
KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul'
literals
KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o
mmu_lock
KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded
equivalent
KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of
mmu_lock
KVM: x86/mmu: Add support for lockless walks of rmap SPTEs
KVM: x86/mmu: Support rmap walks without holding mmu_lock when aging
gfns
***HACK*** KVM: x86: Don't take mmu_lock when aging gfns
arch/x86/kvm/mmu/mmu.c | 527 +++++++++++-------
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/vmx/vmx.c | 2 +
tools/testing/selftests/kvm/Makefile | 3 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 3 +-
..._guest_memory_test.c => mmu_stress_test.c} | 144 ++++-
virt/kvm/kvm_main.c | 7 +-
7 files changed, 482 insertions(+), 206 deletions(-)
rename tools/testing/selftests/kvm/{max_guest_memory_test.c => mmu_stress_test.c} (65%)
base-commit: 332d2c1d713e232e163386c35a3ba0c1b90df83f
--
2.46.0.76.ge559c4bf1a-goog