Message-ID: <20250613202315.2790592-1-jthoughton@google.com>
Date: Fri, 13 Jun 2025 20:23:07 +0000
From: James Houghton <jthoughton@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>, Sean Christopherson <seanjc@...gle.com>
Cc: Vipin Sharma <vipinsh@...gle.com>, David Matlack <dmatlack@...gle.com>,
James Houghton <jthoughton@...gle.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: [PATCH v4 0/7] KVM: x86/mmu: Run TDP MMU NX huge page recovery under
MMU read lock
Hi Sean/Paolo,
I'm finishing off Vipin's NX huge page recovery optimization for the TDP
MMU from last year.
NX huge page recovery can cause guest performance jitter, originally
noticed with network tests in Windows guests. Please see Vipin's earlier
performance results[1]. Below is some new data I have collected with the
nx_huge_pages_perf_test that I've included with this series.
The NX huge page recovery for the shadow MMU is still done under the MMU
write lock, but with the TDP MMU, we can instead do it under the MMU
read lock by:
1. Tracking the possible NX huge pages for the two MMUs separately
(patch 1).
2. Updating the NX huge page recovery routine for the TDP MMU to
- zap SPTEs atomically, and
- grab tdp_mmu_pages_lock to iterate over the NX huge page list
(patch 3).
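To give a rough idea of the shape of the new read-lock path, here is an
illustrative sketch. This is not the code from this series: the helper
names recover_tdp_mmu_nx_huge_pages() and
tdp_mmu_zap_possible_nx_huge_page(), and the exact list field, are made
up for this example, while kvm->mmu_lock and
kvm->arch.tdp_mmu_pages_lock are the real locks involved.

  /* Illustrative sketch only -- not the code from this series. */
  static void recover_tdp_mmu_nx_huge_pages(struct kvm *kvm,
                                            unsigned long to_zap)
  {
          struct list_head *nx_pages =
                  &kvm->arch.possible_tdp_mmu_nx_huge_pages;
          struct kvm_mmu_page *sp;

          /* vCPUs can keep handling faults concurrently. */
          read_lock(&kvm->mmu_lock);
          rcu_read_lock();

          while (to_zap--) {
                  /*
                   * The possible-NX-huge-page list is not protected by
                   * the MMU lock when it is held only for read, so take
                   * tdp_mmu_pages_lock while picking the next victim.
                   */
                  spin_lock(&kvm->arch.tdp_mmu_pages_lock);
                  if (list_empty(nx_pages)) {
                          spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
                          break;
                  }
                  sp = list_first_entry(nx_pages, struct kvm_mmu_page,
                                        possible_nx_huge_page_link);
                  spin_unlock(&kvm->arch.tdp_mmu_pages_lock);

                  /*
                   * Zap the SPTE with a cmpxchg so a racing fault cannot
                   * be lost; if the cmpxchg fails, the page is simply
                   * picked up again on a later recovery pass.
                   */
                  tdp_mmu_zap_possible_nx_huge_page(kvm, sp);
          }

          rcu_read_unlock();
          read_unlock(&kvm->mmu_lock);
  }

The point is that nothing in this loop needs the MMU write lock; the
only exclusive section is the short tdp_mmu_pages_lock critical section
around the list, so vCPU faults no longer stall behind an entire
recovery pass.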
I threw in patch 4 because it seems harmless and closer to the "right"
thing to do. Feel free to drop it if you don't agree with me. :)
I'm also grabbing David's execute_perf_test[3] while I'm at it. It was
dropped before simply because it didn't apply at the time. David's test
works well as a stress test for NX huge page recovery when NX huge page
recovery is tuned to be very aggressive.
Changes since v3[2]:
- Dropped the move of the `sp->nx_huge_page_disallowed` check to outside
of the tdp_mmu_pages_lock.
- Implemented Sean's array suggestion for `possible_nx_huge_pages`.
- Implemented some other cleanup suggestions from Sean.
- Made shadow MMU not take the RCU lock in NX huge page recovery.
- Added a selftest for measuring jitter.
- Added David's execute_perf_test[3].
-- Results
$ cat /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms
100
$ cat /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
4
$ ./nx_huge_pages_perf_test -b 16G -s anonymous_hugetlb_1gb
[Unpatched] Max fault latency: 8496724 cycles
[Unpatched] Max fault latency: 8404426 cycles
[ Patched ] Max fault latency: 49418 cycles
[ Patched ] Max fault latency: 51948 cycles
$ ./nx_huge_pages_perf_test -b 16G -s anonymous_hugetlb_2mb
[Unpatched] Max fault latency: 5320740 cycles
[Unpatched] Max fault latency: 5384554 cycles
[ Patched ] Max fault latency: 50052 cycles
[ Patched ] Max fault latency: 103774 cycles
$ ./nx_huge_pages_perf_test -b 16G -s anonymous_thp
[Unpatched] Max fault latency: 7625022 cycles
[Unpatched] Max fault latency: 6339934 cycles
[ Patched ] Max fault latency: 107976 cycles
[ Patched ] Max fault latency: 108386 cycles
$ ./nx_huge_pages_perf_test -b 16G -s anonymous
[Unpatched] Max fault latency: 143036 cycles
[Unpatched] Max fault latency: 287444 cycles
[ Patched ] Max fault latency: 274626 cycles
[ Patched ] Max fault latency: 303984 cycles
We can see about a 100x decrease in maximum fault latency for both
2M and 1G pages. The test only times writes to unmapped pages that are
not themselves currently undergoing NX huge page recovery. It only
produces interesting results while recovery is actually running, so the
module parameters above are tuned aggressively (with a 100ms period and
a ratio of 4, KVM tries to recover roughly a quarter of the disallowed
huge pages every 100ms) to make it very likely that recovery occurs in
the middle of the test.
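For reference, the core of the measurement is conceptually just a timed
first write to every page. The sketch below is an assumption about how
such a loop could look; time_first_touch() and its arguments are made
up for illustration, and the real selftest runs its loop from guest
code via the KVM selftest framework rather than as plain host code.

  #include <stdint.h>
  #include <x86intrin.h>  /* __rdtsc() */

  /*
   * Hypothetical sketch: touch every page once and record the slowest
   * first touch. The actual test does this while NX huge page recovery
   * is running concurrently on other pages.
   */
  static uint64_t time_first_touch(volatile uint8_t *mem, uint64_t size,
                                   uint64_t page_size)
  {
          uint64_t max_cycles = 0;

          for (uint64_t off = 0; off < size; off += page_size) {
                  uint64_t start = __rdtsc();

                  mem[off] = 1;   /* first write => fault */

                  uint64_t delta = __rdtsc() - start;

                  if (delta > max_cycles)
                          max_cycles = delta;
          }

          return max_cycles;
  }

With recovery running under the MMU write lock, one of these writes can
stall behind recovery of unrelated pages, which is what the unpatched
numbers above show.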
Based on latest kvm/next.
[1]: https://lore.kernel.org/kvm/20240906204515.3276696-3-vipinsh@google.com/
[2]: https://lore.kernel.org/kvm/20240906204515.3276696-1-vipinsh@google.com/
[3]: https://lore.kernel.org/kvm/20221109185905.486172-2-dmatlack@google.com/
David Matlack (1):
KVM: selftests: Introduce a selftest to measure execution performance
James Houghton (3):
KVM: x86/mmu: Only grab RCU lock for nx hugepage recovery for TDP MMU
KVM: selftests: Provide extra mmap flags in vm_mem_add()
KVM: selftests: Add an NX huge pages jitter test
Vipin Sharma (3):
KVM: x86/mmu: Track TDP MMU NX huge pages separately
KVM: x86/mmu: Rename kvm_tdp_mmu_zap_sp() to better indicate its
purpose
KVM: x86/mmu: Recover TDP MMU NX huge pages using MMU read lock
arch/x86/include/asm/kvm_host.h | 39 ++-
arch/x86/kvm/mmu/mmu.c | 175 +++++++++-----
arch/x86/kvm/mmu/mmu_internal.h | 7 +-
arch/x86/kvm/mmu/tdp_mmu.c | 49 +++-
arch/x86/kvm/mmu/tdp_mmu.h | 3 +-
tools/testing/selftests/kvm/Makefile.kvm | 2 +
.../testing/selftests/kvm/execute_perf_test.c | 199 ++++++++++++++++
.../testing/selftests/kvm/include/kvm_util.h | 3 +-
.../testing/selftests/kvm/include/memstress.h | 4 +
tools/testing/selftests/kvm/lib/kvm_util.c | 15 +-
tools/testing/selftests/kvm/lib/memstress.c | 25 +-
.../kvm/x86/nx_huge_pages_perf_test.c | 223 ++++++++++++++++++
.../kvm/x86/private_mem_conversions_test.c | 2 +-
13 files changed, 646 insertions(+), 100 deletions(-)
create mode 100644 tools/testing/selftests/kvm/execute_perf_test.c
create mode 100644 tools/testing/selftests/kvm/x86/nx_huge_pages_perf_test.c
base-commit: 8046d29dde17002523f94d3e6e0ebe486ce52166
--
2.50.0.rc2.692.g299adb8693-goog