Message-ID: <20250208105318.16861-1-yan.y.zhao@intel.com>
Date: Sat, 8 Feb 2025 18:53:18 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: pbonzini@...hat.com,
seanjc@...gle.com
Cc: rick.p.edgecombe@...el.com,
linux-kernel@...r.kernel.org,
kvm@...r.kernel.org,
Yan Zhao <yan.y.zhao@...el.com>
Subject: [PATCH] KVM: selftests: Wait mprotect_ro_done before write to RO in mmu_stress_test

In the read-only mprotect() phase of mmu_stress_test, ensure that
mprotect(PROT_READ) has completed before the guest starts writing to the
memory that is being made read-only.

Without waiting for mprotect_ro_done before the guest starts writing in
stage 3 (the read-only mprotect() stage), the host's stage-3 assertion can
fail if mprotect_ro_done is set to true in the window between the guest
finishing its writes to all GPAs and executing GUEST_SYNC(3). This scenario
can easily occur, especially when there are hundreds of vCPUs.

CPU 0                  CPU 1 guest                      CPU 1 host
                       enter stage 3's 1st loop
                       //in stage 3
                       write all GPAs
                       @rip 0x4025f0

mprotect(PROT_READ)
mprotect_ro_done=true
                       GUEST_SYNC(3)
                                                        r=0, continue stage 3's 1st loop
                       //in stage 4
                       write GPA
                       @rip 0x402635

                                                        -EFAULT, jump out stage 3's 1st loop
                                                        enter stage 3's 2nd loop
                       write GPA
                       @rip 0x402635
                                                        -EFAULT, continue stage 3's 2nd loop
                                                        guest rip += 3
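
Condensed to the x86 path, the pre-patch guest-side ordering looks roughly
like this (reconstructed from the removed lines in the diff below; the
trailing GUEST_SYNC(3) is shown for context only):

	do {
		for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
			/* hand-coded 3-byte mov %rax, (%rax) */
			asm volatile(".byte 0x48,0x89,0x00" :: "a"(gpa) : "memory");
	} while (!READ_ONCE(mprotect_ro_done));	/* <-- the flag can flip here,
						   after every write has
						   already succeeded */

	GUEST_SYNC(3);	/* guest reports stage 3 without having faulted once */
	/* ... and moves on to the stage-4 vcpu_arch_put_guest() writes */

i.e. a vCPU that never observes the read-only mapping during stage 3 runs
ahead into stage 4 while the host is still waiting for a stage-3 -EFAULT
from it.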

After the final "guest rip += 3" above, the test fails and reports
"Unhandled exception '0xe' at guest RIP '0x402638'", because the next valid
guest RIP is 0x402639, i.e. the "(mem) = val" store in vcpu_arch_put_guest()
is compiled into a mov instruction of length 4.
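
Concretely (the 4-byte encoding below is just one possible form of the
compiler-generated store; the exact registers are illustrative):

	48 89 00       mov %rax, (%rax)   ; the hand-coded 3-byte store that the
	                                  ; host-side "rip += 3" skip is sized for
	48 89 04 24    mov %rax, (%rsp)   ; example of a 4-byte mov encoding

	faulting stage-4 store:     0x402635
	host-side skip:             0x402635 + 3 = 0x402638  (mid-instruction)
	next instruction boundary:  0x402639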

Even if it were compiled into a mov instruction of length 3, the guest's
subsequent execution of GUEST_SYNC(4) could still cause the host's stage-3
assertion to fail.

Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
---
tools/testing/selftests/kvm/mmu_stress_test.c | 25 +++++++++++--------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/kvm/mmu_stress_test.c b/tools/testing/selftests/kvm/mmu_stress_test.c
index d9c76b4c0d88..a2ccf705bb2a 100644
--- a/tools/testing/selftests/kvm/mmu_stress_test.c
+++ b/tools/testing/selftests/kvm/mmu_stress_test.c
@@ -35,11 +35,16 @@ static void guest_code(uint64_t start_gpa, uint64_t end_gpa, uint64_t stride)
GUEST_SYNC(2);
/*
- * Write to the region while mprotect(PROT_READ) is underway. Keep
- * looping until the memory is guaranteed to be read-only, otherwise
- * vCPUs may complete their writes and advance to the next stage
- * prematurely.
- *
+ * mprotect(PROT_READ) is underway. Keep looping until the memory is
+ * guaranteed to be read-only, otherwise vCPUs may complete their
+ * writes and advance to the next stage prematurely.
+ */
+ do {
+ ;
+ } while (!READ_ONCE(mprotect_ro_done));
+
+ /*
+ * Write to the region after mprotect(PROT_READ) is done.
* For architectures that support skipping the faulting instruction,
* generate the store via inline assembly to ensure the exact length
* of the instruction is known and stable (vcpu_arch_put_guest() on
@@ -47,16 +52,14 @@ static void guest_code(uint64_t start_gpa, uint64_t end_gpa, uint64_t stride)
* is low in this case). For x86, hand-code the exact opcode so that
* there is no room for variability in the generated instruction.
*/
- do {
- for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
+ for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
#ifdef __x86_64__
- asm volatile(".byte 0x48,0x89,0x00" :: "a"(gpa) : "memory"); /* mov %rax, (%rax) */
+ asm volatile(".byte 0x48,0x89,0x00" :: "a"(gpa) : "memory"); /* mov %rax, (%rax) */
#elif defined(__aarch64__)
- asm volatile("str %0, [%0]" :: "r" (gpa) : "memory");
+ asm volatile("str %0, [%0]" :: "r" (gpa) : "memory");
#else
- vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
+ vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
#endif
- } while (!READ_ONCE(mprotect_ro_done));
/*
* Only architectures that write the entire range can explicitly sync,
--
2.43.2