Message-ID: <Z75y90KM_fE6H1cJ@google.com>
Date: Tue, 25 Feb 2025 17:48:39 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: pbonzini@...hat.com, rick.p.edgecombe@...el.com,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH] KVM: selftests: Wait mprotect_ro_done before write to RO
in mmu_stress_test
On Sat, Feb 08, 2025, Yan Zhao wrote:
> In the read-only mprotect() phase of mmu_stress_test, ensure that
> mprotect(PROT_READ) has completed before the guest starts writing to the
> memory that mprotect() has made read-only.
>
> Without waiting for mprotect_ro_done before the guest starts writing in
> stage 3 (the stage for read-only mprotect()), the host's assertion of stage
> 3 could fail if mprotect_ro_done is set to true in the window between the
> guest finishing writes to all GPAs and executing GUEST_SYNC(3).
>
> This scenario occurs easily, especially when there are hundreds of vCPUs.
>
> CPU 0                  CPU 1 guest                CPU 1 host
>                                                   enter stage 3's 1st loop
>                        // in stage 3
>                        write all GPAs
>                        @rip 0x4025f0
>
> mprotect(PROT_READ)
> mprotect_ro_done=true
>                        GUEST_SYNC(3)
>                                                   r=0, continue stage 3's 1st loop
>
>                        // in stage 4
>                        write GPA
>                        @rip 0x402635
>
>                                                   -EFAULT, jump out of stage 3's 1st loop
>                                                   enter stage 3's 2nd loop
>                        write GPA
>                        @rip 0x402635
>                                                   -EFAULT, continue stage 3's 2nd loop
>                                                   guest rip += 3
>
> The test then fails and reports "Unhandled exception '0xe' at guest RIP
> '0x402638'": the "(mem) = val" in vcpu_arch_put_guest() is compiled into a
> 4-byte mov instruction, so the next valid guest rip is 0x402639, and the
> host's "rip += 3" skip leaves the guest mid-instruction at 0x402638.
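
To make the proposed ordering concrete, the changelog's fix amounts to something
like the following guest-side wait (a sketch inferred from the subject line and
changelog, not the actual patch hunk; all names are from the existing test):

	/*
	 * Hypothetical fix sketch: don't start the stage-3 writes until the
	 * host's mprotect(PROT_READ) has landed, so the first write is
	 * guaranteed to fault before the guest can reach GUEST_SYNC(3).
	 */
	while (!READ_ONCE(mprotect_ro_done))
		;

	for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
		asm volatile(".byte 0x48,0x89,0x00" :: "a"(gpa) : "memory"); /* mov %rax, (%rax) */

	GUEST_SYNC(3);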
This shouldn't happen. On x86, stage 3 is a hand-coded "mov %rax, (%rax)", not
vcpu_arch_put_guest(). Either something else is going on, or __x86_64__ isn't
defined?
	do {
		for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
#ifdef __x86_64__
			asm volatile(".byte 0x48,0x89,0x00" :: "a"(gpa) : "memory"); /* mov %rax, (%rax) */
#elif defined(__aarch64__)
			asm volatile("str %0, [%0]" :: "r" (gpa) : "memory");
#else
			vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
#endif
	} while (!READ_ONCE(mprotect_ro_done));

	/*
	 * Only architectures that write the entire range can explicitly sync,
	 * as other architectures will be stuck on the write fault.
	 */
#if defined(__x86_64__) || defined(__aarch64__)
	GUEST_SYNC(3);
#endif

	for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
		vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
	GUEST_SYNC(4);
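
For reference on the instruction lengths at play: the hand-coded stage-3 store
is exactly 3 bytes, which is what the host-side "guest rip += 3" skip in the
trace assumes. A sketch of the encodings (the 4-byte form is only an assumed
example of what vcpu_arch_put_guest()'s "(mem) = val" could compile to, per
the report):

	.byte 0x48,0x89,0x00		/* mov %rax,(%rax): 3 bytes, safe to skip with rip += 3 */
	.byte 0x48,0x89,0x04,0x24	/* mov %rax,(%rsp): 4 bytes (needs a SIB byte);
					 * rip += 3 leaves rip mid-instruction, e.g. at 0x402638 */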
> Even if it were compiled into a 3-byte mov instruction, the guest's
> subsequent execution of GUEST_SYNC(4) could still cause the host's stage 3
> assertion to fail.
Sorry, I don't follow. The guest doesn't get "released" from GUEST_SYNC(3) until
the host runs the vCPU again, and that happens after asserting on stage 3 and
synchronizing with the main thread.
	assert_sync_stage(vcpu, 3);
#endif /* __x86_64__ || __aarch64__ */

	rendezvous_with_boss();

	/*
	 * Stage 4.  Run to completion, waiting for mprotect(PROT_WRITE) to
	 * make the memory writable again.
	 */
	do {
		r = _vcpu_run(vcpu);
	} while (r && errno == EFAULT);
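
To make the "released" semantics concrete, an illustrative host-side ordering
(framework names as above; flow only, not the exact test code):

	r = _vcpu_run(vcpu);		/* returns when the guest does GUEST_SYNC(3) */
	assert_sync_stage(vcpu, 3);	/* the guest is still parked at the ucall */
	rendezvous_with_boss();		/* sync with the main thread */
	r = _vcpu_run(vcpu);		/* only now is the guest released past GUEST_SYNC(3) */

Because GUEST_SYNC() is a ucall, i.e. a VM exit, the guest cannot execute
another instruction, GUEST_SYNC(4) included, until that second _vcpu_run().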